Re: [jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
Hi Andraz,

I am quite keen to implement this myself. We need this for our project as well. Your environment certainly seems more dynamic. Unfortunately, I haven't been able to find time to implement this yet due to some immediate project deadlines. While I won't be able to work on this full-time, I am hoping to invest a sizable amount of time in it after another 10-15 days or so.

Thanks.

Regards,
-Vishal

On Thu, May 20, 2010 at 6:36 AM, Andraz Tori (JIRA) j...@apache.org wrote:
> Has anything happened with this feature?
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869554#action_12869554 ]

Andraz Tori commented on ZOOKEEPER-107:
---

Has anything happened with this feature? There was some talk on the mailing list about what the most important use cases are. We're thinking of migrating a home-grown solution to Zookeeper, but can't do it without dynamic addition/removal of servers.

If it helps, here's the use case:

We have a fully cloudy solution. Every server that we put into the cluster runs a set of services that make themselves available to a local resource manager, which shares the list of resources with all other servers in the cluster. When we do upgrades, we simply fire up new servers with new versions of the services and connect their resource managers to the old ones in the same cluster. Then we simply shut down the old servers.

Besides adding/removing servers when upgrading, we also do the same thing when we need to scale temporarily: we fire up a few more servers and connect their resource managers to the cluster to make the services available to the cluster. We never know how many servers there are going to be in the cluster, and we don't assign any DNS entries to them (just another point of failure).

The clients that need to know about resources connect to any of the resource managers and get a list of all available resources and of the other resource managers. As servers move around, the clients can also connect to a different resource manager.

This is a somewhat unusual configuration, since the cloud practically lives on its own without any kind of static addresses. As long as you are able to connect to it at one point in time, you can keep up with its 'motion'.

So the idea was to migrate the above system to Zookeeper. Every service would connect to the local Zookeeper and create an ephemeral node announcing itself. So every server would run its own Zookeeper node connected to the Zookeeper cloud.

However, without dynamic addition/removal of servers all this becomes infeasible. Ideally we'd like to have a situation where we just start a Zookeeper node by giving it a list of known other Zookeeper nodes in the cloud, and then it takes on a life of its own.

Hope that the use case helps. I am really looking forward to this!

Allow dynamic changes to server cluster membership
--
Key: ZOOKEEPER-107
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Patrick Hunt
Assignee: Henry Robinson
Attachments: SimpleAddition.rtf

Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
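The bootstrap behaviour described in the use case (start a node with a list of known peers, then keep up with the cluster as it moves) can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `Directory`, `discover_members`, `join` and `leave` are hypothetical names, the dictionary stands in for the network, and none of it is a ZooKeeper API.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical stand-in for the network: address -> membership that node reports.
# A missing key models a server that is unreachable or has been shut down.
Directory = Dict[str, Set[str]]


def discover_members(seeds: Iterable[str], directory: Directory) -> Optional[Set[str]]:
    """Return the membership reported by the first reachable seed, or None."""
    for seed in seeds:
        members = directory.get(seed)
        if members is not None:
            return set(members)
    return None


def join(new_node: str, seeds: Iterable[str], directory: Directory) -> Set[str]:
    """Join the cluster through any live seed and publish the new membership."""
    members = discover_members(seeds, directory)
    if members is None:
        raise ConnectionError("no seed reachable; cannot locate the cluster")
    members.add(new_node)
    for node in members:
        directory[node] = set(members)  # every member learns the updated list
    return members


def leave(node: str, directory: Directory) -> None:
    """Remove a node and tell the remaining members about it."""
    members = directory.pop(node, set()) - {node}
    for other in members:
        directory[other] = set(members)


if __name__ == "__main__":
    cluster: Directory = {"a": {"a", "b"}, "b": {"a", "b"}}
    join("c", ["a"], cluster)   # new server started with one known peer
    leave("a", cluster)         # old servers are shut down after the upgrade
    leave("b", cluster)
    print(sorted(join("d", ["c"], cluster)))
```

A node that only knows the retired addresses `a` and `b` can no longer find the cluster in this model, which is why a stale seed list is the weak point of the approach.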
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863359#action_12863359 ]

Vishal K commented on ZOOKEEPER-107:

Hi Henry,

We are using ZK for one of the projects at VMware. We are very much interested in having dynamic membership management. I went through the dev mailing list thread above. I would like to contribute and develop this feature. It sounds like a fun project. Can you please provide an update on how far along we are with this, and any documentation that you may have?

I will start a separate discussion thread about this on the dev mailing list instead of having it on the JIRA.

Thanks.

Regards,
-Vishal
ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Henry, I just commented on the Jira. I would be happy to contribute. Please advise on the current status and next steps. Thanks. Regards, -Vishal
Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Vishal -

Great that you're interested in contributing! This would be a really neat feature to get into ZK.

The documentation that exists is essentially all on the JIRA. I had a patch that 'worked' but was nowhere near commit-ready. I'm trying to dig it up, but it appears it may have gone to the great bit-bucket in the sky. Trunk has moved sufficiently that a new patch would be required anyhow.

There were two main difficulties with this issue.

The first is changing the voting protocol to cope with changes in views. Since proposals are pipelined, the leader needs to keep track of what the view was that should vote for a proposal. IIRC, the other subtlety is making sure that when a view change is proposed, a quorum of votes is received from both the outgoing view and the incoming one. Otherwise it's possible to transition to a 'dead' view in which no progress can be made.

The second is to figure out the metadata management - how do we 'find' ZooKeeper servers if the ensemble may have moved onto a completely separate set of machines? That is, if the original ensemble was on A, B, C and the current ensemble is D, E, F - where do we look to find where the ensemble is located?

The first is a solved issue, the second is more a matter of taste than designing distributed protocols.

Really happy to help with this issue - I'd love to see it get resurrected.

cheers,
Henry

On 3 May 2010 07:25, Vishal K vishalm...@gmail.com wrote:
> Hi Henry, I just commented on the Jira. I would be happy to contribute. Please advise on the current status and next steps. Thanks. Regards, -Vishal

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
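The two voting subtleties in the message above (each pipelined proposal is voted on by the view in force when it was issued, and a view change needs a quorum from both the outgoing and the incoming view) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the lost patch or ZooKeeper's actual code: `Proposal`, `voting_view`, `incoming_view` and `has_majority` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set

View = FrozenSet[str]


def has_majority(acks: Set[str], view: View) -> bool:
    """True when strictly more than half of `view` has acknowledged."""
    return len(acks & view) > len(view) // 2


@dataclass
class Proposal:
    zxid: int
    voting_view: View                       # view in force when the proposal was issued
    incoming_view: Optional[View] = None    # set only for view-change proposals
    acks: Set[str] = field(default_factory=set)

    def ack(self, server: str) -> None:
        self.acks.add(server)

    def committable(self) -> bool:
        # Ordinary proposals need a majority of the view they were issued under.
        if not has_majority(self.acks, self.voting_view):
            return False
        # A view change additionally needs a majority of the incoming view,
        # otherwise the ensemble could move to a view that cannot make progress.
        if self.incoming_view is not None:
            return has_majority(self.acks, self.incoming_view)
        return True


if __name__ == "__main__":
    old, new = frozenset("ABC"), frozenset("DEF")
    change = Proposal(zxid=10, voting_view=old, incoming_view=new)
    for server in ("A", "B", "D"):
        change.ack(server)
    print(change.committable())   # old view has a majority, new view does not yet
    change.ack("E")
    print(change.committable())
```

Because each proposal carries its own `voting_view`, proposals that were already in the pipeline before the view change keep counting acknowledgements against the old membership only.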
Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Henry,

Thanks for the info. I will spend some more time understanding the issues before starting on the implementation. I will let you know if I have any questions (which I am sure I will).

Just to clarify, by "solved issue" you mean from a design perspective and not from an implementation perspective, right?

Regards,
-Vishal

On Mon, May 3, 2010 at 1:16 PM, Henry Robinson he...@cloudera.com wrote:
> The first is a solved issue, the second is more a matter of taste than designing distributed protocols.
Re: ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi Vishal -

That's right - design, not implementation! I'd encourage you to share a design document once you feel you understand exactly what's required. This is probably going to be a complex patch, and reviewers will need a study guide :)

cheers,
Henry

On 3 May 2010 10:26, Vishal Kher vishalm...@gmail.com wrote:
> Just to clarify, by solved issue you mean from design perspective and not from implementation right?

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851426#action_12851426 ]

Quinton Hoole commented on ZOOKEEPER-107:
-

OK, not to worry. I just found the answer to my question here: http://www.mail-archive.com:80/zookeeper-dev@hadoop.apache.org/msg07382.html
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
There are ways to dynamically update the cluster membership today; granted, while they are dynamic, they are not dynamic in the sense of 107: http://bit.ly/bTTdDK (see item 6)

107 is still pending. Henry was interested in implementing this; I'm not sure of his current status. Henry?

I think this is a good feature to add, but it brings up a number of issues that we need to work through. Given that most ppl rarely change cluster membership, and the fact that there are options available today, there hasn't been a lot of demand for this.

Patrick

Hariharan Subramanian wrote:
> Dear Zookeeper community, I would like to know the status of issue ZOOKEEPER-107 (Allow dynamic changes to server cluster membership) which is assigned to Henry Robinson. Is this the right forum to ask this question or should I post an update to the bug report? Thanks -- Hari
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hi -

Yes, I'm still interested in this JIRA, but it's a significant change to ZK which would probably be best suited for the 4.0 release. I have a rough implementation that 'works' but still leaves some issues unresolved. I don't have a great deal of time for this issue at the moment though, which is why it's stalled a bit - other ZK issues are more pressing.

This would be a really great project for anyone looking for a meaty distributed systems project; I'm happy to offer guidance and advice if anyone wants to take it on!

Henry

On 18 February 2010 09:58, Patrick Hunt ph...@apache.org wrote:
> 107 is still pending. Henry was interested to implement this, I'm not sure his current status. Henry?

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
FYI, Ben has been working on Netty integration with ZooKeeper. One of the benefits of this is that it provides encryption and certificate-based connection auth, which would be a great way (cert) to solve some of the security issues highlighted in 107.

So the pieces are there, interest is there, support is there; as Henry mentioned, we need to find the resources.

Patrick

Henry Robinson wrote:
> I don't have a great deal of time for this issue at the moment though, which is why it's stalled a bit - other ZK issues are more pressing.
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
FYI, Ben has been working on Netty integration with ZooKeeper. One of the benefits of this is it provides encryption and certificate based connection auth - which would be a great way (cert) to solve some of the security issues highlighted in 107. So the pieces are there, interest is there, support is there, as Henry mentioned we need to find the resources. Would any community members be interested in pushing this with Henry? We (Canonical/Ubuntu) are also very interested in both the encryption and dynamic membership features. We don't have resources to work on it right now, but might try help somehow. -- Gustavo Niemeyer http://niemeyer.net
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Thanks to all of you who responded. This feature is also important to the project I am working on, although we would have to find someone with the right skills to contribute to it.

Henry, do you have an estimate of how long this task would take if someone were to start with your prototype?

Thanks again
~ Hari

On Thu, Feb 18, 2010 at 1:35 PM, Henry Robinson he...@cloudera.com wrote:
> I have a rough implementation that 'works' but still leaves some issues unresolved.
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Hari -

It's very difficult to say how long this would take. There are a number of issues that need to be resolved by design rather than code. For example, the question of finding the cluster when the membership has changed is unanswered. The changes to the core protocol will need substantial peer review and testing.

Judging by my experience with the observers patch, which I think is comparable in size but probably less complex, I would expect about 3-6 months from the time someone picks up the work.

cheers,
Henry

On 18 February 2010 12:28, Hariharan Subramanian hurryha...@gmail.com wrote:
> Henry, Do you have an estimate on how long this task would take if someone were to start with your prototype?

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
I think it would be good to have a design document, just to reason about it and make sure that the mechanism we are introducing is not flawed. It is easy to get these things wrong.

-Flavio

On Feb 18, 2010, at 11:30 PM, Henry Robinson wrote:
> There are a number of issues that need to be resolved by design rather than code.
Reg: Status of ZOOKEEPER-107 - Allow dynamic changes to server cluster membership
Dear Zookeeper community, I would like to know the status of issue ZOOKEEPER-107 (Allow dynamic changes to server cluster membership) which is assigned to Henry Robinson Is this the right forum to ask this question or should I post an update to the bug report? Thanks -- Hari -- --… …-- .- .-. ..
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731776#action_12731776 ]

Raghu S commented on ZOOKEEPER-107:
---

@henry,

Sorry if this sounds like a repeat; I thought I would summarize the error handling during view change. Could you comment on whether this makes sense?

--

1. A configuration change succeeds if the change is successfully committed in both the old view and the new view. An observer is promoted to a follower only after it receives a COMMIT for the new view.

2. Each peer can have two views of the cluster: the last committed view and the last proposed view (which is created after a VIEWCHANGE proposal is received). The latter can be NULL if no view change attempt is in progress.
   2.A. Each peer will always attempt an election with the last committed view. Proposed views will be converted to committed views (or deleted) after leader election.
   2.B. The proposal record of a peer contains (in addition to the last logged ZXID and server ID) the last committed view of the peer.

3. During election, if the last committed view of the peer with the smaller ZXID, P(ZXLOW), is different from the last committed view of the peer with the higher ZXID, P(ZXHIGH), then P(ZXLOW) adopts P(ZXHIGH)'s last committed view and broadcasts the adopted view to all other peers.
   3.A. Two nodes with the same ZXID should have the same committed views.
   3.B. If the last committed views of P(ZXLOW) and P(ZXHIGH) are the same, but P(ZXHIGH) has a proposed new view (not yet committed), that view will not be considered by either peer during election. Similarly, if P(ZXLOW) has a proposed view, that will not be considered either.
   3.C. If P(ZXLOW) adopts P(ZXHIGH)'s last committed view and that view doesn't include P(ZXLOW), P(ZXLOW) drops out of the election (should it self-destruct??).

4. Once a leader is elected, it will sync up the logs of the followers that are lagging behind, just as is done today:
   - If there is a follower whose last committed view is different from the leader's, log synchronization will make sure the follower's last committed view gets updated to be in sync with the leader's. The follower doesn't do anything when its last committed view changes (the new view MUST include the follower, since 3.C prevents a follower that is not in the leading candidate's committed view from successfully completing an election).
   - If there is an observer who, upon log synchronization, learns that the committed view includes the observer, the observer will promote itself to a follower.
   - If a follower with a proposed view joins an already established leader who doesn't know about that proposed view, the follower's proposed view will be erased when the leader synchronizes the follower's log.
   - If the leader has a proposed new view in its log, the leader will send a COMMIT for the new view after a majority of the peers in the old view and in the new view have synced their logs to the leader's log.
   4.A. The view change COMMIT doesn't mean much for the followers that are not impacted by the view change.
   4.B. The observer that gets the view change COMMIT will promote itself to a follower if the new view includes the observer.
   4.C. The follower that gets the view change will drop out of the cluster if the new view doesn't include the follower.
   4.D. The leader will drop out of the cluster once the COMMIT is delivered locally if the new view doesn't include the leader. This will result in a new election.
   4.E. Otherwise, the leader will adjust the quorum size as per the new view.
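As a reading aid, rules 3 and 4.A-4.D of the summary above can be written down as two small state transitions. This is a hedged sketch of the proposal as summarized in this comment, not ZooKeeper's implementation; `Peer`, `Role`, `reconcile_for_election` and `on_view_commit` are hypothetical names, and rule 4.E (quorum resizing) is not modelled because quorum size here would simply follow from `committed_view`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional

View = FrozenSet[str]


class Role(Enum):
    OBSERVER = "observer"
    FOLLOWER = "follower"
    LEADER = "leader"
    OUT = "out"  # dropped out of the election or the cluster


@dataclass
class Peer:
    server_id: str
    zxid: int
    committed_view: View
    proposed_view: Optional[View] = None
    role: Role = Role.FOLLOWER


def reconcile_for_election(low: Peer, high: Peer) -> bool:
    """Rule 3: the peer with the lower ZXID adopts the other's committed view.

    Proposed views are ignored here (rule 3.B). Returns False when `low`
    must drop out because the adopted view excludes it (rule 3.C).
    """
    if low.zxid > high.zxid:
        raise ValueError("arguments must be ordered by ZXID")
    if low.committed_view == high.committed_view:
        return True
    if low.zxid == high.zxid:
        raise ValueError("rule 3.A violated: same ZXID, different committed views")
    low.committed_view = high.committed_view
    if low.server_id not in low.committed_view:
        low.role = Role.OUT
        return False
    return True


def on_view_commit(peer: Peer, new_view: View) -> Role:
    """Rules 4.A-4.D: react to a COMMIT for `new_view`."""
    peer.committed_view = new_view
    peer.proposed_view = None
    included = peer.server_id in new_view
    if peer.role is Role.OBSERVER and included:
        peer.role = Role.FOLLOWER              # 4.B: observer is promoted
    elif peer.role in (Role.FOLLOWER, Role.LEADER) and not included:
        peer.role = Role.OUT                   # 4.C / 4.D: excluded peer leaves
    return peer.role                           # 4.A: everyone else is unchanged


if __name__ == "__main__":
    observer = Peer("D", 5, frozenset("ABC"), role=Role.OBSERVER)
    print(on_view_commit(observer, frozenset("ABD")))
```

In this model a leader that ends up in `Role.OUT` corresponds to the case in 4.D where a new election has to follow.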
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731056#action_12731056 ] Patrick Hunt commented on ZOOKEEPER-107:

I've only been following this a bit, and I see bits and pieces in the comments but I'm not sure I follow it all -- some questions around the plan wrt manageability:

1) adding/removing servers - the server itself needs to be configured; are any changes needed to the config on the existing ensemble? I see Raghu has a similar comment on this
2) JMX - what's the plan? what additional properties/actions will be supported?
3) 4letter words - same issues as JMX
4) debug-ability - ensure adequate logging (log4j) on the ensemble
5) security - will an ensemble allow any server to connect to it? today we have ensemble participants hardwired into the config of each of the servers, right?

testing and b/w compat -- are we ensuring b/w compat btw this version and previous versions? (I'm probably going to look at beefing up unit/systest next, esp around b/w compat, so it would be good to have a better idea where this is headed). IMO this patch must include unit as well as systest before it is committed.

documentation will be needed as well. Perhaps a wiki proposal page should be created that will capture the current proposal for easy review of this feature? This JIRA can capture ongoing discussion, with agreed upon results captured in the wiki design/functional document. I know it would help me a lot.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730384#action_12730384 ] Raghu S commented on ZOOKEEPER-107:

Sorry to jump around a bit, I thought I would mention this if we haven't already talked about it. How do we plan to deal with a situation where a set of nodes can form a majority but can't form an ensemble because one or more peers have a grossly outdated configuration? Say an ensemble of ABCDE moved to EFGHI while E was offline, and only EFG are up. They form a majority but can't form an ensemble since E doesn't know about any of the other servers yet.

One way to address this is to implement an out-of-band synchronization mechanism in which E will realize that the ensemble has changed when F and G try to connect to E, and have one of them synchronize E's logs since their last known zxids are ahead of E's. E can then attempt to restart an election. Also, it is possible that F and G could see different ensembles (F is a bit outdated, G is the most up to date), in which case E might first sync up from F and then both E and F sync up from G if G comes online a bit later.

Any simpler solutions?
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730414#action_12730414 ] Henry Robinson commented on ZOOKEEPER-107:

Raghu: Your solution is essentially what will happen. F and G will contact E while they are trying to elect a leader. During this process they can all exchange the most recent view that they saw so that E realises the current view. If EFG form a quorum in any view then we can see that either a) it is the latest view or b) at least one of them will know about a later view. Therefore there's also no concern about resurrecting old views.
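The view exchange described in this reply can be sketched with Raghu's ABCDE to EFGHI example. This is a toy model under stated assumptions: views carry a monotonically increasing version number, every peer taking part in the exchange adopts the highest version it hears about, and a quorum is a simple majority. The names View, exchange_views and is_quorum are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    version: int        # assumed to increase with every committed view change
    members: frozenset  # ids of the voting servers in this view


def exchange_views(views):
    """Every peer taking part in the exchange adopts the newest view seen."""
    latest = max(views.values(), key=lambda v: v.version)
    return {peer: latest for peer in views}


def is_quorum(peers, view):
    """True if `peers` contains a strict majority of the view's members."""
    return len(set(peers) & view.members) > len(view.members) // 2


old = View(1, frozenset({"A", "B", "C", "D", "E"}))
new = View(2, frozenset({"E", "F", "G", "H", "I"}))

# E was offline during the change and still holds the old view.
before = {"E": old, "F": new, "G": new}
after = exchange_views(before)

print(after["E"] == new)                   # E has learned the current view
print(is_quorum({"E", "F", "G"}, new))     # E, F and G can now form a quorum
```

Because E only ever moves to a higher version, the exchange cannot resurrect an old view, which matches the argument made in the comment.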
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728589#action_12728589 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107:

I think this discussion is really interesting, but can we move the discussion on the behavior of the observer to ZOOKEEPER-368? I'll add my comments on the last set of comments there.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722314#action_12722314 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107:

That's a great catch, Henry, the one related to having any new (perhaps invalid) follower being able to submit requests. When you start a new follower not in the configuration, do you run it as a regular replica and let it find its way, or do you explicitly tell the follower to connect to the leader?

I'm not sure if we should discuss the details of the observer here or in the other jira, but I'm wondering how an observer is able to find the leader to connect to. The default leader election uses identifiers to connect and form quorums, so I'm not sure a server not in the configuration would be able to determine which replica is the leader. I think we can do it with leader election 0, though, if a leader has been elected and is running.

Are you planning on having observers as a separate feature, as per ZK-368? It would be great to have it, since you are going through the effort of implementing it already.

As for the message to observers containing the transaction, the advantage of having a special message (e.g., INFORM) is that we cut down the number of messages to observers: INFORM is essentially a COMMIT containing the request. If we don't change the protocol, then we can just have the leader send a PROPOSAL to everyone, including the observers. As observers will receive the COMMIT as well, we have higher message complexity. For now, I'm good either way.
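The message-complexity point about INFORM comes down to simple arithmetic, sketched below. The function name observer_messages_per_txn is hypothetical; it only counts leader-to-observer messages for one committed transaction and ignores acknowledgements.

```python
def observer_messages_per_txn(num_observers, use_inform):
    """Messages the leader sends to observers for one committed transaction."""
    # INFORM carries the request together with the commit decision, so one
    # message per observer is enough. Without it, each observer receives a
    # PROPOSAL and later a COMMIT.
    per_observer = 1 if use_inform else 2
    return num_observers * per_observer


for n in (1, 3, 10):
    print(n, observer_messages_per_txn(n, True), observer_messages_per_txn(n, False))
```

Under this simple count the INFORM variant halves the leader's traffic to observers, at the cost of adding a new message type to the protocol.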
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722368#action_12722368 ] Henry Robinson commented on ZOOKEEPER-107:

I think the issue of how to locate an ensemble whose makeup has changed needs to be discussed separately. I've got an idea for how I'd suggest doing it, but will leave that until I've got the view change stuff working. Once a new leader has been elected, it will need to publish this somewhere (probably both internal to ZK in /zookeeper/ensemble and externally). Observers can use one of those routes to find the leader.

At the moment, Observers are just followers that a) can't make most mutable proposals b) don't get either PROPOSE or COMMIT messages, just INFORM ones with the payload and c) can propose view changes, not necessarily to include themselves. So an Observer attaches to a leader, syncs and maybe listens in on the proposal stream for a while and then upgrades itself by issuing a view change request.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722153#action_12722153 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107:

+1, I think it is a good idea to use observers (ZOOKEEPER-368). This way we make sure that once the new configuration is committed the new active member is in sync with the leader. I have a slightly different idea of how to make it work, though. I was thinking that once the observer finishes synchronizing with the leader, it can simply submit a setData. This way we have no special code path for this operation. Only when finalizing the setData operation do we have to update all appropriate data structures.
[jira] Issue Comment Edited: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722245#action_12722245 ] Henry Robinson edited comment on ZOOKEEPER-107 at 6/20/09 12:09 PM:

As it turns out, I've pretty much implemented Observers in all but name already - they go through the same connection logic as normal followers, and therefore sync, but are disbarred from sending Leader.REQUEST packets to the leader. Similarly, when a leader is sending a proposal packet it only gets sent to those followers which are in the current view. Since the logic is very similar, and we will be able to distinguish observers from followers by whether they are members of the current view, I haven't duplicated code into Observer* classes.

I added this when finding that any new follower can join an existing ensemble and issue proposals to it, even if the static configuration of the ensemble does not contain it. This seemed to deadlock the ensemble pretty quickly :)

Edit: of course, this means that Observers can't actually see the payload of a transaction, as per the note on ZK-368. Either the leader sends special packets (INFORM, perhaps) to Observers containing the transaction payload, or the Observers must know not to participate in voting. That said, the Leader will ignore the votes of Observers, but we want to cut down on traffic.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721697#action_12721697 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107:

I suggest that the system starts as a standalone instance and the other replicas join by contacting the standalone replica using the new dynamic membership mechanism. This way we avoid pre-loading a configuration. An important observation is that there will be a transition from standalone to ensemble, which I think won't be difficult to deal with in the code, but we have to make sure that this observation is correct.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722063#action_12722063 ] Benjamin Reed commented on ZOOKEEPER-107:

i think if we use the notion of observers it helps: an observer can sync with a leader, but it doesn't get to vote. i think this makes it easy because the leader can then determine that it can commit with both the active followers and active observers if needed: for example start with A, B, C and move to A, B, D, E, F. if A and C are active followers and E and F are observers then the leader will propose the new configuration.
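The example in this comment (moving from A, B, C to A, B, D, E, F) can be checked with a short sketch of the rule that a view change needs acknowledgements from a majority of both the old and the new view. The function names are hypothetical and a simple-majority quorum is assumed; the set of acknowledging servers is taken to include the leader itself.

```python
def has_majority(acks, view):
    """True if the acknowledging servers include a strict majority of `view`."""
    return len(acks & view) > len(view) // 2


def can_commit_view_change(acks, old_view, new_view):
    """A view change commits only with a quorum in both views."""
    return has_majority(acks, old_view) and has_majority(acks, new_view)


old_view = {"A", "B", "C"}
new_view = {"A", "B", "D", "E", "F"}

# A and C are active in the old view; E and F have synced as observers.
print(can_commit_view_change({"A", "C", "E", "F"}, old_view, new_view))

# Without the synced observers there is no majority of the new view.
print(can_commit_view_change({"A", "C"}, old_view, new_view))
```

With E and F already synced as observers, A, E and F make up three of the five members of the new view, so the leader has a quorum on both sides before it proposes the change.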
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721547#action_12721547 ] Henry Robinson commented on ZOOKEEPER-107:

Yes, I'd like to take ownership of implementing this. I'd like to have a patch available within one to two weeks. There are some implementation issues to work through which might take time (for example, how do we manage the connections between joining followers and the current leader - who connects to whom?). I see the initial version of the patch simply as adding functionality to the core protocol. Adding any extensions to the client APIs would come in a second revision. Ironing out the kinks in the first patch will also doubtless take some time. Does that sound ok? You can go with an unstable implementation as soon as the patch is released.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721554#action_12721554 ] Raghu S commented on ZOOKEEPER-107:

That sounds great! I know this is a complex task and a lot of work; we can live with the kinks in the beginning.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721566#action_12721566 ] Raghu S commented on ZOOKEEPER-107:

Henry, the JIRA is unassigned. You might want to assign it to yourself.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720790#action_12720790 ] Benjamin Reed commented on ZOOKEEPER-107:

oh right. you are correct. i guess it is more of a liveness/correctness issue:

1) start with A, B, C, D
2) B is down and A is the leader and proposes LEAVE C and fails where only D gets it.
3) C and D cannot get quorum since C has an older view.
4) D fails
5) A and B come back up and B is elected leader.
6) B proposes LEAVE A and C gets it before B fails.

Now what happens? we cannot get quorum with just A and C since A has the old view. even if D comes up it will not elect C because it does not believe C is part of the ensemble. if they all come up either C or D can be elected leader, but if C is elected you end up with conflicting views: A thinks (B, C, D), B thinks (B, C, D), C thinks (B, C, D), and D thinks (A, B, D), so both A and D will effectively be out of the ensemble and you can't tolerate any failures.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720174#action_12720174 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107:

I think having the messages explicitly in the protocol helps to convey the implemented abstraction, making it easier to read and understand. However, it is bad for backward compatibility, although it might be the case that we silently ignore unknown messages.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720323#action_12720323 ] Benjamin Reed commented on ZOOKEEPER-107:

just a caveat to my last comment. for point 1) we actually do need to touch the protocol code a bit to ensure that the setData that changes the view commits in both the old and new views.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720334#action_12720334 ] Raghu S commented on ZOOKEEPER-107:

Ben, to be honest, I wasn't thinking of batch addition/deletion. I was thinking we would allow only one node to join or leave the cluster at a time, in which case we won't end up in a split brain.

One thing I am still missing is, how do we plan to reconcile the divergence in configuration info during leader election if we use ZAB? With ZAB, we go ahead and write to the log as soon as a PROPOSAL is sent. COMMIT is used only to notify the servers that a majority have logged the update and the clients can start reading the new update. So I am not really seeing how this will help configuration change.

Now in the example that you bring up, if D, E and F have logged the new view and all the nodes are brought up after a power cycle, a split brain could still occur, no? Should we allow only one node to be added/deleted at a time?
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720380#action_12720380 ] Benjamin Reed commented on ZOOKEEPER-107:

so if you do one at a time without using Zab, without working through the details:

1) start with A, B, C, D
2) A is the leader and proposes LEAVE D and fails where only A and C get it.
3) B is the leader and proposes LEAVE C and fails where only B and D get it because of a complete power outage.
4) everything comes back up
5) A is elected leader by C
6) B is elected leader by D

if we use ZAB split brain will not occur because we do not use the configuration until it has been committed. since it has been accepted by both the old and new quorums, we will eventually converge on the new configuration. (that is my conjecture, still needs to be proven)
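The six steps in this comment can be replayed as a small check. The sketch below assumes, as the scenario does, that each server acts on whatever view it last logged (without waiting for a COMMIT) and that a candidate wins once it and its supporters are a simple majority of that view. The names majority and can_elect are hypothetical.

```python
def majority(voters, view):
    """True if `voters` includes a strict majority of `view`."""
    return len(voters & view) > len(view) // 2


def can_elect(leader, supporters, believed):
    """A candidate wins if it and its supporters share one view and form a
    majority of that view."""
    voters = {leader} | set(supporters)
    view = believed[leader]
    return all(believed[v] == view for v in voters) and majority(voters, view)


# After steps 2 and 3: A and C logged LEAVE D, B and D logged LEAVE C.
uncommitted = {
    "A": frozenset("ABC"), "C": frozenset("ABC"),
    "B": frozenset("ABD"), "D": frozenset("ABD"),
}
print(can_elect("A", {"C"}, uncommitted))   # step 5
print(can_elect("B", {"D"}, uncommitted))   # step 6: a second, disjoint leader

# If nobody uses a view until it is committed, all four still hold A, B, C, D
# and two servers out of four are not a majority.
committed = {s: frozenset("ABCD") for s in "ABCD"}
print(can_elect("A", {"C"}, committed))
print(can_elect("B", {"D"}, committed))
```

The two winning groups in the first case are disjoint, which is exactly the split brain; in the second case neither pair can elect a leader on its own, which illustrates why the configuration should not be used before it is committed.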
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720411#action_12720411 ] Raghu S commented on ZOOKEEPER-107:

Ben, I still believe the split brain won't occur:

A. After (2), A and C have config version X + 1, B and D are at X.
B. After A dies, a leader election is not possible without C. During LE, B and D discover that C is at X + 1. This will force B and D to update their configuration to X + 1 and restart the election. This is what I refer to when I say reconciling configuration divergence in my write-up. D now leaves the cluster since it just learnt that it was deleted.
C. A new quorum is formed with B and C.
D. When A comes back, the config versions of A, B and C are the same. A will simply join the leader. If A were still at X, then it would first update its configuration to X + 1 when it starts an election and then restart the election.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719641#action_12719641 ] Henry Robinson commented on ZOOKEEPER-107: -- Hi - Thanks for the proposal - it does a really good job of framing the important questions. I am in favour of a solution that uses ZAB and the existing consensus framework for dynamic group membership. I believe this can be achieved without an out-of-band protocol or significant changes to the way the current protocols work; this has the advantage of keeping things simple. I'm not certain I've read your proposal correctly, but it seems that step 6 has followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for a COMMIT message. By my understanding of ZAB, this means there is a possibility where fewer than a quorum of followers will commit this proposal, if the leader fails halfway through sending the proposal messages, leading to the possibility of divergent histories at followers. The tool approach is one way of wrapping up the authentication required if an ensemble wishes to restrict those nodes that can join it. Currently there is some implicit authentication done as the leader only establishes connections with followers that belong to the static membership. However there's certainly a need, as a result of this JIRA, for a better authentication mechanism inside ZK. I see this as orthogonal to the mechanisms required to do dynamic membership. I suggest that we simply augment the current ZooKeeper protocol with four new proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW likely requires knowledge of the current view, GETVIEW returns the current view and its version. JOIN and LEAVE incrementally change the current view, whatever it is, and so do not require a GETVIEW call to establish the current view. 
All proposals go through the usual ZAB two-phase protocol, except for the fact that the leader coordinating the current ZAB instance must wait for acknowledgements from quorums in both the current and new view before committing the change. It's possible that this can lead to the proposal blocking if a quorum cannot be assembled in either view. Although it might seem an error that the proposal will block even if a quorum in the current view can be established, the same behaviour would be observed even if the proposal could be committed - all subsequent proposals would require a quorum from the new view and would block. If an ensemble is currently blocked due to the failure of n/2 + 1 nodes, it is not possible to resume progress by issuing a LEAVE on behalf of the failed nodes; however in general failed nodes may both JOIN and LEAVE the ensemble. If a leader election is required during a proposal, there are no correctness issues assuming the current required invariants of ZAB leader election hold. In particular, as long as the new leader has seen the most recent proposals then the view change proposal will be committed once the new leader is elected. This property will be maintained without changes to the current leader election protocols - as the view change proposal will have been seen by a quorum from the current view, the new leader is guaranteed to have a record of the proposal. A node that fails after it has issued a join proposal, but before it hears of its success must be able to find the status of the proposal once it recovers. There are several ways to do this. I have some sketches of correctness proofs for this and could produce a more detailed design document if required - however, if there's consensus that this is the right approach I'd rather get coding :) It turns out after much agonising that ZK's existent invariants are already pretty much strong enough to build this protocol. 
The only extension is the requirement to listen for two different sets of quorum acknowledgements. I've deliberately avoided the issue of exposing the view to the outside world (although this requires attention, as new nodes need to be able to find the ensemble!) - I have outlined some ideas earlier in this JIRA and I know other people have good suggestions, but I think we can solve both issues independently. Would love to hear comments, things that I've missed, errors in logic etc. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster
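Editorial sketch: the one extension Henry names, listening for acknowledgements from a quorum in both the current and the new view, can be illustrated in a few lines of Java. `DualQuorumTracker` and its method names are hypothetical and are not ZooKeeper classes; server ids are plain strings and a quorum is assumed to be a strict majority.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of ZooKeeper: tracks ACKs for one view-change proposal.
class DualQuorumTracker {
    private final Set<String> oldView;
    private final Set<String> newView;
    private final Set<String> acks = new HashSet<>();

    DualQuorumTracker(Set<String> oldView, Set<String> newView) {
        this.oldView = new HashSet<>(oldView);
        this.newView = new HashSet<>(newView);
    }

    // Record an acknowledgement; ACKs from servers in neither view are ignored.
    void ack(String serverId) {
        if (oldView.contains(serverId) || newView.contains(serverId)) {
            acks.add(serverId);
        }
    }

    // True when a strict majority of the given view has acknowledged.
    private boolean hasMajority(Set<String> view) {
        int count = 0;
        for (String id : view) {
            if (acks.contains(id)) {
                count++;
            }
        }
        return count > view.size() / 2;
    }

    // The leader may commit only when BOTH views have a quorum of ACKs.
    boolean canCommit() {
        return hasMajority(oldView) && hasMajority(newView);
    }

    public static void main(String[] args) {
        Set<String> oldView = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> newView = new HashSet<>(Arrays.asList("A", "B", "C", "D", "E"));
        DualQuorumTracker tracker = new DualQuorumTracker(oldView, newView);
        tracker.ack("A");
        tracker.ack("B");
        System.out.println("after A,B: " + tracker.canCommit());
        tracker.ack("D");
        System.out.println("after A,B,D: " + tracker.canCommit());
    }
}
```

With acknowledgements from A and B only, the old view has a majority but the new five-server view does not, so the proposal stays uncommitted until a third member of the new view acknowledges.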
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719823#action_12719823 ] Benjamin Reed commented on ZOOKEEPER-107: - Raghu, i think henry is correct that you must get an ack from quorums in both the old and new views before committing the change. otherwise you get split brain which could result in multiple leaders. henry, i think we are thinking along the same lines, but i'm a bit skeptical of JOIN and LEAVE. in some sense they are a bit of an optimization that can be implemented with GETVIEW and NEWVIEW. it would be nice to make the mechanism as simple as possible. it also seems like you would also require a GETVIEW to be done before doing a NEWVIEW, just for sanity. (require an expected version on NEWVIEW and not allow a -1.) i was thinking that we would just push NEWVIEW through Zab making sure we get acks from quorums in both the old and new views. to help mitigate the case where proposing the NEWVIEW leads to a case where the system freezes up when the NEWVIEW proposal goes out and there isn't a quorum in the new view, the leader should probably make sure that it currently has quorum of followers in the new view before proposing the request. if it doesn't, it should error out the request. even with this we can still freeze up if we lose quorum in the new view after issuing the proposal, but that would happen anyway (as you point out), but it would prevent us from doing something that has no chance of working. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
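Editorial sketch: the two leader-side checks Benjamin suggests (require an explicit expected version on NEWVIEW, and refuse to propose a view in which the leader cannot currently reach a quorum) might look roughly like this. `ViewProposer`, `Result` and `proposeNewView` are made-up names, `connected` is assumed to include the leader itself, and the sketch bumps the version immediately instead of running the proposal through Zab.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the leader-side checks; names are illustrative, not ZooKeeper APIs.
class ViewProposer {
    enum Result { PROPOSED, BAD_VERSION, NO_QUORUM_IN_NEW_VIEW }

    private int currentVersion;

    ViewProposer(int currentVersion) {
        this.currentVersion = currentVersion;
    }

    int currentVersion() {
        return currentVersion;
    }

    Result proposeNewView(int expectedVersion, Set<String> newView, Set<String> connected) {
        // Require a real expected version: -1 ("any version") is rejected.
        if (expectedVersion < 0 || expectedVersion != currentVersion) {
            return Result.BAD_VERSION;
        }
        // Error out early if a majority of the new view is not currently reachable.
        int reachable = 0;
        for (String id : newView) {
            if (connected.contains(id)) {
                reachable++;
            }
        }
        if (reachable <= newView.size() / 2) {
            return Result.NO_QUORUM_IN_NEW_VIEW;
        }
        // A real leader would now propose through Zab and wait for ACKs from
        // quorums in both views; this sketch simply bumps the version.
        currentVersion++;
        return Result.PROPOSED;
    }

    public static void main(String[] args) {
        ViewProposer leader = new ViewProposer(3);
        Set<String> newView = new HashSet<>(Arrays.asList("A", "B", "C", "D", "E"));
        Set<String> connected = new HashSet<>(Arrays.asList("A", "B"));
        System.out.println(leader.proposeNewView(3, newView, connected));
    }
}
```

As the message notes, the pre-check only avoids proposals that have no chance of working; quorum in the new view can still be lost after the proposal goes out.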
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719831#action_12719831 ] Flavio Paiva Junqueira commented on ZOOKEEPER-107: -- In general I like Henry's solution, and I think it works. However, I'm not entirely convinced that we need to augment the protocol with messages such as JOIN and LEAVE. I believe we can make it work by simply writing to a special znode and reading from it, which we need to do anyway if we want to use the mechanisms we have in place for durability. Of course, the leader has to follow changes to this znode and adapt its behavior accordingly (e.g., when sending proposals and commits). Followers, as far as I can tell, only need to register the changes to the znode, as they make no use of such information except for leader election. I also agree that there is an authentication problem as we don't want some arbitrary machine trying to join an ensemble. If you're willing to share your proof sketches, I would be pleased to take a look at them.
[jira] Updated: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-107: Fix Version/s: 3.2.0 Assignee: Patrick Hunt Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.2.0 Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-107: Fix Version/s: (was: 3.2.0) Assignee: (was: Patrick Hunt) updated the wrong jira! :) Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Attachments: SimpleAddition.rtf Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706320#action_12706320 ] Benjamin Reed commented on ZOOKEEPER-107: - the information needed for bootstrapping is the same as the information needed for a normal zookeeper client, so it could either use the standard string that is a list of host:port pairs, or it could use the scheme proposed in ZOOKEEPER-390. with that URL it could fetch /.zookeeper/ensemble and grab the configuration information that it needs. conf/zoo.cfg isn't really a good URI for this purpose since it doesn't really have the needed client ports. plus there is information in zoo.cfg that is particular to a given server. for example, the data and log directories may be different on all the machines. the client port should also probably stay in the zoo.cfg. the server lists and probably the timing variables should probably be stored in a znode and maintained with the atomic broadcast. recovery is a bit more than you mention, but at the same time simpler. first off, to change quorum configuration you must commit the change in both the old quorum configuration and in the new quorum configuration. for example, if you have the configuration A, B, C and you are changing to A, B, C, D, E you must be able to get quorum in both the old and new configuration for the change to work. if only A and B are up or A, D, and E are up you cannot commit the change. this means that the leader should check the new configuration carefully before proposing it, because we always roll the proposals forward, we never rollback. so really a zookeeper server doesn't know whether he is able to participate or not, the election will sort it out. a simple example is an ensemble A, B, C, D, E. E goes down. the last zxid it saw was 57,3. while it is down the quorum configuration gets changed to A, B, C by 57,52.
let's say there is a leadership change and at 58,6 the power goes out and comes back on. E now tries to vote (it thinks it is permitted to participate), but it won't win any election since its zxid is too low. A, B, and C will ignore E's votes anyway because they know that E has been removed from the ensemble.
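Editorial sketch: the E example above rests on two independent checks, an ordered zxid and a membership filter on votes. The Java below models a zxid as an (epoch, counter) pair, following the "57,3" notation in the message; `ElectionFilter`, `countsVote` and `couldFollow` are hypothetical names, and real leader election has more rules than these two.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of how a server in view {A, B, C} might treat votes from E.
class ElectionFilter {
    // A zxid modelled as (epoch, counter), compared by epoch first, then counter.
    static final class Zxid implements Comparable<Zxid> {
        final long epoch;
        final long counter;

        Zxid(long epoch, long counter) {
            this.epoch = epoch;
            this.counter = counter;
        }

        @Override
        public int compareTo(Zxid other) {
            int byEpoch = Long.compare(epoch, other.epoch);
            return byEpoch != 0 ? byEpoch : Long.compare(counter, other.counter);
        }
    }

    private final Set<String> currentView;
    private final Zxid lastSeen;

    ElectionFilter(Set<String> currentView, Zxid lastSeen) {
        this.currentView = new HashSet<>(currentView);
        this.lastSeen = lastSeen;
    }

    // Votes from servers that are no longer in the view are ignored.
    boolean countsVote(String sender) {
        return currentView.contains(sender);
    }

    // A candidate is acceptable only if it is a member and is at least as up to date as we are.
    boolean couldFollow(String candidate, Zxid candidateZxid) {
        return currentView.contains(candidate) && candidateZxid.compareTo(lastSeen) >= 0;
    }

    public static void main(String[] args) {
        Set<String> view = new HashSet<>(Arrays.asList("A", "B", "C"));
        ElectionFilter a = new ElectionFilter(view, new Zxid(58, 6));
        System.out.println("count E's vote? " + a.countsVote("E"));
        System.out.println("follow E? " + a.couldFollow("E", new Zxid(57, 3)));
        System.out.println("follow B? " + a.couldFollow("B", new Zxid(58, 6)));
    }
}
```

Either check alone is enough to keep E from winning: its zxid 57,3 is older than 58,6, and it is no longer in the view that A, B and C have recorded.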
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706697#action_12706697 ] Raghu S commented on ZOOKEEPER-107: --- I think there are some corner cases that may make the leader election impossible during a node addition. Say the current config is A,B,C and the new config is A,B,C,D. When the leader is trying to commit the new configuration, the power goes out and comes back on when only A and B have logged the new configuration. Peer count in A,B,C,D = 4,4,3,3 now. An election is not possible if C is down because A and B think the majority is 3 peers and D can't participate in the election since it hasn't joined the cluster yet. It sounds like some out of band communication between an existing peer and a new peer is needed to make this thing work. If a peer restarts or notices quorum loss and if the last logged update is a node addition, the peer should try to contact the newly added server so that it can push its log to the new peer (if the new peer doesn't already have an up to date log) and ask the new peer to restart. Until A or B does that in the above case, an election may not be possible.
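Editorial sketch: the arithmetic behind Raghu's corner case, written out in Java. `ElectionStall`, `majority` and `canElect` are hypothetical helpers that assume simple majority quorums, with each peer computing the threshold from the peer count it has logged.

```java
// Hypothetical helper illustrating the stalled election in the A,B,C -> A,B,C,D example.
class ElectionStall {
    // Smallest number of votes that forms a strict majority of peerCount peers.
    static int majority(int peerCount) {
        return peerCount / 2 + 1;
    }

    // A peer can see an election succeed only if the live, participating peers
    // reach the majority of the configuration that this peer believes in.
    static boolean canElect(int believedPeerCount, int liveParticipants) {
        return liveParticipants >= majority(believedPeerCount);
    }

    public static void main(String[] args) {
        // A and B logged the new config (4 peers); C is down; D has not joined yet.
        int liveParticipants = 2; // only A and B vote
        System.out.println("majority of 4 = " + majority(4));
        System.out.println("A and B can elect: " + canElect(4, liveParticipants));
    }
}
```

Two votes would be a majority under the old three-peer configuration, but A and B have already logged the four-peer one, so they wait for a third vote that neither C (down) nor D (not yet joined) can supply.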
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706097#action_12706097 ] Benjamin Reed commented on ZOOKEEPER-107: - i agree with everything you are saying and yes to all the questions. it's not as strange as it sounds. today we have to pre-populate the cluster config. it would just be that now rather than creating a file with vi we would need to use a utility to create an initial snapshot that has the config in it. i think this would also help with some deployment errors by tightly tying the data with the cluster config. the previously mentioned utility would also allow you to avoid having to start with a single node cluster and growing from there. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706122#action_12706122 ] Mahadev konar commented on ZOOKEEPER-107: - Henry, one thing I would like to ask: please post a concrete proposal (since this involves the core internals of zookeeper) before you start working on this, so that there is agreement and no wasted effort...
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706219#action_12706219 ] Henry Robinson commented on ZOOKEEPER-107: -- I agree with pretty much everything I've read here, (in particular, the importance of getting consensus!), but wanted to clarify my initial comment a bit. Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think there's a hybrid approach needed. If a node is a member of a quorate cluster, then the most up to date membership information should be available to it in a znode. I think this is the most elegant approach, and is trivially achieved by pushing join/leave requests through the atomic broadcast pipeline. If a node is joining the cluster, it needs to be able to bootstrap the location of the cluster from somewhere. There therefore needs to be an externally available resource containing a list of machines in the cluster that is at least accurate for one machine (as a joining node will try all servers in that list in turn). When I say available at some URI, this is what I mean. Currently, this information is kept statically at a URI that addresses conf/zoo.cfg on the local filesystem. I suggest generalising that to a general URI. One nice property is that it then does not tie a cluster to a particular machine, as the URI provides a level of indirection. It is then the cluster administrator's responsibility to keep this URI up-to-date (although of course this should be automated), possibly via a client that just pulls membership information from the cluster periodically. As I said earlier, it's only important for the contents of this list to have one node in common with the true membership of the cluster, so it's allowed to get a bit out of sync. We can certainly easily imagine ways that ZK can help here.
Of course the URI must be highly available, but it also has to exist, otherwise we could have 'orphaned' clusters that are running on machines whose identity we don't necessarily know. The URI can be a front for almost any scheme we like - periodic heartbeating of live nodes is one. The format of this file can be anything at all - from a serialised snapshot to a list of ip:port pairs, as long as it contains enough information for a client to find the cluster. Personally I would prefer human readable, simple formats. To talk about recovery for a moment: when a node recovers from a crash and rejoins the cluster, it can help the cluster elect a master if the cluster is currently non-quorate. This is because it was originally part of the cluster, and therefore the protocol guarantees that a quorum of nodes including the recovering one will have seen all committed proposals (this is important to correctness). If the node was not originally a member of the cluster, it must not help get a master elected as it cannot be part of a quorum. Similarly, a node cannot query the cluster to find out if it was originally a member because the quorum required to do so might not exist. Therefore every node that ever successfully joins a cluster must store this fact in its own persistent storage, as only it can know whether it is permitted to help run the election. Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto a cluster simply by being told to start in startup mode. Alternatively, a single node can be distinguished (again, in the URI contents perhaps) which will start in single-node mode and process join requests one-by-one.
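Editorial sketch: the bootstrap rule described in this message, that a joining node tries every server in the published list in turn and only needs one entry to be a real member, can be shown in Java. `Bootstrap.findMember` and the `isLiveMember` probe are hypothetical; a real probe would attempt a network connection, while here it is any predicate over "host:port" strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical bootstrap helper: walks a possibly stale server list until one entry answers.
class Bootstrap {
    // Returns the first candidate for which the probe succeeds, or empty if none does.
    static Optional<String> findMember(List<String> candidates, Predicate<String> isLiveMember) {
        for (String candidate : candidates) {
            if (isLiveMember.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The published list is partly out of date: only n3 is still a member.
        List<String> published = Arrays.asList("old1:2181", "old2:2181", "n3:2181");
        Set<String> trueMembers = new HashSet<>(Arrays.asList("n3:2181", "n4:2181"));
        System.out.println(findMember(published, trueMembers::contains));
    }
}
```

The list may lag behind the true membership as long as the two share at least one server; if they share none, the joining node has no way in, which is the 'orphaned' cluster case mentioned above.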
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12705794#action_12705794 ] Benjamin Reed commented on ZOOKEEPER-107: - sounds great henry! it would be great if you could work on this. i think we have two strategies:
1) have the cluster agree on a list of servers and use the atomic broadcast to agree on changes. (this might be a bit more difficult with the flexible quorum configuration. right flavio?) this is mostly in line with your first three points. btw, i don't think you need to quiesce for this or even do the sync. i think you can do a conditional update.
2) use some external resource file indicated by a URL to define the machines that make up a cluster. this is in line with your last point and you hint at this with your first point.
i think the first approach is safer and more reliable. the second is easier to implement and easier to see what is going on, but during the transition you have a problem as the resource file propagates through the cluster. (you could have different members with different views.) the thing i was thinking of for the first option is exposing the cluster config through a znode '/.zookeeper/ensemble' or something like that. then changing the configuration would be as simple as conditionally setting a new version of that file. the tricky part is that you could only commit the change if you have a quorum of followers in both the old and the new configuration. this seems to be in line with what you are thinking, correct?
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12705192#action_12705192 ] Henry Robinson commented on ZOOKEEPER-107: -- This is something I'd be willing to work on. Just to sum up my current understanding of the requirements: 1. Must support off-cluster getPeers operation for a recovering peer to bootstrap itself (can cache in its own persistent storage, but that could potentially be out of date by recovery time). This is probably best realised with the URI idea as before. 2. Support for join and leave operation. With a quiescent cluster, join is probably as simple as a sync followed by a commit of the new peer's id to all followers (if nothing else, this ensures that if one of them should be elected the master, they know how big the quorum should be). Leaves are similar, without the sync obviously. If a peer leaves before the Leave( ) operation completes, it will look like a crash. 3. If joining / leaving a cluster that doesn't have a currently elected master, block until one exists. If the cluster is currently failed due to f+1 failures, it might be necessary to timeout in order to prevent being permanently blocked if this is in the middle of a code path. 4. However, if joining / leaving a cluster that has never bootstrapped it's important to do something different so as to allow the cluster to achieve a quorum. One solution is for a node to check if its id is in the list of peers at the cluster URI which will tell it if it was ever a member of the cluster previously (or part of the initial membership) and then participate in master elections. This places a requirement on the peer list to be kept reasonably accurate (but this could only affect liveness, not safety, I think). Please chime in with comments / stuff that I've missed / bugs, otherwise I'll work on fleshing this out. 
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619035#action_12619035 ] Benjamin Reed commented on ZOOKEEPER-107: - +1 I like the idea. You can currently use DNS for this functionality: make zookeeper.acme.com resolve to 5 different IP addresses and then specify new ZooKeeper("zookeeper.acme.com:3233", 1000, this), but DNS is hard to modify. A replicated web server would be much easier to update.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619070#action_12619070 ] Patrick Hunt commented on ZOOKEEPER-107: Obviously it would be great if we supported reading from a ZooKeeper cluster! This just reminded me of another comment I got recently on this. The suggestion was to use a URI (similar to jdbc for example) rather than a host/port list. Perhaps we should have some sort of plugin architecture here, where the uri would be provided and each registered plugin would map the host/port mapping based on the scheme. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
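Editorial sketch: one way to read the plugin idea is a registry keyed by URI scheme, where each registered plugin turns a URI into a host/port list. Everything here is hypothetical, including the `ResolverRegistry` class and the made-up `hostlist:` scheme; it uses only `java.net.URI` from the JDK and is not an existing ZooKeeper client feature.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping a URI scheme to a plugin that yields "host:port" strings.
class ResolverRegistry {
    private final Map<String, Function<URI, List<String>>> plugins = new HashMap<>();

    void register(String scheme, Function<URI, List<String>> plugin) {
        plugins.put(scheme, plugin);
    }

    // Dispatches on the URI scheme; fails loudly when no plugin is registered for it.
    List<String> resolve(URI uri) {
        Function<URI, List<String>> plugin = plugins.get(uri.getScheme());
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin for scheme: " + uri.getScheme());
        }
        return plugin.apply(uri);
    }

    // Example plugin for the made-up scheme "hostlist", e.g. hostlist:h1:2181,h2:2181
    static List<String> hostList(URI uri) {
        return Arrays.asList(uri.getSchemeSpecificPart().split(","));
    }

    public static void main(String[] args) {
        ResolverRegistry registry = new ResolverRegistry();
        registry.register("hostlist", ResolverRegistry::hostList);
        System.out.println(registry.resolve(URI.create("hostlist:h1:2181,h2:2181")));
    }
}
```

Other schemes (a web server, DNS, or a znode in another ZooKeeper cluster) would plug in the same way, each supplying its own function from URI to server list.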
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619106#action_12619106 ] Hiram Chirino commented on ZOOKEEPER-107: - I personally think that this needs to stay decoupled so that group membership can be controlled via different implementations. In other words, I think that the QuorumPeer should not have to have any constructor args for it to know its peers. It should persistently store/remember the list of peers that are part of the group since it last started. Not sure if it makes sense to keep that list in the ZK db or not. When a node that is not part of a cluster first starts up, it needs to know if it's starting a new cluster or if it is joining an existing cluster. Therefore, I think the QuorumPeer class needs methods like the following:
{code}
/**
 * Contacts a ZK server in the cluster, adds this peer to the cluster and gets a listing of the
 * rest of the peers in the cluster.
 *
 * Optional: if slaveOnly is true, then this peer should never be elected master.
 *
 * Throws an error if this peer is already part of a cluster.
 */
void joinCluster(URI server, boolean slaveOnly);

/**
 * Starts this peer as the first node in the cluster and makes him the master.
 *
 * Throws an error if this peer is already part of a cluster.
 */
void createCluster();

/**
 * Removes this peer from the peer list maintained by the cluster.
 *
 * Throws an error if this peer is not part of a cluster.
 */
void leaveCluster();

/**
 * Gets a list of peers in the cluster.
 *
 * @return null if not part of a cluster yet.
 */
List<URI> getClusterPeers();
{code}
If methods like the above are available, then an administrator can dynamically manage adding/removing nodes on an existing ZooKeeper cluster, or some automated agent could do it. Note that the peer list needs to get replicated to all cluster members and persisted to avoid split brain issues on peer restart.
Operations like joinCluster(), leaveCluster(), getClusterPeers() would block until a master is elected in the cluster. Please note the 'nice to have feature' where you have the ability to designate some peers as NOT being eligible to become a master. This would allow you to support using heterogeneous peers, and enforce only allowing the higher end machines to become the masters. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
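Editorial sketch: the state rules in Hiram's method comments (error when already in a cluster, error when leaving without being in one, null peer list before joining) can be exercised with a small in-memory model. `LocalMembership` is hypothetical; unlike the proposed `joinCluster(URI server, boolean slaveOnly)`, this version takes the peer list that the contacted server would have returned, and it does no networking, replication, persistence or blocking.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of the membership state transitions only.
class LocalMembership {
    private final URI self;
    private List<URI> peers; // null until this peer is part of a cluster

    LocalMembership(URI self) {
        this.self = self;
    }

    // Starts a new cluster with this peer as its only member.
    void createCluster() {
        requireNotMember();
        peers = new ArrayList<>();
        peers.add(self);
    }

    // Joins an existing cluster; peersFromServer stands in for the server's reply.
    void joinCluster(List<URI> peersFromServer) {
        requireNotMember();
        peers = new ArrayList<>(peersFromServer);
        peers.add(self);
    }

    // Leaves the cluster; an error if this peer was never a member.
    void leaveCluster() {
        if (peers == null) {
            throw new IllegalStateException("not part of a cluster");
        }
        peers = null;
    }

    // Returns a copy of the peer list, or null if not part of a cluster yet.
    List<URI> getClusterPeers() {
        return peers == null ? null : new ArrayList<>(peers);
    }

    private void requireNotMember() {
        if (peers != null) {
            throw new IllegalStateException("already part of a cluster");
        }
    }

    public static void main(String[] args) {
        LocalMembership d = new LocalMembership(URI.create("zk://d:2888"));
        d.joinCluster(Arrays.asList(URI.create("zk://a:2888"), URI.create("zk://b:2888")));
        System.out.println(d.getClusterPeers());
    }
}
```

The `zk://` URIs are placeholders; the thread leaves the actual URI scheme open.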
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619109#action_12619109 ] Benjamin Reed commented on ZOOKEEPER-107: - I think there are two issues here: 1) adding/removing servers to a ZooKeeper cluster and 2) letting clients know about the change. We should probably separate them. I like the URL idea for dealing with 1) (especially when used in conjunction with the other idea in this Jira of defining a URL scheme for ZooKeeper). For 2) I agree with Hiram that it should be stored persistently at each replica and changed via the replication protocol. Allow dynamic changes to server cluster membership -- Key: ZOOKEEPER-107 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619118#action_12619118 ] Mahadev konar commented on ZOOKEEPER-107: - +1 for using URIs on the client side to get a list of zookeeper servers. We can always update the zookeeper client periodically by fetching from the URI.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12619119#action_12619119 ] Patrick Hunt commented on ZOOKEEPER-107: In my comment "URI rather than a host/port list" I was specifically referring to the client's host/port list used to specify the servers to which the client should connect. Probably a good idea to use something like this on the servers as well. Regarding the idea of join/leave a cluster, this sounds good. How does this mesh with the common case of starting up a set of 5 servers forming a new cluster? Specifically the idea of operations blocking (Hiram's comment) until a master is elected. Not sure I see how this works...