Not that you would want to do this, but would simply overwriting a config file "uncommit" a configuration and make that server think the last committed configuration was whatever is in the file?
Jared

On Jul 28, 2012, at 11:33 AM, Alexander Shraer wrote:

> No problem!
>
> The way it works is that before a server acks a reconfig operation, it writes a special tmp file to disk (dynamicConfigFilename + ".tmp"). Servers look for this file during recovery; they don't look for the configuration in the log as they do for normal data, because we found it difficult to extract the right info from the log at exactly the stage of recovery where we needed it. When a commit message is received, a server renames the tmp file to dynamicConfigFilename.
>
> Recently someone committed a change to start using atomic file operations for different files in ZooKeeper. At some point we'll probably change the renaming above to use these atomic operations.
>
> Alex
>
>
> On Sat, Jul 28, 2012 at 10:17 AM, Jared Cantwell wrote:
> Thanks Alex for the detailed explanations; they really help to fill in the parts of the implementation left open by the papers/presentations I've read (without having to read the code yet :-) ). #2 is what I was unsure of, but it makes perfect sense.
>
> Obviously, committing the new configuration to the internal database is a prerequisite for committing on a server, but is writing the new configuration file to disk also a prerequisite for committing the new configuration? I'm curious about this so I can match it with my observations, since reading the configuration file is much easier than getting the database state.
>
> ~Jared
>
>
> On Sat, Jul 28, 2012 at 11:02 AM, Alexander Shraer wrote:
> Hi Jared,
>
> Figuring out what happened and how to recover is part of the reconfiguration protocol. I don't think this is something you as a user should do, unless I misunderstand what you're trying to do. This should be handled by ZooKeeper just as it handles other failures, without admin intervention.
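The tmp-file-then-rename scheme Alex describes can be sketched in a few lines. This is a hypothetical Python illustration, not ZooKeeper's Java code: `ack_proposal`, `commit_proposal`, `pending_proposal` and the `zoo.cfg.dynamic` path are made-up names, and `os.replace` stands in for the "atomic file operations" mentioned in the thread (on POSIX it is an atomic rename when both paths are on the same filesystem).

```python
import os
import tempfile
from typing import Optional


def ack_proposal(dynamic_config_filename: str, proposed_config: str) -> None:
    """Hypothetical: persist a proposed config to <dynamicConfigFilename>.tmp before acking it."""
    tmp_path = dynamic_config_filename + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(proposed_config)
        f.flush()
        os.fsync(f.fileno())  # the proposal must survive a crash once we have acked it


def commit_proposal(dynamic_config_filename: str) -> None:
    """Hypothetical: on commit, rename the .tmp file over the live dynamic config file."""
    # os.replace overwrites the destination; on POSIX this is an atomic rename
    # as long as both paths live on the same filesystem.
    os.replace(dynamic_config_filename + ".tmp", dynamic_config_filename)


def pending_proposal(dynamic_config_filename: str) -> Optional[str]:
    """Hypothetical: during recovery, return an acked-but-uncommitted proposal, if any."""
    tmp_path = dynamic_config_filename + ".tmp"
    if not os.path.exists(tmp_path):
        return None
    with open(tmp_path) as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg = os.path.join(d, "zoo.cfg.dynamic")  # hypothetical file name
        with open(cfg, "w") as f:
            f.write("server.1=A\n")
        ack_proposal(cfg, "server.4=D\n")
        print(pending_proposal(cfg))  # what a recovering server would find
        commit_proposal(cfg)
        print(pending_proposal(cfg))  # None: nothing is pending after the commit
        with open(cfg) as f:
            print(f.read())           # the committed config
```

Recovery in this sketch only needs to look for the `.tmp` file, which matches Alex's point that servers do not dig the configuration out of the log.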
>
> In your scenario, D-F come up and one of them is elected leader (since you said they know about the commit), so they start running the new config normally. When A-C come up, several things may happen:
>
> 1. During the preliminary FastLeaderElection, A-C will try to connect to D and E, and in fact they'll also try to connect to the members of the new config that they know was proposed. So most likely someone in the new config will send them the new config file, and they'll store it and act accordingly (connect as non-voting followers in the new config). To make this happen, I changed FastLeaderElection to talk with proposed configs (if known) and to piggyback the last active config you know of on all messages.
>
> 2. It's possible that somehow A-C complete FastLeaderElection without talking to D-F. But since a reconfiguration was committed, it was acked by a quorum of the old config (and a quorum of the new one). Therefore, whoever is "elected" in the old config knows about the reconfig proposal (this is guaranteed by normal ZooKeeper leader recovery). Before doing anything else, the new leader among A-C will try to complete the reconfiguration, which involves getting enough acks from a quorum of the new config. But in your scenario the servers in the new config will not connect to it because they have moved on, so the candidate-leader will just give up and go back to (1) above.
>
> 3. In the unlikely event that someone who heard about the reconfig commit connects to a candidate-leader who didn't hear about it, the first thing it does is tell that candidate-leader that it's not up to date; the leader then updates its config file, gives up on being leader, and returns to (1). This was done by changing the first message that a follower/observer sends to a leader it is connecting to, even before synchronization starts.
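Cases 1 and 3 both reduce to one rule: if a peer shows you a newer committed config, adopt it and start over. The Python sketch below models that rule under a loudly stated assumption: each committed config carries a version number that grows with every reconfiguration. The `Config` and `Peer` classes, the message dictionary, and `on_follower_connect` are hypothetical and are not ZooKeeper's FastLeaderElection API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Config:
    version: int             # assumption: a higher version means a later committed config
    members: FrozenSet[str]  # voting members of this config


class Peer:
    def __init__(self, name: str, config: Config) -> None:
        self.name = name
        self.last_committed = config

    def notification(self, vote: str) -> dict:
        # Case 1: piggyback the last committed config on every election message.
        return {"from": self.name, "vote": vote, "config": self.last_committed}

    def on_notification(self, msg: dict) -> bool:
        """Adopt a newer config seen during election; True means restart the election."""
        theirs = msg["config"]
        if theirs.version > self.last_committed.version:
            self.last_committed = theirs  # a real server would also rewrite its config file
            return True
        return False


def on_follower_connect(candidate_leader: Peer, follower_config: Config) -> bool:
    """Case 3: the follower's first message reveals a newer config.

    Returns True if the candidate leader must adopt it and give up leadership.
    """
    if follower_config.version > candidate_leader.last_committed.version:
        candidate_leader.last_committed = follower_config
        return True
    return False


if __name__ == "__main__":
    old = Config(1, frozenset("ABCDE"))
    new = Config(2, frozenset("DEFGH"))
    a, d = Peer("A", old), Peer("D", new)
    print(a.on_notification(d.notification(vote="D")))  # A learns about the new config
    print(sorted(a.last_committed.members))
```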
>
> Alex
>
>
> On Sat, Jul 28, 2012 at 8:43 AM, Jared Cantwell wrote:
> So I'm working through some failure scenarios, and I want to make sure I fully understand how dynamic membership changes the previous behavior. Are my expectations correct in this situation?
>
> As in my previous example, let's say that the current membership of voting participants is {A,B,C,D,E} and we're looking to change the membership to {D,E,F,G,H}.
> 1. The reconfiguration to {D,E,F,G,H} completes internally.
> 2. D-F update their local configuration files, but A-C have not yet.
> 3. All nodes lose power.
>
> Now what happens if A, B, and C come up with configuration files that still say {A,B,C,D,E}, but no other servers start up yet? Can A, B, and C form a quorum and elect a leader since they all agree on the same state? What happens when the new membership of D-H then starts up?
>
> We're trying to automatically handle node failures during reconfiguration, but it seems that without being able to query all nodes to make sure you know the latest membership list, there is no safe way to do this. I'm wondering whether doing only single-node additions/removals would create less complicated failure scenarios. What are your thoughts and best practices around this?
>
> Thanks!
> Jared
>
> On Fri, Jul 27, 2012 at 8:57 PM, Jared Cantwell wrote:
> We are trying to remove the need for any admin intervention, so that is one failure scenario that is interesting to us.
>
> Jared
>
>
> On Jul 27, 2012, at 7:42 PM, Alexander Shraer wrote:
>
>> Yes, this entry will be deleted. I don't like this either: if a new follower reboots before it is added to the config, it will not be able to boot up without manual help from an admin. That's why I'm considering removing the check that a participant must always initially be in its own config, but for now it's there.
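The quorum arithmetic behind Jared's power-loss scenario and Alex's case 2 can be checked mechanically. The Python sketch below assumes simple majority quorums (ZooKeeper's default) and uses hypothetical helper names; like ZK-107 with non-voting followers, it ignores acks from servers outside the config being counted. It shows that {A,B,C} is a majority of the old config, so A-C can elect a leader among themselves, while a committed reconfig also needed a majority of the old config; since any two majorities of {A,B,C,D,E} intersect, at least one of A-C acked the proposal.

```python
from typing import Iterable


def is_majority(acks: Iterable[str], members: Iterable[str]) -> bool:
    """True if the acking servers that belong to `members` form a strict majority of it."""
    members = set(members)
    voting_acks = set(acks) & members  # acks from servers outside the config are not counted
    return len(voting_acks) > len(members) // 2


def reconfig_can_commit(acks: Iterable[str], old: Iterable[str], new: Iterable[str]) -> bool:
    """A reconfig needs a quorum of the old config AND a quorum of the new one."""
    acks = set(acks)
    return is_majority(acks, old) and is_majority(acks, new)


if __name__ == "__main__":
    old, new = "ABCDE", "DEFGH"
    print(is_majority("ABC", old))                 # A-C alone are a quorum of the old config
    print(is_majority("ABC", new))                 # ...but not of the new one
    print(reconfig_can_commit("ABCDE", old, new))  # only D and E count toward the new config
    print(reconfig_can_commit("ABDEF", old, new))  # A, B, D, E in old; D, E, F in new
```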
>>
>> Alex
>>
>> On Fri, Jul 27, 2012 at 6:34 PM, Jared Cantwell wrote:
>> Sorry for the confusion in terminology; I was previously unfamiliar with the exact leader/follower semantics.
>>
>> So if all connected servers update their config file, does that mean that non-voting followers who aren't part of the new ensemble will lose the entry specific to them in their config file? I can test this myself, but getting an inside perspective is very helpful.
>>
>> Thanks again for the help!
>> Jared
>>
>>
>> On Jul 27, 2012, at 6:55 PM, Alexander Shraer wrote:
>>
>>> Yes, any number of followers that are not in the configuration can just connect and listen in. This has always been the case, including in 3.4; I just made use of it for the purpose of adding members during reconfiguration. Moreover, in 3.4 there is a bug, ZOOKEEPER-1113, where the leader actually counts the votes of anyone connected, regardless of config membership :) This is fixed in ZK-107, so they are really non-voting followers.
>>>
>>> > I am assuming that's the case, and that it is a follower (and not participant) by virtue of not being in the official configuration stored in zookeeper itself.
>>>
>>> Follower and participant are not server types defined in ZK-107. In ZooKeeper, every follower/leader is a "participant". It's just that the votes of participants that are not in the configuration are not counted; that's why we call them non-voting followers. BTW, obviously a non-voting follower cannot become leader (like ZK-1113, this was also not enforced before ZK-107).
>>>
>>> > And a follow-up... does zookeeper only overwrite the dynamic configuration file for nodes that are voting participants?
>>> > Such that if I started a follower and then left it running through some reconfigurations, its file would not get updated if it was never added as part of those reconfigurations?
>>>
>>> No. As soon as it connects to the current leader, its dynamic config file is overwritten with the current configuration as part of the synchronization with the leader. Every time a new configuration is committed, all connected servers (voting, non-voting, observers) update their dynamic config file, regardless of whether they're in the config.
>>>
>>> Alex
>>>
>>> On Fri, Jul 27, 2012 at 5:35 PM, Jared Cantwell wrote:
>>> So does just having the server started and pointing to the existing ensemble automatically make it a "non-participating follower"? In other words, there is no need to inform the existing nodes that this new node is joining as a follower? And to extend that, there could be any number of followers simply listening in on the event stream? I am assuming that's the case, and that it is a follower (and not participant) by virtue of not being in the official configuration stored in zookeeper itself.
>>>
>>> On Fri, Jul 27, 2012 at 6:29 PM, Alexander Shraer wrote:
>>> There are just two supported types: participant and observer (a participant can act as either follower or leader).
>>>
>>> So you can either write participant or leave it unspecified (which means participant by default). Also, since the IP is the same for all your ports, you don't have to write it twice. All of these should work the same way:
>>>
>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>> server.5=10.10.5.17:2182:2183:participant;2181
>>> server.5=10.10.5.17:2182:2183;10.10.5.17:2181
>>> server.5=10.10.5.17:2182:2183;2181
>>>
>>>
>>> On Fri, Jul 27, 2012 at 5:25 PM, Jared Cantwell wrote:
>>> Thanks Alex for the response.
>>> Our current lines in the configuration look like this:
>>>
>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>>
>>> For the new servers, is it OK for their entry to have "participant"? Or should it be something different (e.g. "follower")?
>>>
>>> ~Jared
>>>
>>> On Fri, Jul 27, 2012 at 6:20 PM, Alexander Shraer wrote:
>>> Hi Jared,
>>>
>>> Thanks for experimenting with this feature.
>>>
>>> The idea is that new servers join as "non-voting followers", which means that they act as normal followers but the leader ignores their votes, since they are not part of the current configuration. The leader only counts their votes during the reconfiguration itself (to make sure a quorum of the new config is ready before the new config can be committed/activated). Defining them as observers is not a good idea. For example, in your scenario, if they were observers they wouldn't be able to participate in the reconfiguration protocol (which is similar to the protocol for committing any other operation, in which observers don't participate), and since there would be no quorum of followers in the new config that can ack, the reconfiguration would throw an exception (of KeeperException.NEWCONFIGNOQUORUM type). Of course, if you intend them to be observers in the new config, you can define them as observers, since their votes are not needed during reconfig anyway.
>>>
>>> You're right, the new servers must be able to connect to the old quorum. At minimum, their file should contain the current leader, but you can also copy the current configuration file to the new members if you wish.
>>>
>>> In addition, you should add a line for the member itself, so that server F appears in F's config file. (It's not important that the other new servers appear in F's file, but it won't hurt either, so you can use the union of old and new if you wish.)
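To see why the four `server.5` lines in the thread are interchangeable, here is a hypothetical Python parser for the syntax they use: `server.<id>=<address>:<quorum port>:<election port>[:<role>];[<client address>:]<client port>`. It is a sketch, not ZooKeeper's parser: it handles only IPv4/hostname addresses, requires the client part, and, following Alex's remark, reuses the server's own address when the client address is omitted (ZooKeeper's actual default may differ, for example binding to all interfaces). It accepts only the two roles Alex names, so a `follower` role is rejected.

```python
from dataclasses import dataclass

ROLES = {"participant", "observer"}  # the only two server types, per the thread


@dataclass(frozen=True)
class ServerSpec:
    sid: int
    address: str
    quorum_port: int     # first port: follower <-> leader traffic
    election_port: int   # second port: leader election
    role: str
    client_address: str
    client_port: int


def parse_server_line(line: str) -> ServerSpec:
    key, sep, value = line.strip().partition("=")
    if not sep or not key.startswith("server."):
        raise ValueError(f"not a server line: {line!r}")
    sid = int(key[len("server."):])

    server_part, _, client_part = value.partition(";")
    fields = server_part.split(":")
    if len(fields) not in (3, 4) or not client_part:
        raise ValueError(f"malformed server line: {line!r}")
    address = fields[0]
    quorum_port, election_port = int(fields[1]), int(fields[2])
    role = fields[3] if len(fields) == 4 else "participant"  # omitted role = participant
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}; expected participant or observer")

    if ":" in client_part:
        client_address, port = client_part.rsplit(":", 1)
    else:
        client_address, port = address, client_part  # sketch's assumption: reuse server address
    return ServerSpec(sid, address, quorum_port, election_port, role, client_address, int(port))


if __name__ == "__main__":
    lines = [
        "server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181",
        "server.5=10.10.5.17:2182:2183:participant;2181",
        "server.5=10.10.5.17:2182:2183;10.10.5.17:2181",
        "server.5=10.10.5.17:2182:2183;2181",
    ]
    specs = {parse_server_line(line) for line in lines}
    print(len(specs))  # all four lines collapse to one ServerSpec
```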
>>> The constructor of QuorumPeer checks that the server itself is in the configuration it's started with; otherwise it won't run. This check has always been there, but I'm thinking of possibly changing it in the future.
>>>
>>> As soon as F connects to the leader, its config file will be overwritten with the current config file as part of the synchronization process.
>>>
>>> Alex
>>>
>>>
>>> On Fri, Jul 27, 2012 at 10:06 AM, Jared Cantwell wrote:
>>> Hi,
>>>
>>> We are testing integration with 3.5.0 and dynamic membership, and I have a question. If the current set of servers in my ensemble is {A,B,C,D,E} and I want to reconfigure the ensemble to {D,E,F,G,H}, how should the dynamic config file on servers F, G, and H be configured on startup? Should they have the old ensemble, the new ensemble, or the union of both ensembles? It seems like these new servers need to know about the old quorum, but since they aren't part of it yet, it's not clear to me how they should be configured. Should there be an intermediate configuration with F, G, and H as simply observers?
>>>
>>> I can't find much documentation on this, so I want to make sure I understand things correctly.
>>>
>>> Thanks!
>>> ~Jared
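Alex's advice for bootstrapping F (old entries, new entries, or both are acceptable, but F's own line is mandatory because QuorumPeer refuses to start otherwise) can be sketched as a small Python helper that builds F's starting dynamic config as the union of the old and new server lines. The helper name, the server IDs (1-5 for A-E, 6-8 for F-H), and the addresses are hypothetical; the self-check only mirrors the QuorumPeer behavior Alex describes and is not ZooKeeper code.

```python
from typing import Iterable, List


def initial_dynamic_config(old_lines: Iterable[str], new_lines: Iterable[str],
                           self_id: int) -> List[str]:
    """Union of old and new `server.N=...` lines for a server that is about to join."""
    merged = {}
    for line in list(old_lines) + list(new_lines):
        key = line.split("=", 1)[0].strip()   # e.g. "server.6"
        merged.setdefault(key, line.strip())  # first occurrence of each server.N wins
    if f"server.{self_id}" not in merged:
        # Mirrors the check described in the thread: a server must appear
        # in the configuration it is started with, otherwise it won't run.
        raise ValueError(f"server.{self_id} is missing from its own initial config")
    return list(merged.values())


if __name__ == "__main__":
    old = [f"server.{i}=10.0.0.{i}:2182:2183;2181" for i in range(1, 6)]  # A-E
    new = [f"server.{i}=10.0.0.{i}:2182:2183;2181" for i in range(4, 9)]  # D-H
    for line in initial_dynamic_config(old, new, self_id=6):  # server F
        print(line)
```

Passing only the current leader's line plus F's own line would also meet the minimum Alex states; the union is simply the more forgiving choice, and either way F's file is overwritten with the real configuration once it syncs with the leader.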
