FYI, Curator now has a staged connection notification mechanism for dealing with issues like this. When the Curator-managed connection receives a Disconnected event, it notifies listeners that the connection is SUSPENDED. If the connection can be re-established (via a background sync() using the current retry policy), the listeners receive RECONNECTED; otherwise they receive LOST. Thus, users of the Curator LeaderSelector can tell whether they should pause their leader activity or stop it altogether.
-JZ

________________________________________
From: Ted Dunning [[email protected]]
Sent: Monday, November 14, 2011 6:24 PM
To: [email protected]
Subject: Re: Missing session state handling in most Leader Election implementations

On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman <[email protected]> wrote:

> It turns out that this is tricky to solve. When the server you're
> connected to goes down, you get a Watcher.Event.KeeperState.Disconnected.
> However, it could be that you are able to reconnect to another server so
> the disconnected event should be ignored.

The event should not be ignored. The master should pause in being a master, but not unload any major data structures. If it reconnects instantly, then it should continue as if nothing had happened.

You can also have a time limit for how long you wait before you decide to pause operation as master. As you increase that time, you increase the probability of two masters existing at the same time. If the reconnect happens before the timeout, you don't need to bother the master.
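
For illustration only, here is a rough sketch of how a leader might react to the SUSPENDED / RECONNECTED / LOST sequence discussed in this thread. It does not use Curator's actual classes: ConnState, LeaderActivity and Mode are hypothetical names invented for this example, and the rule that a leader stays stopped after LOST until it is re-elected is my assumption, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the three notifications named in this thread.
// This is NOT Curator's own type; it only mirrors the state names.
enum ConnState { SUSPENDED, RECONNECTED, LOST }

// Hypothetical wrapper around whatever work the leader performs.
class LeaderActivity {
    enum Mode { ACTIVE, PAUSED, STOPPED }

    private Mode mode = Mode.ACTIVE;
    private final List<String> log = new ArrayList<>();

    // Called by whatever component delivers connection notifications.
    void stateChanged(ConnState newState) {
        if (mode == Mode.STOPPED) {
            // Assumption: once leadership is given up, only a fresh
            // election (not modelled here) can make this node a leader again.
            return;
        }
        switch (newState) {
            case SUSPENDED:
                // Pause, but keep major data structures loaded.
                mode = Mode.PAUSED;
                log.add("paused");
                break;
            case RECONNECTED:
                // Continue as if nothing had happened.
                if (mode == Mode.PAUSED) {
                    mode = Mode.ACTIVE;
                    log.add("resumed");
                }
                break;
            case LOST:
                // The connection could not be re-established: stop leading.
                mode = Mode.STOPPED;
                log.add("stopped");
                break;
        }
    }

    // Leader-only work should check this before every action.
    boolean mayActAsLeader() { return mode == Mode.ACTIVE; }

    Mode mode() { return mode; }

    List<String> log() { return log; }
}
```

Pausing immediately on SUSPENDED corresponds to a wait time of zero in Ted's terms. Delaying the pause would trade fewer interruptions for a higher chance of two masters existing at once.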
