I believe that if local sessions are in use and the session in question hasn't been upgraded to a global session by creating an ephemeral node, it would see session expiration after a leader election (unless perhaps it lands on the same peer; I don't remember whether the session table gets recycled completely in that case).
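The local/global distinction above can be sketched as a toy model (this is illustrative only, not ZooKeeper's actual implementation, and the class and method names are mine): a local session is tracked only by the peer that owns it, so it does not survive a leader change, while a session upgraded to global, e.g. by creating an ephemeral node, lives in the replicated session table and does survive.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of local vs. global ZooKeeper sessions (names are illustrative).
public class SessionModel {
    // Sessions tracked only by a single peer.
    final Set<Long> localSessions = new HashSet<>();
    // Sessions replicated across the ensemble.
    final Set<Long> globalSessions = new HashSet<>();

    void createSession(long sessionId) {
        // New sessions start out local in this model.
        localSessions.add(sessionId);
    }

    // Creating an ephemeral node upgrades the session to global,
    // since ephemeral-node lifetime must be tracked ensemble-wide.
    void createEphemeralNode(long sessionId) {
        if (localSessions.remove(sessionId)) {
            globalSessions.add(sessionId);
        }
    }

    // On a leader change, only globally tracked sessions survive
    // (assuming the client does not land on the same peer).
    void leaderElection() {
        localSessions.clear();
    }

    boolean isAlive(long sessionId) {
        return localSessions.contains(sessionId) || globalSessions.contains(sessionId);
    }
}
```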
On Wed, Jun 22, 2016 at 10:58 AM, Patrick Hunt <[email protected]> wrote:
> Hi Mark. See this jira for background:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1277
>
> However, what you describe is correct behavior from our perspective. When
> the lower 32 bits roll over, we now (that was the fix) force a re-election
> of the leader. Leader re-election causes the quorum to stop serving clients
> until a new quorum forms.
>
> Leader re-election is normal behavior for the ZK service; it happens
> whenever the current leader is lost and a new quorum, with a (possibly new)
> leader, needs to reform. Say, if the current leader process is restarted.
> Your clients need to be able to handle this situation (typically the client
> library does this for you).
>
> That said, you should not be seeing session expiration as a result of this.
> Client timeouts, certainly, but not session expiration. It might happen for
> other reasons, but the leader is the one responsible for expiring sessions.
> If there is no leader (e.g. one is being re-elected), there is no session
> expiration. When the new leader is elected, it will reset the clock on
> session expiration, for all sessions, from the time it's elected. For
> example, you can shut down the entire ZK server ensemble, start it back up
> an hour later, and the clients should all be able to rejoin. Hm, that said,
> I'm not sure whether Curator is doing some special magic; that's the
> behavior of the stock client that we ship.
>
> Patrick
>
> On Wed, Jun 22, 2016 at 6:18 AM, Figura, Mark <[email protected]> wrote:
> > Hi,
> >
> > We are using ZooKeeper 3.4.5 along with Curator to perform leader
> > elections and also store some application data on a 3-node ensemble. Our
> > application is not hard-realtime, but glitches in stream processing do
> > get noticed and may raise support tickets.
> >
> > Yesterday, we had such a glitch, and by looking through the logs, I found
> > there was an XID rollover. When this happened, a new election within the
> > ensemble was triggered and all client connections were closed. From our
> > application's point of view (possibly filtered through Curator), we saw
> > the session expire and then the connection was lost. This caused our
> > application to shut down each component, re-perform leader elections, and
> > eventually start back up.
> >
> > We do have an issue where our application is making many more writes than
> > it should, but once this is fixed, we'll still run into an XID rollover
> > sooner or later.
> >
> > Is there something our application can do to handle this situation
> > better? Are there any plans for ZooKeeper to handle this situation
> > without closing client connections?
> >
> > Thanks!
> > Mark
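The rollover Patrick describes comes from the zxid layout (per ZOOKEEPER-1277): a zxid is 64 bits, with the leader epoch in the high 32 bits and a per-epoch transaction counter in the low 32 bits. The bit math below is a sketch of that structure; the method names are illustrative, not ZooKeeper's internal API.

```java
// Sketch of ZooKeeper's zxid structure: high 32 bits = leader epoch,
// low 32 bits = per-epoch transaction counter (names are illustrative).
public class ZxidMath {
    // Leader epoch: incremented on every leader election.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Per-epoch transaction counter: the part that can roll over.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    // When the counter is exhausted, the ensemble forces a re-election
    // so the epoch increments and the counter resets to zero.
    static boolean counterExhausted(long zxid) {
        return counter(zxid) == 0xFFFFFFFFL;
    }
}
```

So a heavy write load simply reaches the 2^32 transaction limit sooner; the re-election that follows is the deliberate fix from ZOOKEEPER-1277 rather than a fault.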
