On Tue, May 23, 2017 at 10:21 PM, Patrick Hunt <[email protected]> wrote:
> On Tue, May 23, 2017 at 3:47 PM, Mike Heffner <[email protected]> wrote:
>
> > Hi,
> >
> > I'm curious what the best practices are for handling zxid rollover in a
> > ZK ensemble. We have a few five-node ZK ensembles (some 3.4.8 and some
> > 3.3.6) and they periodically rollover their zxid. We see the following
> > in the system logs on the leader node:
> >
> > 2017-05-22 12:54:14,117 [myid:15] - ERROR [ProcessThread(sid:15
> > cport:-1)::ZooKeeperCriticalThread@49] - Severe unrecoverable error,
> > from thread : ProcessThread(sid:15 cport:-1):
> > org.apache.zookeeper.server.RequestProcessor$RequestProcessorException:
> > zxid lower 32 bits have rolled over, forcing re-election, and therefore
> > new epoch start
> >
> > From my best understanding of the code, this exception will end up
> > causing the leader to enter shutdown():
> >
> > https://github.com/apache/zookeeper/blob/09cd5db55446a4b390f82e3548b929f19e33430d/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L464-L464
> >
> > This shuts down the zookeeper instance from servicing requests, but the
> > JVM is still actually running. What we experience is that while this ZK
> > instance is still running, the remaining follower nodes can't re-elect a
> > leader (at least within 15 mins) and quorum is offline. Our remediation
> > so far has been to restart the original leader node, at which point the
> > cluster recovers.
> >
> > The two questions I have are:
> >
> > 1. Should the remaining 4 nodes be able to re-elect a leader after zxid
> > rollover without intervention (restarting)?
> >
>
> Hi Mike.
>
> That is the intent. Originally the epoch would rollover and cause the
> cluster to hang (similar to what you are reporting), the JIRA is here
> https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> However the patch, calling shutdown of the leader, was intended to force a
> re-election before the epoch could rollover.
>

Should the leader JVM actually exit during this shutdown, thereby allowing
the init system to restart it?

> > 2. If the leader enters shutdown() state after a zxid rollover, is there
> > any scenario where it will return to started? If not, how are others
> > handling this scenario -- maybe a healthcheck that kills/restarts an
> > instance that is in shutdown state?
> >
>
> I have run into very few people who have seen the zxid rollover and testing
> under real conditions is not easily done. We have unit tests but that code
> is just not exercised sufficiently in everyday use. You're not seeing
> what's intended, please create a JIRA and include any additional details
> you can (e.g. config, logs)
>

Sure, I've opened one here:
https://issues.apache.org/jira/browse/ZOOKEEPER-2791

> What I heard people (well really one user, I have personally only seen this
> at one site) were doing prior to 1277 was monitoring the epoch number, and
> when it got close to rolling over (within 10% say) they would force the
> current leader to restart by restarting the process. The intent of 1277 was
> to effectively do this automatically.
>

We are looking at doing something similar, maybe once a week finding the
current leader and restarting it. From testing this quickly re-elects a new
leader and resets the zxid to zero so it should avoid the rollover that
occurs after a few weeks of uptime.

> Patrick
>
> > Cheers,
> >
> > Mike
> >

Mike
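
For reference, a rough sketch of the monitoring approach described above
(watch how far the leader's counter has advanced and flag it for a restart
before it rolls over). It relies on the zxid layout the log message implies:
the high 32 bits are the epoch and the low 32 bits are a per-epoch counter.
It assumes the "srvr" four-letter command is enabled and reachable on the
client port. The function names, the 2181 default port and the 0.9 threshold
(Patrick's "within 10% say") are illustrative choices, not anything ZooKeeper
ships, and the actual restart step is left to whatever init system is in use.

```python
import socket

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # low 32 bits of the zxid


def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def rollover_fraction(zxid):
    """Fraction of the 32-bit counter space used up in the current epoch."""
    _, counter = split_zxid(zxid)
    return counter / COUNTER_MASK


def parse_srvr(text):
    """Pull the 'Mode' and 'Zxid' fields out of a 'srvr' reply."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # Zxid is reported in hex, e.g. "0x5f0000000".
    return fields.get("Mode"), int(fields["Zxid"], 16)


def query_srvr(host, port=2181, timeout=5.0):
    """Send 'srvr' to one server and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def should_restart_leader(mode, zxid, threshold=0.9):
    """True if this server is the leader and its counter is near rollover.

    threshold is an assumed cut-off (restart within the last 10%).
    """
    return mode == "leader" and rollover_fraction(zxid) >= threshold
```

A cron job or healthcheck could call query_srvr() against each ensemble
member, feed the reply to parse_srvr(), and restart the one server for which
should_restart_leader() returns True. Compared with a fixed weekly restart,
this only bounces the leader when the counter is actually close to the limit.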
