Unsubscribe
Mike Richardson
Senior Software Engineer
*MoTuM N.V. | Dellingstraat 34 | B-2800 MECHELEN | Belgium*
T +32(0)15 28 16 63
M +41 7943 69538
www.motum.be

On 26 May 2017 at 20:45, Patrick Hunt <[email protected]> wrote:

> On Wed, May 24, 2017 at 8:08 AM, Mike Heffner <[email protected]> wrote:
>
> > On Tue, May 23, 2017 at 10:21 PM, Patrick Hunt <[email protected]> wrote:
> >
> > > On Tue, May 23, 2017 at 3:47 PM, Mike Heffner <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm curious what the best practices are for handling zxid rollover in
> > > > a ZK ensemble. We have a few five-node ZK ensembles (some 3.4.8 and
> > > > some 3.3.6) and they periodically roll over their zxid. We see the
> > > > following in the system logs on the leader node:
> > > >
> > > > 2017-05-22 12:54:14,117 [myid:15] - ERROR [ProcessThread(sid:15
> > > > cport:-1)::ZooKeeperCriticalThread@49] - Severe unrecoverable error,
> > > > from thread : ProcessThread(sid:15 cport:-1):
> > > > org.apache.zookeeper.server.RequestProcessor$RequestProcessorException:
> > > > zxid lower 32 bits have rolled over, forcing re-election, and
> > > > therefore new epoch start
> > > >
> > > > From my best understanding of the code, this exception will end up
> > > > causing the leader to enter shutdown():
> > > >
> > > > https://github.com/apache/zookeeper/blob/09cd5db55446a4b390f82e3548b929f19e33430d/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L464-L464
> > > >
> > > > This stops the ZooKeeper instance from servicing requests, but the
> > > > JVM is still actually running. What we experience is that while this
> > > > ZK instance is still running, the remaining follower nodes can't
> > > > re-elect a leader (at least within 15 minutes) and quorum is offline.
> > > > Our remediation so far has been to restart the original leader node,
> > > > at which point the cluster recovers.
> > > >
> > > > The two questions I have are:
> > > >
> > > > 1. Should the remaining 4 nodes be able to re-elect a leader after
> > > > zxid rollover without intervention (restarting)?
> > > >
> > >
> > > Hi Mike.
> > >
> > > That is the intent. Originally the epoch would roll over and cause the
> > > cluster to hang (similar to what you are reporting); the JIRA is here:
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> > > However, the patch, which calls shutdown on the leader, was intended to
> > > force a re-election before the epoch could roll over.
> > >
> >
> > Should the leader JVM actually exit during this shutdown, thereby
> > allowing the init system to restart it?
> >
>
> IIRC it should not be necessary, but it's been some time since I looked
> at it.
>
> >
> > >
> > > > 2. If the leader enters the shutdown() state after a zxid rollover,
> > > > is there any scenario where it will return to started? If not, how
> > > > are others handling this scenario -- maybe a healthcheck that
> > > > kills/restarts an instance that is in the shutdown state?
> > > >
> > >
> > > I have run into very few people who have seen the zxid rollover, and
> > > testing under real conditions is not easily done. We have unit tests,
> > > but that code is just not exercised sufficiently in everyday use. Since
> > > you're not seeing what's intended, please create a JIRA and include any
> > > additional details you can (e.g. config, logs).
> > >
> >
> > Sure, I've opened one here:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2791
> >
> > >
> > > What I heard people (well, really one user; I have personally only seen
> > > this at one site) were doing prior to 1277 was monitoring the epoch
> > > number, and when it got close to rolling over (within 10%, say) they
> > > would force the current leader to restart by restarting the process.
> > > The intent of 1277 was to effectively do this automatically.
> > >
> >
> > We are looking at doing something similar, maybe once a week finding the
> > current leader and restarting it. From testing, this quickly re-elects a
> > new leader and resets the zxid to zero, so it should avoid the rollover
> > that occurs after a few weeks of uptime.
> >
>
> Exactly. This is pretty much the same scenario that I've seen in the past,
> along with a similar workaround.
>
> You might want to take a look at the work Benedict Jin has done here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2789
> Given you are seeing this so frequently, it might be something you could
> collaborate on with the author of the patch? I have not looked at it in
> great detail, but it may allow you to run longer without seeing the issue.
> I have not thought through all the implications though... (including
> backward compat).
>
> Patrick
>
> >
> > >
> > > Patrick
> > >
> > >
> > > > Cheers,
> > > >
> > > > Mike
> > >
> >
> > Mike
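
For anyone wanting to automate the workaround discussed above (watch the zxid
and proactively restart the leader before the lower 32 bits roll over), here is
a minimal, unofficial sketch. It polls each server with the standard "srvr"
four-letter command, which reports the current zxid and the server's mode, and
flags the leader once the low 32-bit counter has passed a threshold. The host
list, client port, and 90% threshold are illustrative assumptions; the restart
itself is left to whatever init or orchestration system is in use.

// Unofficial sketch: poll each ensemble member with the "srvr" four-letter
// command, parse the reported zxid and mode, and flag the leader once the
// lower 32-bit counter (the part that rolls over) has passed an assumed 90%
// threshold. Host names, client port, and threshold are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZxidRolloverCheck {

    // Assumption: flag the leader once 90% of the counter space is used.
    private static final double RESTART_THRESHOLD = 0.90;

    public static void main(String[] args) throws Exception {
        // Assumption: five-node ensemble on the default client port.
        String[] servers = {"zk1:2181", "zk2:2181", "zk3:2181", "zk4:2181", "zk5:2181"};
        for (String server : servers) {
            String[] hostPort = server.split(":");
            String reply = fourLetterWord(hostPort[0], Integer.parseInt(hostPort[1]), "srvr");

            String mode = null;
            long zxid = -1;
            for (String line : reply.split("\n")) {
                if (line.startsWith("Mode:")) {
                    mode = line.substring("Mode:".length()).trim();
                } else if (line.startsWith("Zxid:")) {
                    // Reported as hex, e.g. "Zxid: 0x1500000a3f".
                    zxid = Long.parseLong(line.substring("Zxid:".length()).trim().substring(2), 16);
                }
            }
            if (mode == null || zxid < 0) {
                System.out.println(server + ": could not parse srvr output");
                continue;
            }
            long counter = zxid & 0xFFFFFFFFL;              // low 32 bits: per-epoch counter
            double used = counter / (double) 0xFFFFFFFFL;   // fraction of counter space consumed
            System.out.printf("%s mode=%s epoch=%d counter=%.1f%% used%n",
                    server, mode, zxid >>> 32, used * 100);
            if ("leader".equals(mode) && used > RESTART_THRESHOLD) {
                // Restarting the leader forces a re-election and resets the
                // counter, mirroring the manual workaround in the thread.
                System.out.println(server + ": leader is close to zxid rollover, schedule a restart");
            }
        }
    }

    // Send a four-letter command and return the full text reply.
    private static String fourLetterWord(String host, int port, String cmd) throws Exception {
        try (Socket sock = new Socket(host, port)) {
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            sock.shutdownOutput();
            StringBuilder sb = new StringBuilder();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }
}

Run periodically (e.g. from cron), this only schedules a controlled restart of
the flagged leader before the counter is exhausted; ZOOKEEPER-1277 already
forces a re-election at rollover, so the sketch buys a planned restart window
rather than changing the underlying behaviour.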
