[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flavio Junqueira updated ZOOKEEPER-790:
---------------------------------------

    Attachment: ZOOKEEPER-790.v2.patch

Thanks for pointing this issue out, Sergei. It seems the previous patch solved the issue discussed without making sure that the leader was ready to process messages when learner handlers started to read them in. This v2 patch does a number of things:

# It moves the call to startup() to processAck. This way we make sure that we start up the leader as soon as we have a quorum of acks for the NEWLEADER message;
# It moves the initialization of the database in startup() to a separate method, startdata(). There are two reasons for doing this. First, it didn't seem like a good idea to throw or catch exceptions in processAck, and they were only necessary because of the call to startup(). Second, the startup() method in ZooKeeperServer throws these exceptions because of loadData(), which is already called separately in Leader.lead(), so it is not necessary to call it in processAck after hearing from a quorum;
# It waits in LearnerHandler.run() until the leader is ready before it starts the while(true) loop. I also had to receive an ack before executing the code that waits; otherwise the leader would never receive acks and form a quorum, causing the system to halt.

To get some feedback on the changes implemented in this patch, I have discussed them with Ben. Thanks, Ben! Sergei, I would appreciate it if you could give it a try and tell me whether it works for you.
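The ordering in the three steps above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not the code in the patch: the class LeaderStartupSketch, the methods waitForStartup() and learnerHandler(), and the assumption that the leader counts its own ack toward the quorum are all illustrative. It shows why each handler must deliver its ack before it waits: the leader only starts, and releases the waiting handlers, once a majority of acks has arrived.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the v2 ordering (names are illustrative, not ZooKeeper's):
 * each learner handler first delivers its ACK of NEWLEADER, then blocks until the
 * leader has a quorum of ACKs and has started; only then would its packet loop begin.
 */
public class LeaderStartupSketch {
    private final int ensembleSize;
    private final Set<Long> newLeaderAcks = new HashSet<>();
    private boolean started = false;

    public LeaderStartupSketch(int ensembleSize) {
        this.ensembleSize = ensembleSize;
        newLeaderAcks.add(0L); // assumption: the leader (sid 0) counts its own ack
    }

    /** Called for each NEWLEADER ack; starts the leader once a strict majority has acked. */
    public synchronized void processAck(long sid) {
        newLeaderAcks.add(sid);
        if (!started && newLeaderAcks.size() > ensembleSize / 2) {
            startup();
        }
    }

    /** Stand-in for startup(): no data loading here, that happened earlier (cf. Leader.lead()). */
    private void startup() {
        started = true;
        notifyAll(); // wake handlers blocked in waitForStartup()
    }

    /** Learner handlers block here until the leader is ready. */
    public synchronized void waitForStartup() throws InterruptedException {
        while (!started) {
            wait();
        }
    }

    public synchronized boolean isStarted() {
        return started;
    }

    /** Body of a hypothetical learner handler thread for follower `sid`. */
    public Runnable learnerHandler(long sid) {
        return () -> {
            try {
                processAck(sid);   // 1. deliver the ack first, or no quorum can ever form
                waitForStartup();  // 2. then wait until the leader is ready
                // 3. only now would the while(true) loop over incoming packets start
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        LeaderStartupSketch leader = new LeaderStartupSketch(5); // leader + 4 followers
        Thread[] handlers = new Thread[4];
        for (int i = 0; i < handlers.length; i++) {
            handlers[i] = new Thread(leader.learnerHandler(i + 1));
            handlers[i].start();
        }
        for (Thread t : handlers) {
            t.join();
        }
        System.out.println("leader started: " + leader.isStarted());
    }
}
```

Run as-is, main prints `leader started: true`. If a handler instead called waitForStartup() before processAck(), no follower ack would ever be counted and every handler would block forever, which is the halt described in the third step.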
> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.1
>            Reporter: Flavio Junqueira
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch,
> ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790.patch,
> ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch,
> ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch
>
>
> The leader code sets the last processed zxid to the first zxid of the new
> epoch even before connecting to a quorum of followers (Leader.java:281).
> Because the follower code throws an IOException (Follower.java:73) if the
> leader epoch is smaller than its own, when the false leader drops leadership
> and becomes a follower, it finds a smaller epoch and kills itself.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
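The failure in the issue description hinges on how a zxid encodes its epoch. As a hedged illustration (the class and method names are hypothetical, and the epoch values are made up for the example), the sketch below assumes ZooKeeper's layout of the epoch in the high 32 bits of a zxid and a per-epoch counter in the low 32 bits, plus a follower-side check that rejects a leader whose epoch is smaller than its own. It shows that the same follower accepts a leader with its committed zxid but rejects that leader once its last processed zxid has been advanced prematurely into a newer epoch.

```java
/**
 * Hypothetical model of the ZOOKEEPER-790 failure mode; names and values are illustrative.
 */
public class PrematureZxidSketch {
    /** The epoch lives in the high 32 bits of a zxid. */
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Builds a zxid from an epoch and a per-epoch counter (low 32 bits). */
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /** Follower-side check: reject a leader whose epoch is below our own. */
    public static boolean acceptsLeader(long ownLastProcessedZxid, long leaderZxid) {
        return epochOf(leaderZxid) >= epochOf(ownLastProcessedZxid);
    }

    public static void main(String[] args) {
        long committed = makeZxid(4, 17);  // last zxid actually committed (illustrative)
        long premature = makeZxid(6, 0);   // first zxid of a new epoch, set before any quorum
        long leaderZxid = makeZxid(5, 0);  // epoch announced by the leader that actually wins

        System.out.println("accepts with committed zxid: " + acceptsLeader(committed, leaderZxid));
        System.out.println("accepts with premature zxid: " + acceptsLeader(premature, leaderZxid));
    }
}
```

With these values the first line prints `true` and the second prints `false`: the only difference is that the last processed zxid was bumped to a new epoch before a quorum had acknowledged the would-be leader.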