[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-790:
---------------------------------------

    Attachment: ZOOKEEPER-790.v2.patch

Thanks for pointing this issue out, Sergei. It sounds like the previous patch 
solved the issue discussed without making sure that the leader was ready to 
process messages when learner handlers started to read them in. This v2 patch 
does a number of things:

# It moves the call to the startup method into processAck. This way we make 
sure that we start up the leader as soon as we have a quorum of acks for the 
newleader message;
# It moves the initialization of the database in startup to a method startdata. 
There are two reasons for doing it. First, it didn't sound like a good idea to 
throw exceptions or catch exceptions in processAck, and they were only 
necessary because of the call to startup(). Second, the method startup() in 
ZooKeeperServer throws these exceptions because of loadData(), which is called 
separately in Leader.lead(), so it is not necessary to call it in processAck 
after hearing from a quorum; 
# It waits in LearnerHandler.run() until the leader is ready before it starts 
the while(true) loop. I also had to make the handler receive an ack before it 
executes the code to wait; otherwise the leader would never receive acks and 
form a quorum, thus causing the system to halt.
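
As a rough illustration of points 1 and 3, here is a minimal sketch of the 
idea. It is not the code in the patch: the class LeaderReadySketch, the 
method awaitLeaderReady, and the simple-majority check are made up for this 
example, and only the names processAck and startup come from the description 
above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; names and structure are assumptions, not the patch.
class LeaderReadySketch {
    private final int ensembleSize;
    private final Set<Long> newLeaderAcks = new HashSet<>();
    private final CountDownLatch leaderReady = new CountDownLatch(1);
    private volatile int startupCalls = 0;

    LeaderReadySketch(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    // Called once per ack of the newleader message; sid identifies the acker.
    // Assumes a simple majority quorum over ensembleSize servers.
    synchronized void processAck(long sid) {
        newLeaderAcks.add(sid);
        boolean quorum = newLeaderAcks.size() > ensembleSize / 2;
        if (quorum && leaderReady.getCount() > 0) {
            startup();               // no checked exceptions: data is loaded elsewhere
            leaderReady.countDown(); // release any waiting learner handlers
        }
    }

    // Stand-in for starting the leader's request processing.
    private void startup() {
        startupCalls++;
    }

    // A learner handler would call this after reading its ack and before
    // entering its main read loop. Returns false on timeout or interrupt.
    boolean awaitLeaderReady(long timeoutMs) {
        try {
            return leaderReady.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isStarted() {
        return startupCalls > 0;
    }

    int startupCalls() {
        return startupCalls;
    }
}
```

The important ordering is that each handler delivers its ack first and only 
then waits; if it waited first, no ack would ever reach processAck and the 
latch would never open.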

To get some feedback on the changes implemented in this patch, I have discussed 
them with Ben. Thanks, Ben! 

Sergei, I would appreciate it if you could give it a try and tell me whether 
it works for you. 

> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.1
>            Reporter: Flavio Junqueira
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, 
> ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790.patch, 
> ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
> ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch
>
>
> The leader code sets the last processed zxid to the first zxid of the new 
> epoch before connecting to a quorum of followers (Leader.java:281), and the 
> follower code throws an IOException (Follower.java:73) if the leader epoch 
> is smaller. As a result, when the false leader drops leadership and becomes 
> a follower, it finds a smaller epoch and kills itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.