[
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888304#action_12888304
]
Flavio Paiva Junqueira commented on ZOOKEEPER-790:
--------------------------------------------------
Hi Vishal, thanks for all the information. I haven't been able to reproduce it
yet, but here are some thoughts after looking over your logs again:
1- It is not a problem that server 0 declares itself leader even though
another leader is already running. The others will ignore server 0, and it
will eventually drop its leadership, as you have observed;
2- The notifications from 1 and 2 say LOOKING because they were queued while
1 and 2 were still looking for a leader. That's not an issue;
3- I don't understand why the patch doesn't work. Here is how I'm
interpreting your run. Server 0 receives the notifications from 1 and 2 and
decides that it should be the leader. Because the current trunk code sets the
first zxid of the new epoch before hearing from a quorum, server 0 has a higher
zxid than everyone else once it drops leadership. Consequently, it correctly
refuses to talk to the current leader. Setting the first zxid of the new epoch
prematurely is the problem, and the patch I uploaded should fix it. The bottom
line is that I can't see why my patch does not fix it. Have you made sure to
apply it before running your new tests? Either way, I would appreciate it if
you could upload logs from a run with the current 790 patch.
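To make point 3 concrete, here is a minimal sketch contrasting the two orderings. It assumes the standard zxid layout (epoch in the high 32 bits, counter in the low 32 bits); the class, the `Server` holder, and the `leadPremature`/`leadDeferred` methods are hypothetical simplifications, not the actual Leader.java code or the patch itself. With the premature ordering, a server that never hears from a quorum still moves its last processed zxid into a new epoch; with the deferred ordering it stays where it was.

```java
// Hypothetical sketch; NOT the real Leader.java logic, only the ordering issue.
final class ZxidEpochSketch {
    // Assumed zxid layout: high 32 bits = epoch, low 32 bits = counter.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32L;
    }

    // Hypothetical stand-in for a server's persistent state.
    static final class Server {
        long lastProcessedZxid;
        Server(long zxid) { lastProcessedZxid = zxid; }
    }

    // Premature ordering: jump to the first zxid of the new epoch first,
    // then find out whether a quorum of followers actually connected.
    static boolean leadPremature(Server s, int acks, int quorum) {
        long newEpoch = epochOf(s.lastProcessedZxid) + 1;
        s.lastProcessedZxid = makeZxid(newEpoch, 0);
        return acks >= quorum;
    }

    // Deferred ordering: only move to the new epoch once a quorum is known.
    static boolean leadDeferred(Server s, int acks, int quorum) {
        if (acks < quorum) {
            return false; // leadership not established; zxid untouched
        }
        long newEpoch = epochOf(s.lastProcessedZxid) + 1;
        s.lastProcessedZxid = makeZxid(newEpoch, 0);
        return true;
    }

    public static void main(String[] args) {
        long committed = makeZxid(1, 5);
        Server premature = new Server(committed);
        Server deferred = new Server(committed);
        // A would-be leader that only hears from itself (1 ack, quorum of 2).
        leadPremature(premature, 1, 2);
        leadDeferred(deferred, 1, 2);
        System.out.println(epochOf(premature.lastProcessedZxid)); // 2
        System.out.println(epochOf(deferred.lastProcessedZxid));  // 1
    }
}
```

In the premature case the failed leader keeps a zxid that no quorum ever acknowledged, which is what later makes it refuse the real leader.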
Thanks!
> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-790
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.1
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-790.patch
>
>
> The leader code sets the last processed zxid to the first zxid of the new
> epoch before it has connected to a quorum of followers (Leader.java:281).
> Because the follower code throws an IOException (Follower.java:73) if the
> leader's epoch is smaller than its own, a false leader that drops leadership
> and becomes a follower finds a smaller epoch and kills itself.
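The follower-side check described above can be sketched as follows. This is a hypothetical simplification of the behavior attributed to Follower.java:73, not the actual source: the names `checkLeaderEpoch` and `accepts` are illustrative, and the zxid layout (epoch in the high 32 bits) is the standard ZooKeeper convention.

```java
import java.io.IOException;

// Hypothetical sketch of the follower's epoch check; not the real Follower.java.
final class FollowerEpochCheckSketch {
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32L;
    }

    // Refuse to follow a leader whose epoch is lower than our own last logged epoch.
    static void checkLeaderEpoch(long leaderZxid, long lastLoggedZxid) throws IOException {
        if (epochOf(leaderZxid) < epochOf(lastLoggedZxid)) {
            throw new IOException("Leader epoch " + epochOf(leaderZxid)
                    + " is lower than our epoch " + epochOf(lastLoggedZxid));
        }
    }

    // Convenience wrapper: true if the follower would accept this leader.
    static boolean accepts(long leaderZxid, long lastLoggedZxid) {
        try {
            checkLeaderEpoch(leaderZxid, lastLoggedZxid);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long leaderZxid = makeZxid(2, 0);
        // Hypothetical false leader whose zxid was advanced past the real leader's epoch.
        long falseLeaderZxid = makeZxid(3, 0);
        System.out.println(accepts(leaderZxid, makeZxid(2, 7))); // true
        System.out.println(accepts(leaderZxid, falseLeaderZxid)); // false
    }
}
```

A server whose own epoch was bumped prematurely therefore rejects the legitimate leader, which matches the failure the description reports.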
--
This message is automatically generated by JIRA.