[jira] [Comment Edited] (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

maoling (Jira) Sun, 17 Nov 2019 02:06:38 -0800


    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975961#comment-16975961
 ]


maoling edited comment on ZOOKEEPER-335 at 11/17/19 10:05 AM:
--------------------------------------------------------------

Users may be confused by the two variables introduced by this ticket: 
*acceptedEpoch and currentEpoch*.

The implementation up to version 3.3.3 did not include the epoch variables 
*acceptedEpoch and currentEpoch*. This omission caused problems in a 
production version and was noticed by many ZooKeeper clients.

- *acceptedEpoch*: the epoch number of the last *NEWEPOCH* message accepted;
- *currentEpoch*: the epoch number of the last *NEWLEADER* message accepted.
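
As a rough illustration of the two definitions above, here is a minimal Java sketch of a follower tracking both epochs. The class and method names are hypothetical and are not ZooKeeper's actual API. The acceptance rules follow the protocol description in simplified form (accept a NEWEPOCH only if it is newer, join on NEWLEADER only if it matches the accepted epoch), and persistence to disk is omitted.

```java
// Hypothetical sketch: names are illustrative, not ZooKeeper's real classes.
final class FollowerEpochs {
    // Epoch of the last NEWEPOCH message accepted (a "tried" epoch).
    private long acceptedEpoch;
    // Epoch of the last NEWLEADER message accepted (a "joined" epoch).
    private long currentEpoch;

    /** Handles NEWEPOCH: accept only a proposal newer than any accepted so far. */
    boolean onNewEpoch(long proposedEpoch) {
        if (proposedEpoch <= acceptedEpoch) {
            return false; // stale or repeated proposal; the follower would retry election
        }
        acceptedEpoch = proposedEpoch;
        return true;
    }

    /** Handles NEWLEADER: join the epoch only if it is the one previously accepted. */
    boolean onNewLeader(long leaderEpoch) {
        if (leaderEpoch != acceptedEpoch) {
            return false; // leader is not the one whose epoch was accepted
        }
        currentEpoch = leaderEpoch;
        return true;
    }

    long acceptedEpoch() { return acceptedEpoch; }

    long currentEpoch() { return currentEpoch; }
}
```

In this sketch, a follower that has accepted epoch 5 but not yet received the matching NEWLEADER reports acceptedEpoch 5 while currentEpoch still holds the older value, which is exactly the tried versus joined distinction.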

The problem originates at the beginning of the *Recovery* Phase, when the 
leader increments its epoch (contained in *lastZxid*) even before acquiring a 
quorum of successfully connected followers (such a leader is called a *false 
leader*). Since a follower goes back to *FLE* (Fast Leader Election) if its 
epoch is larger than the leader’s epoch, when a *false leader* drops 
leadership and becomes a follower of a leader from a previous epoch, it finds 
a smaller epoch and goes back to *FLE*. This behavior can loop, switching from 
the *Recovery* Phase to *FLE* repeatedly.
Consequently, because *lastZxid* alone stores the epoch number, the 
implementation cannot distinguish between a *tried* epoch and a *joined* epoch. 
Those are the respective purposes of *acceptedEpoch and currentEpoch*, hence 
their omission causes such problems.
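
To illustrate why a single epoch stored in *lastZxid* conflates the two cases, here is a minimal Java sketch. It assumes the usual zxid layout (epoch in the high 32 bits, counter in the low 32 bits). The class and method names are hypothetical, not ZooKeeper's own utilities.

```java
// Hypothetical sketch: assumes a 64-bit zxid with the epoch in the high 32 bits.
final class ZxidEpoch {
    private ZxidEpoch() {}

    /** Extracts the epoch from the high 32 bits of a zxid. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Extracts the per-epoch counter from the low 32 bits of a zxid. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Packs an epoch and a counter into a zxid. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /**
     * What a would-be leader does at the start of recovery in this sketch:
     * move to the next epoch and reset the counter. If no quorum ever joins,
     * the returned zxid still carries the new epoch, so the epoch read back
     * from it is only a "tried" epoch, not a "joined" one.
     */
    static long bumpEpoch(long lastZxid) {
        return makeZxid(epochOf(lastZxid) + 1, 0);
    }
}
```

Because the bumped value is the only record of the epoch, a server reading it back cannot tell whether a quorum ever joined that epoch; keeping *acceptedEpoch and currentEpoch* separately removes that ambiguity.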

More details can be found in this report: _*ZooKeeper’s atomic broadcast 
protocol: Theory and practice. André Medeiros, March 20, 2012*_


> zookeeper servers should commit the new leader txn to their logs.
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-335
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Mahadev Konar
>            Assignee: Benjamin Reed
>            Priority: Blocker
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-335.patch, ZOOKEEPER-335_2.patch, 
> ZOOKEEPER-335_3.patch, ZOOKEEPER-335_4.patch, ZOOKEEPER-335_5.patch, 
> ZOOKEEPER-790.travis.log.bz2, faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz
>
>
> Currently the ZooKeeper followers do not commit the new leader election. This 
> will cause problems in failure scenarios where a follower acks the same 
> leader txn id twice, possibly to two different intermittent leaders, 
> allowing them to propose two different txns with the same zxid.



