[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910531#action_12910531
 ] 

Diogo commented on ZOOKEEPER-869:
---------------------------------

While trying to implement this, I found an interesting issue. Say we have an 
ensemble of 3 nodes, we start all of them together, and all have synchronized 
state, meaning that all replicas return the same value from 
ZKDatabase().getLastLoggedZxid(). It seems that the leader will still send a 
snapshot to every follower, although that is unnecessary: they need no state 
transfer.

The leader (quorum/Leader.java:283) reads its lastLoggedZxid(), advances it to 
a new epoch, and stores the result as lastProposed. In LearnerHandler.java:308 
the thread decides whether the replica gets an empty DIFF or a SNAP. (I am 
assuming the system state described above.) But startForwarding returns 
lastProposed, which is necessarily larger than any zxid a follower can hold, so 
the follower's zxid can never equal it and a SNAP is selected and sent.
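
To make the arithmetic concrete, here is a minimal Java sketch of the behaviour 
described above. It is not the actual ZooKeeper code: the class, the enum and 
both method names are made up for illustration, and the decision is reduced to 
a plain equality check, which is my reading of the empty-DIFF path. The only 
thing taken as given is the zxid layout (epoch in the high 32 bits, counter in 
the low 32 bits).

```java
public class SyncDecisionSketch {
    // Hypothetical stand-ins for the DIFF and SNAP packet types.
    enum SyncMode { DIFF, SNAP }

    // Move a zxid to the start of the next epoch:
    // keep the high 32 bits (epoch), add one, reset the counter to 0.
    static long nextEpochZxid(long lastLoggedZxid) {
        long epoch = lastLoggedZxid >> 32;
        return (epoch + 1) << 32;
    }

    // Simplified decision: an empty DIFF only if the peer's last zxid
    // equals the zxid the leader compares against, otherwise a snapshot.
    static SyncMode chooseSync(long peerLastZxid, long leaderZxid) {
        return peerLastZxid == leaderZxid ? SyncMode.DIFF : SyncMode.SNAP;
    }

    public static void main(String[] args) {
        long lastLogged = 0x700000000L;                 // shared by leader and follower
        long lastProposed = nextEpochZxid(lastLogged);  // 0x800000000

        // Comparing against lastProposed: the in-sync follower still gets SNAP.
        System.out.println(chooseSync(lastLogged, lastProposed));
        // Comparing against lastLoggedZxid instead would give an empty DIFF.
        System.out.println(chooseSync(lastLogged, lastLogged));
    }
}
```

With the values from my run this prints SNAP and then DIFF, which is the 
mismatch I am asking about.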

Here is part of the log output from a run in which 2 replicas have the same 
stored state and one is behind.

2010-09-17 12:11:27,296 [myid:3] - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot 
/tmp/zoo3/version-2/snapshot.700000000
2010-09-17 12:11:27,298 [myid:3] - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot 
/tmp/zoo3/version-2/snapshot.700000000
2010-09-17 12:11:27,301 [myid:3] - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:filetxnsnap...@208] - Snapshotting: 700000000
2010-09-17 12:11:27,303 [myid:3] - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:lea...@285] - lastLoggedZxid = 700000000 
lastProposed = 800000000   <---------- added line just after leader sets its 
lastProposed
2010-09-17 12:11:27,309 [myid:3] - INFO  
[LearnerHandler-/127.0.0.1:48318:learnerhand...@247] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@12d3205
2010-09-17 12:11:27,310 [myid:3] - WARN  
[LearnerHandler-/127.0.0.1:48318:learnerhand...@326] - Sending snapshot last 
zxid of peer is 0x700000000  zxid of leader is 0x800000000   <------ snapshot 
being sent!
2010-09-17 12:11:27,312 [myid:3] - WARN  
[LearnerHandler-/127.0.0.1:48318:lea...@474] - Commiting zxid 0x800000000 from 
/127.0.0.1:2890 not first!
2010-09-17 12:11:27,313 [myid:3] - WARN  
[LearnerHandler-/127.0.0.1:48318:lea...@476] - First is 0
2010-09-17 12:11:27,313 [myid:3] - INFO  
[LearnerHandler-/127.0.0.1:48318:lea...@500] - Have quorum of supporters; 
starting up and setting last processed zxid: 34359738368
2010-09-17 12:11:28,290 [myid:3] - INFO  
[LearnerHandler-/127.0.0.1:48319:learnerhand...@247] - Follower sid: 2 : info : 
org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@1319c
2010-09-17 12:11:28,291 [myid:3] - WARN  
[LearnerHandler-/127.0.0.1:48319:learnerhand...@326] - Sending snapshot last 
zxid of peer is 0x600000000  zxid of leader is 0x800000000  <---- this follower 
needs the snapshot.
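
For reading the zxids in the log: the decimal value 34359738368 in the "Have 
quorum of supporters" line is the same number as 0x800000000. A small Java 
sketch of the decoding follows; the class and helper names are hypothetical, 
and it assumes only the usual zxid layout (epoch in the high 32 bits, counter 
in the low 32 bits).

```java
public class ZxidDecode {
    // High 32 bits of the zxid: the epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of the zxid: the per-epoch counter.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long[] zxids = { 0x600000000L, 0x700000000L, 34359738368L };
        for (long z : zxids) {
            System.out.printf("0x%x epoch=%d counter=%d%n",
                    z, epochOf(z), counterOf(z));
        }
    }
}
```

So the lagging follower is at epoch 6, the other two replicas are at epoch 7, 
and the leader's lastProposed is the first zxid of epoch 8.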


Am I understanding something wrong?

> Support for election of leader with arbitrary zxid
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-869
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
>             Project: Zookeeper
>          Issue Type: New Feature
>            Reporter: Diogo
>            Priority: Minor
>
> Currently, the leader election algorithm implemented guarantees that the 
> leader has the maximum zxid of the ensemble. The state synchronization after 
> the election was built based on this assumption. However, other leader 
> election algorithms might elect leaders with an arbitrary zxid. 
> To support other leader election algorithms, the state synchronization should 
> allow the leader to have an arbitrary zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
