[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927833#action_12927833 ]
Alexandre Hardy commented on ZOOKEEPER-917: ------------------------------------------- Excerpt from logs on 192.168.130.10: {noformat} 2010-11-02 09:36:28,060 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742 2010-11-02 09:36:28,061 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37781] 2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 0, 4294967742, 2, 0, LOOKING, LOOKING, 0 2010-11-02 09:36:28,063 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote 2010-11-02 09:36:28,064 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50222] 2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, FOLLOWING, 1 2010-11-02 09:36:28,065 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2010-11-02 09:36:28,065 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50223] 2010-11-02 09:36:28,068 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2010-11-02 09:36:28,068 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.12:59044] 2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, LEADING, 2 2010-11-02 09:36:28,073 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37786] 2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING 2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server 2010-11-02 09:36:28,074 INFO org.apache.zookeeper.server.quorum.Follower: Following zookeeper3/192.168.130.13:2888 {noformat} Excerpt from logs on 192.168.130.11: {noformat} 2010-11-02 09:36:14,065 INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig: Defaulting to majority quorums 2010-11-02 09:36:14,120 INFO org.apache.zookeeper.server.quorum.QuorumPeerMain: Starting quorum peer 2010-11-02 09:36:14,172 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: My election bind port: 3888 2010-11-02 09:36:14,182 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING 2010-11-02 09:36:14,183 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: -1 2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2 2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote 2010-11-02 09:36:14,193 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532) 2010-11-02 09:36:14,194 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread 2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1 2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote 2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532) 2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread 2010-11-02 09:36:14,202 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:281) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:593) 2010-11-02 09:36:14,401 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: About to leave instance:2, -1, 2, LEADING 2010-11-02 09:36:14,402 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LEADING {noformat} I'm not sure why 192.168.130.13 ended up as the leader when it did not have the most up to date transaction ID. Also, I don't see the notification messages of the other nodes in the logs of 192.168.130.13. Is there any reason why other nodes would accept 192.168.130.13 as the leader? > Leader election selected incorrect leader > ----------------------------------------- > > Key: ZOOKEEPER-917 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 > Project: Zookeeper > Issue Type: Bug > Components: leaderElection, server > Affects Versions: 3.2.2 > Environment: Cloudera distribution of zookeeper (patched to never > cache DNS entries) > Debian lenny > Reporter: Alexandre Hardy > Priority: Critical > Attachments: zklogs-20101102144159SAST.tar.gz > > > We had three nodes running zookeeper: > * 192.168.130.10 > * 192.168.130.11 > * 192.168.130.14 > 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 > (automated startup). The new node had not participated in any zookeeper > quorum previously. The node 192.148.130.11 was permanently removed from > service and could not contribute to the quorum any further (powered off). > DNS entries were updated for the new node to allow all the zookeeper servers > to find the new node. > The new node 192.168.130.13 was selected as the LEADER, despite the fact that > it had not seen the latest zxid. > This particular problem has not been verified with later versions of > zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.