[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927888#action_12927888 ]
Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Hi Flavio,

The three zookeeper servers are zookeeper1, zookeeper2 and zookeeper3.

Initially the servers were:
* 192.168.130.10: zookeeper1
* 192.168.130.11: zookeeper3
* 192.168.130.14: zookeeper2

After .11 was removed the servers were:
* 192.168.130.10: zookeeper1
* 192.168.130.13: zookeeper3
* 192.168.130.14: zookeeper2

All other settings were set by hbase:
* tickTime=2000
* initLimit=10
* syncLimit=5
* peerport=2888
* leaderport=3888

zookeeper1 would have node id 0
zookeeper2 would have node id 1
zookeeper3 would have node id 2

I'm not sure what else I can give you concerning the configuration.

I note that in 192.168.130.14 (node id 1) we have
{noformat}
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1, 4294967742, 2, 1, LOOKING, LOOKING, 1
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (2, 1)
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:27,989 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 1, LOOKING, FOLLOWING, 0
{noformat}

I don't think there is much chance of some kind of networking misconfiguration, but could that explain what we are seeing?
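For anyone reading the "New election" value in the log above: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. The following is only a minimal sketch for decoding it; the helper name `split_zxid` is hypothetical and not part of ZooKeeper.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper for reading log values; not a ZooKeeper API.
    """
    epoch = (zxid >> 32) & 0xFFFFFFFF   # high 32 bits
    counter = zxid & 0xFFFFFFFF         # low 32 bits
    return epoch, counter


# Value taken from the "New election" log line on node id 1.
election_zxid = 4294967742
print(split_zxid(election_zxid))  # (1, 446)
```

Read this way, node id 1 entered the election having seen epoch 1, transaction 446.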
> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
>                      Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
> * 192.168.130.10
> * 192.168.130.11
> * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13
> (automated startup). The new node had not participated in any zookeeper
> quorum previously. The node 192.168.130.11 was permanently removed from
> service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers
> to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that
> it had not seen the latest zxid.
> This particular problem has not been verified with later versions of
> zookeeper, and no attempt has been made to reproduce this problem as yet.
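To make the expected behaviour concrete, here is a simplified sketch of the ordering that leader election is expected to apply: prefer the vote with the higher zxid, and break ties with the higher server id. This is an illustration under stated assumptions, not the actual FastLeaderElection code, which also tracks election rounds and waits for a quorum of matching votes. The names `better_vote` and `pick_leader` are hypothetical, and the zxids assigned to node ids 0 and 2 are assumed for illustration; only the value for node id 1 comes from the logs.

```python
def better_vote(new, cur):
    """Return True if vote `new` should replace vote `cur`.

    Each vote is a (server_id, zxid) pair. Higher zxid wins;
    on equal zxid the higher server id wins.
    """
    new_id, new_zxid = new
    cur_id, cur_zxid = cur
    return new_zxid > cur_zxid or (new_zxid == cur_zxid and new_id > cur_id)


def pick_leader(votes):
    """Pick the best vote from a non-empty list of (server_id, zxid) pairs."""
    best = votes[0]
    for vote in votes[1:]:
        if better_vote(vote, best):
            best = vote
    return best


votes = [
    (0, 4294967742),  # zookeeper1: zxid ASSUMED equal to node id 1's
    (1, 4294967742),  # zookeeper2: zxid from the "New election" log line
    (2, 0),           # zookeeper3: fresh replacement node, zxid ASSUMED 0
]
leader_id, leader_zxid = pick_leader(votes)
print(leader_id)  # 1
```

Under these assumptions the replacement node (id 2) should lose despite having the highest server id, because its zxid is behind. That is the outcome the report says did not happen.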