[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-822: Hadoop Flags: [Reviewed] +1 looks good. ready to commit. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-822: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk/33 - thanks Vishal and everyone who pushed this through! Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for the comments, Ben. I have modified zookeeperAdmin and added the zookeeper. prefix to the code. Regarding your question, initiateConnection is called from two methods: testInitiateConnection (used only in tests) and connectOne. connectOne is synchronized. Do you still see an issue? Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for reviewing it, Vishal. I have fixed the LOG.warn you pointed out and uploaded new patch files. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Going to submit patches that introduce a system property. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) I have added a system property called cnxtimeout to change the timeout value in QuorumCnxManager. Tests pass for me. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch I'm adding a test to the patch. It tries to send a message to an address for which a connection request receives no response, so it has to timeout. The test then checks that the amount of time elapsed is less than 6s (the timeout value is hardcoded 5s). Raising the timeout from 5s to say 7s makes the test fail. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch Attaching patch for 3.3.2. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch I believe the patch I'm attaching achieves the same goal and is even simpler, but I'd like to make sure it suits your needs, Vishal. If you agree with the modifications, I can work on a test. I was also thinking that the 2-second timeout you used before is too low and I've raised to 5 seconds. But, instead of trying to argue which value is ideal, I'd rather use a system property and use a default value of at least 5 seconds. I also commit to redesigning QuorumCnxManager for either 3.4.0 or 4.0.0 to use a selector or some other approach we agree upon. I've been wanting to do it for a while anyway, and I actually thought there was a jira open for it... Maybe not, I can't find it right now. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch_v1 Hi, I have attached the diff of my changes. It is a pretty simple fix. My intention was to have a fix with minimal code changes. connectOne waits for 2 seconds for a connection to establish. Right now the value is hard-coded. I was planning to add a new property that would specify the amount of time to wait for a connect request. If you think that the proposed changes are good enough, I will make the change for using the property value and resubmit a second patch. Let me know. -Vishal Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-822: Assignee: Vishal K Fix Version/s: 3.3.2 3.4.0 Marking this for 3.3.2, to see if we want this included in 3.3.2. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-822: --- Attachment: zookeeper-3.4.0.tar.gz Hi, To rule out any setup specific issues (VM/ESX/ect), I tried to reproduce this problem with 3 physical machines (no VMs). I was able to reproduce the same problem after 15 reboots or so. It took a little more number of reboots on the physical setup. I used the latest code this time. I have attached the logs. I am suspecting 3 problems: 1. blocking connect in WorkerSender.process as described in my earlier comments. 2. SendWorker.run() calls finish at the end. This could result in finish() getting called twice (e.g., finish called from receiveConnection), thus, causing senderWorkerMap.remove(sid) called twice and removing an entry that should be removed. 3. Race condition that causes one of the peers to disconnect other peer's connection (receiveConnection/initiateConnection issue). I am still verifying/investigating the this. In logs of server id 0 and 1, search for ***REBOOTING LEADER*** from the bottom. This line marks the start of faulty FLE. -Vishal Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Priority: Blocker Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-822: --- Attachment: 822.tar.gz attaching new logs. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Priority: Blocker Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-822: --- Attachment: zk_leader_election.tar.gz Hi Flavio, I have attached the logs. Hope this helps. Do you have more info on loggraph (doc, etc)? Thanks. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Priority: Blocker Attachments: test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal K updated ZOOKEEPER-822: --- Attachment: test_zookeeper_2.log test_zookeeper_1.log Attaching logs from two nodes that took too long to complete leader election. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Priority: Blocker Attachments: test_zookeeper_1.log, test_zookeeper_2.log Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.