[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------
Attachment: ZOOKEEPER-822.patch
I believe the patch I'm attaching achieves the same goal and is even simpler,
but I'd like to make sure it suits your needs, Vishal.
If you agree with the modifications, I can work on a test. I was also thinking
that the 2-second timeout you used before is too low, so I've raised it to 5
seconds. But instead of arguing over which value is ideal, I'd rather make it
a system property with a default value of at least 5 seconds.
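To make the system-property idea concrete, here is a minimal sketch. It is not
the code in the attached patch: the class name, the property name
example.election.cnxTimeoutMs, and the fallback rule for non-positive values
are all assumptions made for illustration. It only shows the pattern of reading
a timeout from a system property with a 5-second default.

```java
// Illustrative sketch only; names below are hypothetical, not from the patch.
public final class ElectionTimeout {
    // Hypothetical property name used for this example.
    static final String PROP = "example.election.cnxTimeoutMs";
    // Default of 5 seconds, as proposed in the comment above.
    static final int DEFAULT_MS = 5000;

    // Returns the configured timeout in milliseconds. Falls back to the
    // default when the property is unset, unparsable, or not positive.
    static int connectTimeoutMs() {
        int value = Integer.getInteger(PROP, DEFAULT_MS);
        return value > 0 ? value : DEFAULT_MS;
    }

    public static void main(String[] args) {
        System.out.println("connect timeout (ms): " + connectTimeoutMs());
    }
}
```

Under these assumptions, the value could be overridden at startup with
something like java -Dexample.election.cnxTimeoutMs=8000 ElectionTimeout.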
I also commit to redesigning QuorumCnxManager for either 3.4.0 or 4.0.0 to use
a selector or some other approach we agree upon. I've been wanting to do it for
a while anyway, and I actually thought there was a jira open for it... Maybe
not, I can't find it right now.
> Leader election taking a long time to complete
> -----------------------------------------------
>
> Key: ZOOKEEPER-822
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.0
> Reporter: Vishal K
> Assignee: Vishal K
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log,
> test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz,
> ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3-node cluster.
> 1. Fail the ZK leader.
> 2. Let leader election finish. Restart the leader and let it join the
> 3. Repeat
> After a few rounds, leader election takes anywhere from 25 to 60 seconds to
> finish.
> Note: we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster
> after failing the leader. The leader was down anyway, so logs from that node
> shouldn't matter.
> Look for "START HERE". The logs after that point are the ones of interest.