[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906854#action_12906854
]
Vishal K commented on ZOOKEEPER-822:
------------------------------------
Hi Flavio,
> I think we need some time to converge on problems and fixes.
I don't think it would take a long time to converge. I think the patch that I
attached is quite simple. After adding a new property for timeout we should be
good to go.
> My understanding is that we want to have 3.3.2 out soon, and my feeling is
> that this is not a blocker for 3.3.2 given Vishal's description and our
> experience with the system so far, but it would be good to hear from Vishal.
From our earlier email exchanges I have a feeling that in most cases FLE was
tested by restarting the ZooKeeper service (and not by rebooting/shutting down
the host). I am a bit concerned that enough time may not have been spent in
testing/reproducing this problem. In my opinion, this fix should go in 3.3.2.
I know for sure that we won't be able to use the next release as is without
this fix.
Thanks.
-Vishal
> Leader election taking a long time to complete
> -----------------------------------------------
>
> Key: ZOOKEEPER-822
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.0
> Reporter: Vishal K
> Assignee: Vishal K
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log,
> test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz,
> ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1. Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the
> 3. Repeat
> After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish.
> Note: we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster
> after failing the leader. The leader was down anyway, so logs from that node
> shouldn't matter.
> Look for "START HERE". Logs after that point should be of interest.
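
For context on the timing in the quoted zoo.cfg: initLimit and syncLimit are expressed in ticks, so their wall-clock values depend on tickTime. The following is a minimal sketch, not part of the attached patch, that parses the configuration quoted above and converts both limits to milliseconds. The helper names (parse_properties, limits_ms) are hypothetical, and the parser handles only the simple key=value lines and the "\:" escape that appear in this file, not the full Java properties format.

```python
# Sketch only: derive wall-clock limits from the zoo.cfg quoted in the issue.
# Helper names are illustrative and are not part of ZooKeeper.

ZOO_CFG = r"""
#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments.

    Only the '\\:' escape seen in this file is unescaped; this is not a
    complete implementation of the Java properties format.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().replace("\\:", ":")
    return props


def limits_ms(props):
    """Convert the tick-based limits to milliseconds using tickTime."""
    tick = int(props["tickTime"])
    return {
        "initLimit_ms": int(props["initLimit"]) * tick,
        "syncLimit_ms": int(props["syncLimit"]) * tick,
    }


props = parse_properties(ZOO_CFG)
print(limits_ms(props))
print(sorted(k for k in props if k.startswith("server.")))
```

With these settings initLimit works out to 10 seconds and syncLimit to 4 seconds, both well below the 25 to 60 seconds reported for the election.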