[ https://issues.apache.org/jira/browse/ZOOKEEPER-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843279#action_12843279 ]
Flavio Paiva Junqueira commented on ZOOKEEPER-684:
--------------------------------------------------

Hi Henry, if I understand your patch correctly, then with an execution like the one in the log attached to this issue, the test would fail because the latch would never count down to zero. Is this correct? If so, I don't understand how it improves the test.

My understanding of the race is that in the first round, server 0 receives a vote from mock server 2, but server 1 does not receive a vote from 2 (the UDP socket times out while waiting to receive). In the second round, server 1 receives votes from 0 and from 2, both voting for 2, and consequently server 1 elects 2. I think this is what you observe too in your last comment.

If the receive call is timing out too soon, don't we have to increase the timeout value? I understand that this is not desirable because it increases election time, but if the current value is not sufficient, then I don't see a better option.

> Race in LENonTerminateTest
> --------------------------
>
>                 Key: ZOOKEEPER-684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-684
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: zookeeper-684-test-failure.rtf, ZOOKEEPER-684.patch
>
>
> testNonTermination failed during a Hudson run for ZOOKEEPER-59. After
> inspecting the output, it looks like server is electing 2 as a leader and
> leaving. Given that 2 is just a mock server, server 0 remains alone in leader
> election.
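The timeout trade-off raised in the comment above can be illustrated with a small, self-contained sketch. This is not the ZooKeeper leader election code: the class name ReceiveTimeoutSketch, the method voteReceived, the one-byte payload, and the delay and timeout values are all hypothetical and chosen only for illustration. It uses the standard java.net.DatagramSocket API to show that a vote arriving after the receive timeout is missed in that round, while a longer timeout catches it at the cost of waiting longer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

// Hypothetical sketch, not taken from the ZooKeeper code base.
class ReceiveTimeoutSketch {

    /**
     * Returns true if a datagram sent after senderDelayMs arrives before the
     * receive call gives up after receiveTimeoutMs, false on timeout.
     */
    static boolean voteReceived(int senderDelayMs, int receiveTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {

            // A timeout of 0 would block forever, so pass a positive value.
            receiver.setSoTimeout(receiveTimeoutMs);
            int port = receiver.getLocalPort();

            // Simulated peer that sends its "vote" only after a delay.
            Thread peer = new Thread(() -> {
                try {
                    Thread.sleep(senderDelayMs);
                    byte[] vote = {2}; // made-up payload standing in for a vote
                    sender.send(new DatagramPacket(vote, vote.length, loopback, port));
                } catch (IOException | InterruptedException e) {
                    // Ignored in this sketch: the sender socket may already
                    // be closed if the receiver timed out first.
                }
            });
            peer.setDaemon(true);
            peer.start();

            DatagramPacket packet = new DatagramPacket(new byte[16], 16);
            try {
                receiver.receive(packet);
                return true;   // vote arrived within the timeout
            } catch (SocketTimeoutException e) {
                return false;  // receive timed out; the vote is missed this round
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("short timeout, slow peer: " + voteReceived(500, 50));
        System.out.println("long timeout, fast peer:  " + voteReceived(50, 2000));
    }
}
```

On a lightly loaded machine the first call should report a missed vote and the second a received one. The exact thresholds depend on scheduling, which is the same sensitivity that makes a fixed timeout fragile in a test.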