[ https://issues.apache.org/jira/browse/ZOOKEEPER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Paiva Junqueira updated ZOOKEEPER-647:
---------------------------------------------

    Attachment: ZOOKEEPER-647.patch

By inspecting the code of QuorumCnxManager, I've been able to find a corner case that could cause a RecvWorker thread to hang during shutdown. Here is a summary of how the problem can occur:

1- sendWorkerMap is updated during the execution of softHalt (cnx manager is being shut down);
2- The sender worker that was not shut down during the execution of softHalt will later leave its main loop because the value of the attribute shutdown is true;
3- Leaving the loop due to shutdown evaluating to true does not cause finish() to be called, which must happen to kill its recv worker sibling.

I'm proposing a fix that is quite simple. The correct interleaving to trigger the bug is quite difficult to reproduce, though, so I'm not providing a test.

> hudson failure in testLeaderShutdown
> ------------------------------------
>
>                 Key: ZOOKEEPER-647
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-647
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Flavio Paiva Junqueira
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: ZOOKEEPER-647.patch
>
>
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/666/testReport/org.apache.zookeeper.test/QuorumTest/testLeaderShutdown/
> junit.framework.AssertionFailedError: QP failed to shutdown in 30 seconds
>       at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:293)
>       at org.apache.zookeeper.test.QuorumBase.shutdownServers(QuorumBase.java:281)
>       at org.apache.zookeeper.test.QuorumBase.tearDown(QuorumBase.java:266)
>       at org.apache.zookeeper.test.QuorumTest.tearDown(QuorumTest.java:55)
> Flavio, can you triage this one?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
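
For readers following the three-step summary in the comment above, here is a minimal, self-contained sketch of the corner case. It is not the QuorumCnxManager source and not the attached patch. The class WorkerShutdownSketch, the finishOnExit switch, the queue and the timeouts are all made up for illustration; only the names SendWorker, RecvWorker, finish() and the shutdown attribute come from the comment. The sketch assumes the sender's main loop exits once shutdown is true, and that the receiver stops only when its sibling calls finish().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in classes; NOT the QuorumCnxManager source or the attached patch.
class WorkerShutdownSketch {
    // Plays the role of the manager-wide "shutdown" attribute.
    static volatile boolean shutdown = false;

    static class RecvWorker extends Thread {
        private volatile boolean running = true;

        @Override
        public void run() {
            // Stand-in for blocking on a socket read: only finish() ends this loop.
            while (running) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Woken by finish(); the loop condition is re-checked.
                }
            }
        }

        void finish() {
            running = false;
            interrupt();
        }
    }

    static class SendWorker extends Thread {
        private final RecvWorker recvWorker;
        private final boolean finishOnExit; // illustrative switch: false = hang, true = assumed fix
        private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        private volatile boolean running = true;

        SendWorker(RecvWorker recvWorker, boolean finishOnExit) {
            this.recvWorker = recvWorker;
            this.finishOnExit = finishOnExit;
        }

        @Override
        public void run() {
            try {
                // Step 2: the loop is left because shutdown has become true.
                while (running && !shutdown) {
                    queue.poll(10, TimeUnit.MILLISECONDS);
                }
            } catch (InterruptedException e) {
                // Fall through to the cleanup below.
            }
            // Step 3: without this call, nothing stops the sibling receiver.
            if (finishOnExit) {
                finish();
            }
        }

        void finish() {
            running = false;
            recvWorker.finish();
        }
    }

    // Returns true if the receiver terminated after the sender left its loop.
    static boolean recvWorkerStops(boolean finishOnExit) {
        shutdown = false;
        RecvWorker recv = new RecvWorker();
        SendWorker send = new SendWorker(recv, finishOnExit);
        recv.setDaemon(true);
        send.setDaemon(true);
        recv.start();
        send.start();
        shutdown = true; // step 1: a halt that never calls finish() on this sender
        try {
            send.join(2000);
            recv.join(500);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        boolean stopped = !recv.isAlive();
        recv.finish(); // clean up the lingering receiver in the failing case
        return stopped;
    }

    public static void main(String[] args) {
        System.out.println("no finish() on loop exit: recv stopped = " + recvWorkerStops(false));
        System.out.println("finish() on loop exit:    recv stopped = " + recvWorkerStops(true));
    }
}
```

In this sketch the receiver keeps running when the sender leaves its loop without calling finish(), and terminates when finish() is called on every exit path. Whether the attached patch takes exactly this shape is not stated in the comment.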