I've been working on adding a TCPResponderThread to the leader election process so that if a deployment needs to be TCP only, it can be and still use all election types. Testing this has exposed what might be a race condition in the leader election code that prevents a leader from being elected.
Here's the behaviour I see occasionally in LETest. With three nodes (reduced from 30 for ease of debugging), node 3 gets elected before either node 1 or node 2 finishes its election: there is one round where each node learns that 3 has the highest id, and then 3 completes its second round by receiving votes for itself from 1 and 2, but 1 and 2 do not receive votes from 3. Now 3 is killed by the test harness. 1 and 2 are still voting for it, but every time they try, the vote tally excludes 3 since it hasn't been heard from. They then spin around the voting process, unable to reset their vote.

I expect that the heartbeat mechanism in a running QuorumPeer takes care of this when the leader is lost, but the associated QuorumPeers aren't running. If that's the case, there is a simple fix: reset a node's vote to itself if it is voting for a node that hasn't been heard from.

I don't know why using TCP instead of UDP for the responder thread exacerbates this (and we can't rule out my having introduced a bug :)); but as it's a race condition, the different timings associated with waiting on a TCP socket might be just enough to expose the issue. Can someone verify this might be possible / figure out what I missed?

cheers,
Henry
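To make the proposed fix concrete, here is a minimal sketch of the idea — reset a node's vote to itself when the node it is voting for hasn't responded in the current round. All of the names here (Vote, heardFrom, maybeResetVote) are hypothetical illustrations, not the actual ZooKeeper identifiers:

```java
import java.util.HashSet;
import java.util.Set;

public class VoteResetSketch {
    // Hypothetical stand-in for the election's current-vote state.
    static class Vote {
        final long id;
        Vote(long id) { this.id = id; }
    }

    long myId;
    Vote currentVote;
    Set<Long> heardFrom = new HashSet<>(); // ids that responded this round

    /**
     * If we're voting for a node that hasn't been heard from this round,
     * fall back to voting for ourselves so the election can make progress.
     */
    void maybeResetVote() {
        if (currentVote.id != myId && !heardFrom.contains(currentVote.id)) {
            currentVote = new Vote(myId);
        }
    }

    public static void main(String[] args) {
        // Reproduce the stuck state: node 1 still votes for the killed node 3,
        // but only nodes 1 and 2 have responded this round.
        VoteResetSketch peer = new VoteResetSketch();
        peer.myId = 1;
        peer.currentVote = new Vote(3);
        peer.heardFrom.add(1L);
        peer.heardFrom.add(2L);
        peer.maybeResetVote();
        System.out.println(peer.currentVote.id); // prints 1
    }
}
```

With this check applied at the top of each voting round, nodes 1 and 2 would stop spinning on a vote for the dead node 3 and could converge on a new leader between themselves.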