Hello Bill,

When the VMs restart, is it possible that they are assigned different IP addresses despite retaining their original hostnames?

The reason I ask is that there is a known issue: a running ZooKeeper server does not redo DNS resolution for hostnames in the ensemble that it has already resolved. This is tracked as ZOOKEEPER-1506, where a proposed patch is undergoing review and testing.

https://issues.apache.org/jira/browse/ZOOKEEPER-1506

If IP addresses are changing after VM restarts in your environment, then it seems plausible that you're seeing the symptoms of ZOOKEEPER-1506.

--Chris Nauroth

On 9/9/15, 11:09 PM, "Bill Hastings" <[email protected]> wrote:

>On the node which is not the leader, I get the following messages in the log:
>
>04:26:43,076 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>04:26:43,089 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>06:35:25,844 INFO QuorumCnxManager:511 - Received connection request /192.168.8.11:51367
>06:38:00,399 INFO QuorumCnxManager:511 - Received connection request /192.168.8.11:51539
>07:18:27,940 INFO QuorumCnxManager:511 - Received connection request /192.168.8.11:52720
>07:33:58,042 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:33:59,449 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:00,854 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:02,257 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:03,660 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:05,063 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:06,266 INFO LeaderElection:187 - Server address: hvs2.dwa.local/192.168.8.11:3000
>07:34:06,585 WARN Learner:234 - Unexpected exception, tries=0, connecting to hvs2.dwa.local/192.168.8.11:3000
>07:55:28,865 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>07:55:29,066 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>07:55:29,471 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>07:55:30,275 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>07:55:31,878 WARN QuorumCnxManager:382 - Cannot open channel to 2 at election address hvs2.dwa.local/192.168.8.11:4000
>07:55:34,106 INFO QuorumCnxManager:511 - Received connection request /192.168.8.11:55863
>07:58:01,872 INFO QuorumCnxManager:511 - Received connection request /192.168.8.11:56662
>
>On the leader I get the following:
>
>4:19:50,815 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>04:20:46,903 INFO QuorumCnxManager:511 - Received connection request /192.168.8.10:46459
>06:36:04,561 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:04,771 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:05,175 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:05,980 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:07,585 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:10,789 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:17,194 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:29,999 WARN QuorumCnxManager:382 - Cannot open channel to 1 at election address hvs1.dwa.local/192.168.8.10:4000
>06:36:53,578 INFO QuorumCnxManager:511 - Received connection request /192.168.8.10:50285
>07:16:53,244 WARN LearnerHandler:646 - ******* GOODBYE /192.168.8.10:42097 ********
>07:17:21,117 INFO QuorumCnxManager:511 - Received connection request /192.168.8.10:51044
>07:32:57,213 INFO LeaderElection:187 - Server address: hvs1.dwa.local/192.168.8.10:3000
>07:32:58,427 INFO LeaderElection:187 - Server address: hvs1.dwa.local/192.168.8.10:3000
>07:32:59,631 INFO LeaderElection:187 - Server address: hvs1.dwa.local/192.168.8.10:3000
>07:34:00,575 WARN LearnerHandler:646 - ******* GOODBYE /192.168.8.10:43186 ********
>07:56:11,493 WARN LearnerHandler:646 - ******* GOODBYE /192.168.8.10:43536 ********
>07:56:55,045 INFO QuorumCnxManager:511 - Received connection request /192.168.8.10:51949
>
>On Wed, Sep 9, 2015 at 10:42 PM, Bill Hastings <[email protected]> wrote:
>
>> Hi All
>>
>> I am running ZK as a 3 node cluster. Each ZK instance is a VMware VM on a
>> distinct ESX host. Let's assume the three VMs are A, B and C, where A is
>> the leader. If I take down VMs B and C and then bring one of them back up,
>> the ZK cluster is never formed unless I bounce VM A. How can I
>> troubleshoot this? This does not happen in a physical environment.
>>
>> --
>> Cheers
>> Bill
>>
>
>--
>Cheers
>Bill
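
P.S. One rough way to test the changed-IP theory is to compare the address a ZooKeeper server printed in its log with what the hostname resolves to now. The sketch below is only an illustration and is not part of ZooKeeper or of the ZOOKEEPER-1506 patch. The class name StaleAddressCheck and its methods are hypothetical, and it assumes log entries of the form host/ip:port as in the logs quoted above. Run it in a fresh JVM on each node, since a long-running JVM may cache earlier lookups.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: checks whether the IP address
// shown in a log entry such as "hvs2.dwa.local/192.168.8.11:4000" still
// matches what the hostname resolves to right now.
class StaleAddressCheck {

    // Splits "host/ip:port" into {host, ip, port}.
    static String[] parse(String logAddress) {
        int slash = logAddress.indexOf('/');
        int colon = logAddress.lastIndexOf(':');
        if (slash < 0 || colon < slash) {
            throw new IllegalArgumentException("expected host/ip:port, got: " + logAddress);
        }
        return new String[] {
            logAddress.substring(0, slash),
            logAddress.substring(slash + 1, colon),
            logAddress.substring(colon + 1)
        };
    }

    // True when the logged IP is not among the currently resolved IPs.
    static boolean isStale(String loggedIp, List<String> currentIps) {
        return !currentIps.contains(loggedIp);
    }

    public static void main(String[] args) {
        // Default value is taken from the quoted log; pass another as args[0].
        String logged = args.length > 0 ? args[0] : "hvs2.dwa.local/192.168.8.11:4000";
        String[] parts = parse(logged);
        try {
            // Fresh lookup of the hostname as seen from this machine.
            List<String> currentIps = new ArrayList<>();
            for (InetAddress a : InetAddress.getAllByName(parts[0])) {
                currentIps.add(a.getHostAddress());
            }
            System.out.println(parts[0] + " was logged as " + parts[1]
                + ", now resolves to " + currentIps
                + (isStale(parts[1], currentIps) ? " (address changed)" : " (unchanged)"));
        } catch (UnknownHostException e) {
            System.out.println("cannot resolve " + parts[0] + ": " + e.getMessage());
        }
    }
}
```

If this reports a changed address for a peer while the still-running server keeps logging the old one, that would be consistent with the behaviour described in ZOOKEEPER-1506. An unchanged address points to a different cause.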
