From that log, it looks like Node 3 and Node 2 formed a quorum (Node 1 can't talk to Node 2 for some reason). Even if Node 3 didn't hear from Node 2 after 12:46, this is still a special case: we effectively had a network partition for about 10 minutes. I wonder if you can reproduce this in an environment where Node 2 is definitely down? I will take a look at the log after 12:46 when I get a chance.
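For context on "formed a quorum": ZooKeeper needs only a strict majority of voters, so in a three-node ensemble any two servers suffice (which is also why, as Flavio notes further down, nodes 1 and 3 should have been able to elect a leader on their own). Here is a minimal sketch of the majority test; MajorityDemo and containsQuorum are illustrative stand-ins, not ZooKeeper's actual QuorumMaj API:

    import java.util.Set;

    // Illustrative majority test, a simplified stand-in for ZooKeeper's
    // QuorumMaj check (not its actual API). With ensembleSize = 3, integer
    // division gives 3 / 2 = 1, so any 2 acked voters form a quorum.
    public class MajorityDemo {
        static boolean containsQuorum(Set<Long> ackers, int ensembleSize) {
            return ackers.size() > ensembleSize / 2;
        }

        public static void main(String[] args) {
            System.out.println(containsQuorum(Set.of(2L, 3L), 3)); // true:  nodes 2+3
            System.out.println(containsQuorum(Set.of(1L), 3));     // false: node 1 alone
        }
    }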
> On Oct 14, 2016, at 1:40 PM, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>
> Hi Michael and Ryan,
>
> Thanks for looking into the logs.
>
> Actually, in this case, node 2 was brought down ungracefully and was down
> permanently. In this customer deployment, our product runs as a docker
> container in a bare-metal environment. It is possible that the docker proxy
> that provides external connectivity to the zookeeper ports is down while the
> container itself is up and running for several more minutes. If you grep
> for "2 (n.sid)" you will see that there are no packets after 12:46.
> Convergence after that did not happen until 15:57 or so.
>
> Thanks,
> Anand.
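The detail above (container up, docker proxy down) matters because a host that is up with the port closed fails very differently from a host that is powered off: the former yields an immediate "Connection refused" (a RST from the peer's TCP stack), the latter typically fails only after the connect timeout. A minimal, hypothetical probe illustrating the two failure modes (PortProbe is not part of ZooKeeper; the host, port, and 5-second limit are placeholders):

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    // Hypothetical probe (not part of ZooKeeper): distinguishes a peer whose
    // port is closed (fails fast with "Connection refused") from a peer that
    // is powered off or partitioned (fails only after the connect timeout).
    public class PortProbe {
        public static void main(String[] args) throws IOException {
            String host = args[0];                    // e.g. node2.controller.local
            int port = Integer.parseInt(args[1]);     // e.g. 5004 (election port)
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 5000); // 5 s cap, placeholder
                System.out.println("connected: peer is up and listening");
            } catch (ConnectException e) {
                System.out.println("refused: host reachable, port closed");
            } catch (SocketTimeoutException e) {
                System.out.println("timed out: host likely down or unreachable");
            }
        }
    }

This distinction is also what separates this report from ZOOKEEPER-2164, mentioned at the bottom of the thread: there the connect attempt times out, here it is refused right away.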
> On Thu, Oct 13, 2016 at 5:10 PM, Michael Han <h...@cloudera.com> wrote:
>
>>> it started a new round but then I seem to see the election messages
>>> from Node 2 again. Any idea why?
>>
>> My guess is that node 2 came back online and the ZK service was started.
>> In this use case node 2 does not go offline permanently, IIUC.
>>
>> On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang <yangzhangr...@hotmail.com> wrote:
>>
>>> Hi Anand, I took a look, and I wonder how you explain the following.
>>>
>>> The N1.log starts at around
>>>
>>> 2016-10-03 12:37:38,073 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING
>>>
>>> and it failed to talk to Node 2 (id: 2):
>>>
>>> 2016-10-03 12:37:38,136 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address node2.controller.local/127.0.0.8:5004
>>> java.net.ConnectException: Connection refused
>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>     at java.net.Socket.connect(Socket.java:579)
>>>
>>> However, after failing to get any election messages from Node 3, it started
>>> a new round, but then I seem to see the election messages from Node 2 again.
>>> Any idea why?
>>>
>>> 2016-10-03 12:38:32,815 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING
>>> 2016-10-03 12:38:32,816 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.20002aa0b
>>> 2016-10-03 12:38:32,875 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x20002bfb5
>>> 2016-10-03 12:38:32,876 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,917 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,117 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 400
>>> 2016-10-03 12:38:33,118 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,159 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,559 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 800
>>> 2016-10-03 12:38:33,560 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,561 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,601 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:34,402 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 1600
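The "Notification time out: 400 / 800 / 1600" lines above come from FastLeaderElection's receive loop: whenever a poll for notifications comes back empty, the timeout doubles, capped at roughly 60 seconds in the 3.4.x line. A self-contained simulation of that backoff (this is not the actual ZooKeeper source; the 200 ms start, finalizeWait, and 60 s cap, maxNotificationInterval, are assumed from the 3.4.x defaults):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Self-contained simulation of the election-notification backoff; not
    // the ZooKeeper source. The 200 ms start (finalizeWait) and 60 s cap
    // (maxNotificationInterval) are the 3.4.x defaults, assumed here.
    public class BackoffDemo {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> recvQueue = new ArrayBlockingQueue<>(1); // stays empty
            int notTimeout = 200;
            final int maxNotificationInterval = 60_000;
            for (int i = 0; i < 5; i++) {
                String n = recvQueue.poll(notTimeout, TimeUnit.MILLISECONDS);
                if (n == null) {                 // nothing heard: double and retry
                    notTimeout = Math.min(notTimeout * 2, maxNotificationInterval);
                    System.out.println("Notification time out: " + notTimeout);
                }
            }
            // Prints 400, 800, 1600, 3200, 6400, matching the pattern in N1.log.
        }
    }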
>>> On Oct 8, 2016, at 10:05 AM, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>>>
>>> Hi Flavio,
>>>
>>> I have attached the logs from node 1 and node 3. Node 2 was powered off
>>> around 10-03 12:36. Leader election kept going until 10-03 15:57:16, when
>>> it finally converged.
>>>
>>> Thanks,
>>> Anand.
>>>
>>> On Sat, Oct 8, 2016 at 7:55 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>
>>> Hi Anand,
>>>
>>> I don't understand whether 1 and 3 were able to connect to each other, or
>>> were even trying to. They should be able to elect a leader between
>>> themselves and make progress. You might want to upload logs and let us know.
>>>
>>> -Flavio
>>>
>>>> On 08 Oct 2016, at 02:11, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are currently using zookeeper version 3.4.6 and run a 3-node setup in
>>>> our system. We see that occasionally, when a node is powered off (in this
>>>> instance it was actually the leader node), the remaining two nodes do not
>>>> form a quorum for a really long time. Looking at the logs, the sequence
>>>> appears to be as follows:
>>>> - Node 2 is the zookeeper leader.
>>>> - Node 2 is powered off.
>>>> - Node 1 and Node 3 notice this and start an election.
>>>> - Node 3 times out after initLimit * tickTime with "Timeout while waiting
>>>>   for quorum" for round N.
>>>> - Node 1 times out after initLimit * tickTime with "Exception while trying
>>>>   to follow leader" for round N+1 at the same time.
>>>> - The process continues, with N incrementing sequentially.
>>>> - This happens for a long time.
>>>> - In one instance, we used tickTime=5000 and initLimit=20 and it took
>>>>   around 3.5 hours to converge.
>>>> - In a given round, Node 1 tries to connect to Node 2 and gets connection
>>>>   refused, then waits for the notification timeout, which doubles every
>>>>   iteration until the round hits the initLimit. The connection is refused
>>>>   because node 2 comes back up after the reboot, but the zookeeper process
>>>>   is not started (due to a different failure).
>>>>
>>>> It looks similar to ZOOKEEPER-2164, but there the failure is a connection
>>>> timeout because Node 2 is not reachable, whereas here the connection is
>>>> refused.
>>>>
>>>> Could you please share whether you have seen this issue and, if so, what
>>>> workaround can be employed in 3.4.6?
>>>>
>>>> Thanks,
>>>> Anand.
>>>
>>> <log.tgz>
>>
>> --
>> Cheers
>> Michael.
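A closing back-of-envelope on the numbers in Anand's first message above: each failed round is bounded by initLimit * tickTime before "Timeout while waiting for quorum" fires, so tickTime=5000 and initLimit=20 cap a round at 100 seconds, and the reported 3.5-hour convergence works out to roughly 126 such rounds. A small sketch of that arithmetic (the 3.5 h figure is from the thread; everything else follows from it):

    // Back-of-envelope arithmetic for the figures reported in the thread.
    public class RoundCost {
        public static void main(String[] args) {
            int tickTimeMs = 5_000;
            int initLimit = 20;
            long roundMs = (long) initLimit * tickTimeMs;        // 100,000 ms = 100 s cap per round
            double observedHours = 3.5;                          // reported convergence time
            double rounds = observedHours * 3_600_000 / roundMs; // about 126 failed rounds
            System.out.printf("per-round cap: %d s, implied rounds: %.0f%n",
                    roundMs / 1000, rounds);
        }
    }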