From that log, it looks like Node 3 and Node 2 formed a quorum (Node 1 can't talk to Node 2 for some reason). Even if Node 3 didn't hear from Node 2 after 12:46, this is still a special case: we effectively had a network partition for about 10 minutes. I wonder if you can reproduce this in an environment where Node 2 is definitely down? I will take a look at the log after 12:46 when I get a chance.
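For context on "formed a quorum": ZooKeeper needs only a strict majority of voters, so in a three-node ensemble any two servers suffice (which is also why, as Flavio notes further down, nodes 1 and 3 should have been able to elect a leader on their own). Here is a minimal sketch of the majority test; MajorityDemo and containsQuorum are illustrative stand-ins, not ZooKeeper's actual QuorumMaj API:

    import java.util.Set;

    // Illustrative majority test, a simplified stand-in for ZooKeeper's
    // QuorumMaj check (not its actual API). With ensembleSize = 3, integer
    // division gives 3 / 2 = 1, so any 2 acked voters form a quorum.
    public class MajorityDemo {
        static boolean containsQuorum(Set<Long> ackers, int ensembleSize) {
            return ackers.size() > ensembleSize / 2;
        }

        public static void main(String[] args) {
            System.out.println(containsQuorum(Set.of(2L, 3L), 3)); // true:  nodes 2+3
            System.out.println(containsQuorum(Set.of(1L), 3));     // false: node 1 alone
        }
    }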
> On Oct 14, 2016, at 1:40 PM, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>
> Hi Michael and Ryan,
>
> Thanks for looking into the logs.
>
> Actually, in this case, node 2 was brought down ungracefully and was down
> permanently. In this customer deployment, our product runs as a docker
> container in a bare-metal environment. It is possible that the docker proxy
> that provides external connectivity to the zookeeper ports is down while the
> container itself is up and running for several more minutes. If you grep
> for "2 (n.sid)" you will see that there are no packets after 12:46.
> Convergence after that did not happen until 15:57 or so.
>
> Thanks,
> Anand.
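The detail above (container up, docker proxy down) matters because a host that is up with the port closed fails very differently from a host that is powered off: the former yields an immediate "Connection refused" (a RST from the peer's TCP stack), the latter typically fails only after the connect timeout. A minimal, hypothetical probe illustrating the two failure modes (PortProbe is not part of ZooKeeper; the host, port, and 5-second limit are placeholders):

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    // Hypothetical probe (not part of ZooKeeper): distinguishes a peer whose
    // port is closed (fails fast with "Connection refused") from a peer that
    // is powered off or partitioned (fails only after the connect timeout).
    public class PortProbe {
        public static void main(String[] args) throws IOException {
            String host = args[0];                    // e.g. node2.controller.local
            int port = Integer.parseInt(args[1]);     // e.g. 5004 (election port)
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 5000); // 5 s cap, placeholder
                System.out.println("connected: peer is up and listening");
            } catch (ConnectException e) {
                System.out.println("refused: host reachable, port closed");
            } catch (SocketTimeoutException e) {
                System.out.println("timed out: host likely down or unreachable");
            }
        }
    }

This distinction is also what separates this report from ZOOKEEPER-2164, mentioned at the bottom of the thread: there the connect attempt times out, here it is refused right away.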
> On Thu, Oct 13, 2016 at 5:10 PM, Michael Han <h...@cloudera.com> wrote:
>
>>> it started a new round but then I seem to see the election messages
>>> from Node 2 again. Any idea why?
>>
>> My guess is that node 2 came back online and the ZK service was started.
>> In this use case node 2 does not go offline permanently, IIUC.
>>
>> On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang <yangzhangr...@hotmail.com> wrote:
>>
>>> Hi Anand, I took a look, and I wonder how you explain the following.
>>>
>>> The N1.log starts at around
>>>
>>> 2016-10-03 12:37:38,073 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING
>>>
>>> and it failed to talk to Node 2 (id: 2):
>>>
>>> 2016-10-03 12:37:38,136 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address node2.controller.local/127.0.0.8:5004
>>> java.net.ConnectException: Connection refused
>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>     at java.net.Socket.connect(Socket.java:579)
>>>
>>> However, after failing to get any election messages from Node 3, it started
>>> a new round, but then I seem to see the election messages from Node 2 again.
>>> Any idea why?
>>>
>>> 2016-10-03 12:38:32,815 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING
>>> 2016-10-03 12:38:32,816 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.20002aa0b
>>> 2016-10-03 12:38:32,875 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x20002bfb5
>>> 2016-10-03 12:38:32,876 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,917 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,117 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 400
>>> 2016-10-03 12:38:33,118 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,159 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,559 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 800
>>> 2016-10-03 12:38:33,560 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,561 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,601 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:34,402 [myid:1] - INFO [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 1600
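The "Notification time out: 400 / 800 / 1600" lines above come from FastLeaderElection's receive loop: whenever a poll for notifications comes back empty, the timeout doubles, capped at roughly 60 seconds in the 3.4.x line. A self-contained simulation of that backoff (this is not the actual ZooKeeper source; the 200 ms start, finalizeWait, and 60 s cap, maxNotificationInterval, are assumed from the 3.4.x defaults):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Self-contained simulation of the election-notification backoff; not
    // the ZooKeeper source. The 200 ms start (finalizeWait) and 60 s cap
    // (maxNotificationInterval) are the 3.4.x defaults, assumed here.
    public class BackoffDemo {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> recvQueue = new ArrayBlockingQueue<>(1); // stays empty
            int notTimeout = 200;
            final int maxNotificationInterval = 60_000;
            for (int i = 0; i < 5; i++) {
                String n = recvQueue.poll(notTimeout, TimeUnit.MILLISECONDS);
                if (n == null) {                 // nothing heard: double and retry
                    notTimeout = Math.min(notTimeout * 2, maxNotificationInterval);
                    System.out.println("Notification time out: " + notTimeout);
                }
            }
            // Prints 400, 800, 1600, 3200, 6400, matching the pattern in N1.log.
        }
    }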
>>> On Oct 8, 2016, at 10:05 AM, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>>>
>>> Hi Flavio,
>>>
>>> I have attached the logs from node 1 and node 3. Node 2 was powered off
>>> around 10-03 12:36. Leader election kept going until 10-03 15:57:16, when
>>> it finally converged.
>>>
>>> Thanks,
>>> Anand.
>>>
>>> On Sat, Oct 8, 2016 at 7:55 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>
>>> Hi Anand,
>>>
>>> I don't understand whether 1 and 3 were able to connect to each other, or
>>> were even trying to. They should be able to elect a leader between
>>> themselves and make progress. You might want to upload logs and let us know.
>>>
>>> -Flavio
>>>
>>>> On 08 Oct 2016, at 02:11, Anand Parthasarathy <anpar...@avinetworks.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are currently using zookeeper version 3.4.6 and run a 3-node setup in
>>>> our system. We see that occasionally, when a node is powered off (in this
>>>> instance it was actually the leader node), the remaining two nodes do not
>>>> form a quorum for a really long time. Looking at the logs, the sequence
>>>> appears to be as follows:
>>>> - Node 2 is the zookeeper leader.
>>>> - Node 2 is powered off.
>>>> - Node 1 and Node 3 notice this and start an election.
>>>> - Node 3 times out after initLimit * tickTime with "Timeout while waiting
>>>>   for quorum" for round N.
>>>> - Node 1 times out after initLimit * tickTime with "Exception while trying
>>>>   to follow leader" for round N+1 at the same time.
>>>> - The process continues, with N incrementing sequentially.
>>>> - This happens for a long time.
>>>> - In one instance, we used tickTime=5000 and initLimit=20 and it took
>>>>   around 3.5 hours to converge.
>>>> - In a given round, Node 1 tries to connect to Node 2 and gets connection
>>>>   refused, then waits for the notification timeout, which doubles every
>>>>   iteration until the round hits the initLimit. The connection is refused
>>>>   because node 2 comes back up after the reboot, but the zookeeper process
>>>>   is not started (due to a different failure).
>>>>
>>>> It looks similar to ZOOKEEPER-2164, but there the failure is a connection
>>>> timeout because Node 2 is not reachable, whereas here the connection is
>>>> refused.
>>>>
>>>> Could you please share whether you have seen this issue and, if so, what
>>>> workaround can be employed in 3.4.6?
>>>>
>>>> Thanks,
>>>> Anand.
>>>
>>> <log.tgz>
>>
>> --
>> Cheers
>> Michael.
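A closing back-of-envelope on the numbers in Anand's first message above: each failed round is bounded by initLimit * tickTime before "Timeout while waiting for quorum" fires, so tickTime=5000 and initLimit=20 cap a round at 100 seconds, and the reported 3.5-hour convergence works out to roughly 126 such rounds. A small sketch of that arithmetic (the 3.5 h figure is from the thread; everything else follows from it):

    // Back-of-envelope arithmetic for the figures reported in the thread.
    public class RoundCost {
        public static void main(String[] args) {
            int tickTimeMs = 5_000;
            int initLimit = 20;
            long roundMs = (long) initLimit * tickTimeMs;        // 100,000 ms = 100 s cap per round
            double observedHours = 3.5;                          // reported convergence time
            double rounds = observedHours * 3_600_000 / roundMs; // about 126 failed rounds
            System.out.printf("per-round cap: %d s, implied rounds: %.0f%n",
                    roundMs / 1000, rounds);
        }
    }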