Hi Andor,

As this is on a production server, I can’t attach the entire log file, but I 
can share as much information as I’m able to:

Nearly all of the log file is filled with connection errors from ZooKeeper 
clients:

> WARN NIOServerCnxn – Exception causing close of session 0x0 due to 
> java.io.IOException: ZooKeeperServer not running
> INFO NIOServerCnxn – Closed socket connection for client /<redacted> (no 
> session established for client)

I grabbed all of the IP addresses in the log file and they all belong to 
clients; there is no mention of other ZK servers.
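
For anyone who wants to repeat that check, here is a minimal Python sketch of the idea. It assumes the redacted part of the "Closed socket connection for client" lines has the usual /IP:port shape; the sample lines and addresses below are made up for illustration and are not from the real log.

```python
import re
from collections import Counter

# Assumption: the client address appears as "/<ipv4>:<port>" after "for client".
CLIENT_RE = re.compile(r"for client /(\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?")


def client_ip_counts(lines):
    """Count how often each client IP appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (documentation-range addresses, not real data).
SAMPLE = [
    "INFO NIOServerCnxn - Closed socket connection for client /192.0.2.10:51234 (no session established for client)",
    "INFO NIOServerCnxn - Closed socket connection for client /192.0.2.11:40000 (no session established for client)",
    "INFO NIOServerCnxn - Closed socket connection for client /192.0.2.10:51240 (no session established for client)",
    "WARN NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running",
]

for ip, n in sorted(client_ip_counts(SAMPLE).items()):
    print(ip, n)
```

Comparing the resulting set of addresses against the ensemble members listed in zoo.cfg shows whether any peer ever reached the client port.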

Searching the log for ‘Quorum’, I see a lot of:

> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  FastLeaderElection - 
> Notification time out: 60000
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (2, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (3, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (4, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (5, 1)

Let me know if there is anything else you think I should look for. If I find 
anything interesting I’ll share it here.



From: Andor Molnar <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, January 25, 2019 at 10:01
To: "[email protected]" <[email protected]>
Subject: [**SPAM**] Re: [**SPAM**] RE: ZK Server does not join quorum after restart

Hi Ian,

Would you please attach logs from all participants of the ensemble or try
to find an exception from when the follower is trying to join?

Regards,
Andor



On Fri, Jan 25, 2019 at 1:37 AM Ian Spence <[email protected]> wrote:

Hi Daniel,

Thanks for the quick reply. We use static IP addresses on all of the
servers so it did not change after the reboot.

Thanks,
-Ian

From: Daniel Chan <[email protected]> on behalf of Daniel Chan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, January 24, 2019 at 16:36
To: "[email protected]" <[email protected]>
Subject: [**SPAM**] RE: ZK Server does not join quorum after restart


If its IP address changed, then you have hit a known bug,
https://issues.apache.org/jira/browse/ZOOKEEPER-1506, and you need to
bounce the cluster.

Thanks,
Daniel

-----Original Message-----
From: Ian Spence <[email protected]>
Sent: Thursday, January 24, 2019 2:36 PM
To: [email protected]
Subject: ZK Server does not join quorum after restart

Hello

We have a cluster of 5 ZK servers, all running ZK 3.4.6 on Java 1.8 on
CentOS 6. These are physical devices, not virtual machines.

One server required hardware maintenance, and was restarted. When the zk
software was restarted, it did not rejoin the quorum as a follower.

Running “stat” or “mntr” commands returns: “This ZooKeeper instance is not
currently serving requests”
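
To compare what each member of the ensemble reports, the same four-letter commands can be sent from a script. This is only a rough sketch: it assumes the default client port 2181, the hostnames in the usage note are placeholders, and parse_mode is a hypothetical helper that simply looks for the "Mode:" line in a "stat" reply.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"stat", timeout=5.0):
    """Send a four-letter command and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_reply):
    """Return the value of the 'Mode:' line, or None if the reply has none."""
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Usage would be along the lines of parse_mode(four_letter_word("zk1.example.com")) for each server (placeholder hostname). A server that answers with the "not currently serving requests" message has no "Mode:" line, so the helper returns None for it.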

I googled this message and came across this bug:
https://issues.apache.org/jira/browse/ZOOKEEPER-2164

Does anybody know if there is a workaround for this issue? We’ve seen this
problem multiple times in the past and our current solution is to bring
down the ZK cluster (which is a huge outage-causing pain).

Thanks

- Ian


