>  it was supposed to get back "automatically" right ?
absolutely

I haven't seen this exact problem before. Which ZooKeeper version are you
using?

Can you maybe check the logs for any earlier exceptions / errors before
2021-02-28 09:11:38,119 and after the previous successful leader election?
One possibility is that one of the threads responsible for the leader
election communication (e.g. sending out notifications or receiving the
answers) were killed and never restarted in this ZooKeeper Server. If this
would be the case, we should see a reason / stack trace earlier explaining
the unhandled exception, and we could fix this in the code.

I saw somewhat similar cases, when the current leader had the same issue
with threads that are receiving the leader election notifications. But in
that case no one was able to rejoin the quorum and the leader needed to be
restarted. Your case is different, as restarting a single follower solved
the issue.

Regards,
Mate

On Mon, Mar 1, 2021 at 8:26 PM Jhanssen Fávaro <jhanssenfav...@gmail.com>
wrote:

> Hi all, yesterday during a short VMWare Patching apply, there a was a very
> short network outage, something about 3/5 seconds, but for one of our
> ZooKeeper node it didn't come back automatically.
> Our ensemble cluster has 5 members and only one got this kind of behavior.
> I needed to manually restart it to get it back.  In this case this
> member/zk was not receving connections from kafka, it's only a repository
> backup, but anyway it was supposed to get back "automatically" right ?
> It was trying to call the leader, but only this got this kind of behavior,
> all the others nodes got the ellection process and reestablished the
> ensemble.
> Bellow there is a log's chunk:
>
>
>
>
>
> [2021-02-28 09:11:38,119] WARN Exception when following the leader
> (org.apache.zookeeper.server.quorum.Learner)
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
>         at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
>         at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
>         at
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
>         at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1253)
> [2021-02-28 09:11:38,119] WARN Closing connection to leader, exception
> during packet send
> (org.apache.zookeeper.server.quorum.SendAckRequestProcessor)
> javax.net.ssl.SSLException: Connection or outbound has been closed
>         at sun.security.ssl.Alert.createSSLException(Alert.java:127)
>         at
> sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
>         at
> sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
>         at
> sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
>         at
> sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:979)
>         at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at
> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:144)
>         at
> org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
>         at
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:186)
>         at
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> Caused by: java.net.SocketException: Connection or outbound has been closed
>         at
> sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:267)
>         at
> sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:974)
>         ... 6 more
> [2021-02-28 09:11:38,120] INFO shutdown called
> (org.apache.zookeeper.server.quorum.Learner)
> java.lang.Exception: shutdown Follower
>         at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1257)
> [2021-02-28 09:11:38,120] INFO Shutting down
> (org.apache.zookeeper.server.ZooKeeperServer)
> [2021-02-28 09:11:38,120] INFO shutting down
> (org.apache.zookeeper.server.ZooKeeperServer)
> [2021-02-28 09:11:38,120] INFO Shutting down
> (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
> [2021-02-28 09:11:38,121] INFO Shutting down
> (org.apache.zookeeper.server.quorum.CommitProcessor)
> [2021-02-28 09:11:38,121] INFO FollowerRequestProcessor exited loop!
> (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
> [2021-02-28 09:11:38,121] INFO CommitProcessor exited loop!
> (org.apache.zookeeper.server.quorum.CommitProcessor)
> [2021-02-28 09:11:38,123] INFO shutdown of request processor complete
> (org.apache.zookeeper.server.FinalRequestProcessor)
> [2021-02-28 09:11:38,132] INFO Shutting down
> (org.apache.zookeeper.server.SyncRequestProcessor)
> [2021-02-28 09:11:38,132] INFO SyncRequestProcessor exited!
> (org.apache.zookeeper.server.SyncRequestProcessor)
> [2021-02-28 09:11:38,132] WARN PeerState set to LOOKING
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:11:38,132] INFO LOOKING
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:11:38,133] INFO New election. My id =  111, proposed
> zxid=0x2800000dd6 (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:38,135] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:38,336] INFO Notification time out: 400
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:38,337] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:38,737] INFO Notification time out: 800
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:38,738] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:39,538] INFO Notification time out: 1600
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:39,539] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:41,139] INFO Notification time out: 3200
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:41,140] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:44,340] INFO Notification time out: 6400
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:44,341] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:50,741] INFO Notification time out: 12800
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:11:50,741] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:12:03,542] INFO Notification time out: 25600
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:12:03,542] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:12:29,142] INFO Notification time out: 51200
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:12:29,143] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:13:20,344] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:13:20,345] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:14:20,345] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:14:20,346] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:15:20,346] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:15:20,347] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:16:20,347] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:16:20,348] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:17:20,348] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:17:20,350] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:18:20,350] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:18:20,351] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:19:20,352] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:19:20,352] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:20:20,353] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:20:20,357] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:21:20,357] INFO Notification time out: 60000
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2021-02-28 09:21:20,358] INFO Notification: 2 (message format version),
> 111 (n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state),
> 111 (n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> (org.apache.zookeeper.server.quorum.FastLeaderElection)
> #############################################################
> Needed to manually restart to force reconnect to ensemble cluster.
> #############################################################
> [2021-02-28 09:21:56,110] INFO Reading configuration from:
> /opt/kafka/config/zookeeper.properties
> (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
> [2021-02-28 09:21:56,113] INFO clientPort is not set
> (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
> [2021-02-28 09:21:56,115] INFO secureClientPortAddress is 0.0.0.0:3184
> (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
> [2021-02-28 09:21:56,118] INFO Setting -D
> jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated
> TLS renegotiation (org.apache.zookeeper.common.X509Util)
> [2021-02-28 09:21:56,126] INFO autopurge.snapRetainCount set to 10
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-28 09:21:56,126] INFO autopurge.purgeInterval set to 1
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-28 09:21:56,129] INFO Purge task started.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-28 09:21:56,129] INFO Log4j 1.2 jmx support found and enabled.
> (org.apache.zookeeper.jmx.ManagedUtil)
> [2021-02-28 09:21:56,135] INFO Starting quorum peer
> (org.apache.zookeeper.server.quorum.QuorumPeerMain)
> [2021-02-28 09:21:56,137] INFO zookeeper.snapshot.trust.empty : false
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
> [2021-02-28 09:21:56,146] INFO Purge task completed.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-28 09:21:56,187] INFO zookeeper.client.portUnification=false
> (org.apache.zookeeper.server.NettyServerCnxnFactory)
> [2021-02-28 09:21:56,220] INFO Using
> org.apache.zookeeper.server.NettyServerCnxnFactory as server connection
> factory (org.apache.zookeeper.server.ServerCnxnFactory)
> [2021-02-28 09:21:56,228] INFO Server successfully logged in.
> (org.apache.zookeeper.Login)
> [2021-02-28 09:21:56,242] INFO zookeeper.snapshot.trust.empty : false
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
> [2021-02-28 09:21:56,242] INFO Local sessions disabled
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,242] INFO Local session upgrading disabled
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,243] INFO tickTime set to 2000
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,243] INFO minSessionTimeout set to 4000
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,243] INFO maxSessionTimeout set to 40000
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,243] INFO initLimit set to 5
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,253] INFO zookeeper.snapshotSizeFactor = 0.33
> (org.apache.zookeeper.server.ZKDatabase)
> [2021-02-28 09:21:56,255] INFO Using TLS encrypted quorum communication
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO Port unification disabled
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.enableSasl set to true
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.serverRequireSasl set to true
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.learnerRequireSasl set to true
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.kerberos.servicePrincipal set
> to zkquorum/localhost (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.server.saslLoginContext set to
> QuorumServer (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.auth.learner.saslLoginContext set to
> QuorumLearner (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,255] INFO quorum.cnxn.threads.size set to 20
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,257] INFO QuorumServer successfully logged in.
> (org.apache.zookeeper.Login)
> [2021-02-28 09:21:56,258] INFO QuorumLearner successfully logged in.
> (org.apache.zookeeper.Login)
> [2021-02-28 09:21:56,260] INFO Reading snapshot
> /zookeeper-datastore/version-2/snapshot.2400000034
> (org.apache.zookeeper.server.persistence.FileSnap)
> [2021-02-28 09:21:56,511] INFO binding to port 0.0.0.0/0.0.0.0:3184
> (org.apache.zookeeper.server.NettyServerCnxnFactory)
> [2021-02-28 09:21:56,565] INFO bound to port 3184
> (org.apache.zookeeper.server.NettyServerCnxnFactory)
> [2021-02-28 09:21:56,571] INFO Election port bind maximum retries is 3
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2021-02-28 09:21:56,574] INFO Creating TLS-only quorum server socket
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2021-02-28 09:21:56,575] INFO 111 is accepting connections now, my
> election bind port: zk-xpto.production.com/10.1.1.99:3888
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2021-02-28 09:21:56,587] INFO LOOKING
> (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2021-02-28 09:21:56,592] INFO New election. My id =  111, proposed
> zxid=0x2800000dd6 (org.apache.zookeeper.server.quorum.FastLeaderElection)
>
>
>

Reply via email to