Hi all, yesterday during a short VMWare Patching apply, there a was a very
short network outage, something about 3/5 seconds, but for one of our ZooKeeper
node it didn't come back automatically.
Our ensemble cluster has 5 members and only one got this kind of behavior. I
needed to manually restart it to get it back. In this case this member/zk was
not receving connections from kafka, it's only a repository backup, but anyway
it was supposed to get back "automatically" right ?
It was trying to call the leader, but only this got this kind of behavior, all
the others nodes got the ellection process and reestablished the ensemble.
Bellow there is a log's chunk:
[2021-02-28 09:11:38,119] WARN Exception when following the leader
(org.apache.zookeeper.server.quorum.Learner)
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1253)
[2021-02-28 09:11:38,119] WARN Closing connection to leader, exception during
packet send (org.apache.zookeeper.server.quorum.SendAckRequestProcessor)
javax.net.ssl.SSLException: Connection or outbound has been closed
at sun.security.ssl.Alert.createSSLException(Alert.java:127)
at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
at
sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:979)
at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at
org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:144)
at
org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
at
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:186)
at
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
Caused by: java.net.SocketException: Connection or outbound has been closed
at
sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:267)
at
sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:974)
... 6 more
[2021-02-28 09:11:38,120] INFO shutdown called
(org.apache.zookeeper.server.quorum.Learner)
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1257)
[2021-02-28 09:11:38,120] INFO Shutting down
(org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-28 09:11:38,120] INFO shutting down
(org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-28 09:11:38,120] INFO Shutting down
(org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-28 09:11:38,121] INFO Shutting down
(org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-28 09:11:38,121] INFO FollowerRequestProcessor exited loop!
(org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-28 09:11:38,121] INFO CommitProcessor exited loop!
(org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-28 09:11:38,123] INFO shutdown of request processor complete
(org.apache.zookeeper.server.FinalRequestProcessor)
[2021-02-28 09:11:38,132] INFO Shutting down
(org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-28 09:11:38,132] INFO SyncRequestProcessor exited!
(org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-28 09:11:38,132] WARN PeerState set to LOOKING
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:11:38,132] INFO LOOKING
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:11:38,133] INFO New election. My id = 111, proposed
zxid=0x2800000dd6 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:38,135] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:38,336] INFO Notification time out: 400
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:38,337] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:38,737] INFO Notification time out: 800
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:38,738] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:39,538] INFO Notification time out: 1600
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:39,539] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:41,139] INFO Notification time out: 3200
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:41,140] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:44,340] INFO Notification time out: 6400
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:44,341] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:50,741] INFO Notification time out: 12800
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:11:50,741] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:12:03,542] INFO Notification time out: 25600
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:12:03,542] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:12:29,142] INFO Notification time out: 51200
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:12:29,143] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:13:20,344] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:13:20,345] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:14:20,345] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:14:20,346] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:15:20,346] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:15:20,347] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:16:20,347] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:16:20,348] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:17:20,348] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:17:20,350] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:18:20,350] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:18:20,351] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:19:20,352] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:19:20,352] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:20:20,353] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:20:20,357] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:21:20,357] INFO Notification time out: 60000
(org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-28 09:21:20,358] INFO Notification: 2 (message format version), 111
(n.leader), 0x2800000dd6 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 111
(n.sid), 0x28 (n.peerEPoch), LOOKING (my state)0 (n.config version)
(org.apache.zookeeper.server.quorum.FastLeaderElection)
#############################################################
Needed to manually restart to force reconnect to ensemble cluster.
#############################################################
[2021-02-28 09:21:56,110] INFO Reading configuration from:
/opt/kafka/config/zookeeper.properties
(org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-02-28 09:21:56,113] INFO clientPort is not set
(org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-02-28 09:21:56,115] INFO secureClientPortAddress is 0.0.0.0:3184
(org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-02-28 09:21:56,118] INFO Setting -D
jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS
renegotiation (org.apache.zookeeper.common.X509Util)
[2021-02-28 09:21:56,126] INFO autopurge.snapRetainCount set to 10
(org.apache.zookeeper.server.DatadirCleanupManager)
[2021-02-28 09:21:56,126] INFO autopurge.purgeInterval set to 1
(org.apache.zookeeper.server.DatadirCleanupManager)
[2021-02-28 09:21:56,129] INFO Purge task started.
(org.apache.zookeeper.server.DatadirCleanupManager)
[2021-02-28 09:21:56,129] INFO Log4j 1.2 jmx support found and enabled.
(org.apache.zookeeper.jmx.ManagedUtil)
[2021-02-28 09:21:56,135] INFO Starting quorum peer
(org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2021-02-28 09:21:56,137] INFO zookeeper.snapshot.trust.empty : false
(org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2021-02-28 09:21:56,146] INFO Purge task completed.
(org.apache.zookeeper.server.DatadirCleanupManager)
[2021-02-28 09:21:56,187] INFO zookeeper.client.portUnification=false
(org.apache.zookeeper.server.NettyServerCnxnFactory)
[2021-02-28 09:21:56,220] INFO Using
org.apache.zookeeper.server.NettyServerCnxnFactory as server connection factory
(org.apache.zookeeper.server.ServerCnxnFactory)
[2021-02-28 09:21:56,228] INFO Server successfully logged in.
(org.apache.zookeeper.Login)
[2021-02-28 09:21:56,242] INFO zookeeper.snapshot.trust.empty : false
(org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2021-02-28 09:21:56,242] INFO Local sessions disabled
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,242] INFO Local session upgrading disabled
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,243] INFO tickTime set to 2000
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,243] INFO minSessionTimeout set to 4000
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,243] INFO maxSessionTimeout set to 40000
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,243] INFO initLimit set to 5
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,253] INFO zookeeper.snapshotSizeFactor = 0.33
(org.apache.zookeeper.server.ZKDatabase)
[2021-02-28 09:21:56,255] INFO Using TLS encrypted quorum communication
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO Port unification disabled
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.enableSasl set to true
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.serverRequireSasl set to true
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.learnerRequireSasl set to true
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.kerberos.servicePrincipal set to
zkquorum/localhost (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.server.saslLoginContext set to
QuorumServer (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.auth.learner.saslLoginContext set to
QuorumLearner (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,255] INFO quorum.cnxn.threads.size set to 20
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,257] INFO QuorumServer successfully logged in.
(org.apache.zookeeper.Login)
[2021-02-28 09:21:56,258] INFO QuorumLearner successfully logged in.
(org.apache.zookeeper.Login)
[2021-02-28 09:21:56,260] INFO Reading snapshot
/zookeeper-datastore/version-2/snapshot.2400000034
(org.apache.zookeeper.server.persistence.FileSnap)
[2021-02-28 09:21:56,511] INFO binding to port 0.0.0.0/0.0.0.0:3184
(org.apache.zookeeper.server.NettyServerCnxnFactory)
[2021-02-28 09:21:56,565] INFO bound to port 3184
(org.apache.zookeeper.server.NettyServerCnxnFactory)
[2021-02-28 09:21:56,571] INFO Election port bind maximum retries is 3
(org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-02-28 09:21:56,574] INFO Creating TLS-only quorum server socket
(org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-02-28 09:21:56,575] INFO 111 is accepting connections now, my election
bind port: zk-xpto.production.com/10.1.1.99:3888
(org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-02-28 09:21:56,587] INFO LOOKING
(org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-28 09:21:56,592] INFO New election. My id = 111, proposed
zxid=0x2800000dd6 (org.apache.zookeeper.server.quorum.FastLeaderElection)