Abe, Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think going to 3.4.11 would make a difference?
We can certainly turn on debug logs, but because the problem has only occurred twice in the last several (probably 8-9) months and we’re unable to reproduce it intentionally, it might take a while. Is there anything else you suggest we do to troubleshoot in the meantime? > On Nov 29, 2017, at 4:52 PM, Abraham Fine <af...@apache.org> wrote: > > Hi Jim- > > What version of ZooKeeper are you using? > > It would be great to get a better understanding of what is causing: > Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: > Not sufficient followers synced, only synced with sids: [ 3,4 ] > Nov 29 19:32:12 localhost at > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) > Nov 29 19:32:12 localhost at > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) > > This failure appears to align with: > Nov 29 19:32:12 localhost zookeeper: WARN > [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when > following the leader > > Is there any possibility debug logs could be enabled? > > Thanks, > Abe > > Would you be able to turn on debug logging, I think that might give us > much more visibility? > On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote: >> Hey, folks. We’re running a 5-node ZooKeeper cluster to manage server >> configuration and discovery. This morning, we experienced an outage in >> which the entire cluster went offline for approximately 7 minutes. >> Restarting the ZooKeeper daemons on each node in the cluster got things >> working again, so there doesn’t appear to have been an underlying network >> or hardware fault that would explain the cluster’s inability to recover. >> >> This is actually the second time this has happened in the last several >> months, and all of our attempts to reproduce the behavior by >> intentionally doing bad things to the cluster or the leader have failed. >> >> With all that in mind, we were wondering if anyone had any thoughts as to >> what might be going on, as we’re out of ideas. If anyone can provide any >> hints or guidance, we’d be very grateful. >> >> I’ve included brief excerpts from logs from some of the nodes; the >> problem started around 17:32:12 and wasn’t resolved until 17:39, after >> we’d restarted ZooKeeper. This was was originally the leader: >> >> Nov 29 19:31:44 localhost zookeeper: INFO [ProcessThread(sid:4 >> cport:-1)::PrepRequestProcessor@486] - Processed session termination for >> sessionid: 0x15fdd1d1e0c0313 >> Nov 29 19:32:11 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:11 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:11 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:12 localhost zookeeper: ERROR >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected >> exception causing shutdown while sock still open >> Nov 29 19:32:12 localhost zookeeper: WARN >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE >> /10.1.29.76:49198 ******** >> Nov 29 19:32:12 localhost zookeeper: ERROR >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected >> exception causing shutdown while sock still open >> Nov 29 19:32:12 localhost zookeeper: ERROR >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected >> exception causing shutdown while sock still open >> Nov 29 19:32:12 localhost zookeeper: WARN >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE >> /10.1.0.252:50932 ******** >> Nov 29 19:32:12 localhost zookeeper: WARN >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE >> /10.1.48.67:44882 ******** >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my >> state) >> Nov 29 19:32:12 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:12 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:12 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my >> state) >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called >> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: >> Not sufficient followers synced, only synced with sids: [ 3,4 ] >> Nov 29 19:32:12 localhost at >> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) >> Nov 29 19:32:12 localhost at >> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) >> Nov 29 19:32:12 localhost zookeeper: INFO >> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331] >> - exception while shutting down acceptor: java.net.SocketException: >> Socket closed >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed >> socket connection for client /10.1.10.8:42938 which had sessionid >> 0x45f3a424dd20208 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting >> down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO [ProcessThread(sid:4 >> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting >> down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676] >> - Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - >> shutdown of request processor complete >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited! >> Nov 29 19:32:12 localhost zookeeper: INFO >> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop! >> Nov 29 19:32:12 localhost zookeeper: WARN >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE >> /10.1.37.117:52956 ******** >> Nov 29 19:32:12 localhost zookeeper: WARN >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring >> unexpected exception >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New >> election. My id = 4, proposed zxid=0x2b0260b5a8 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to >> 10.1.64.1 port 67 (xid=0x79592ad4) >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1 >> (xid=0x79592ad4) >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 -- >> renewal in 1608 seconds. >> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get >> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s >> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases >> of eth0 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 400 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:13 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:13 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 800 >> Nov 29 19:32:13 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:13 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:13 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:14 localhost zookeeper: INFO >> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop! >> Nov 29 19:32:14 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 1600 >> Nov 29 19:32:14 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:15 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 3200 >> Nov 29 19:32:15 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:18 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 6400 >> Nov 29 19:32:18 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0, >> interval 125280ms. >> Nov 29 19:32:25 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 12800 >> Nov 29 19:32:25 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:38 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 25600 >> Nov 29 19:32:38 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:33:03 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 51200 >> Nov 29 19:33:03 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:33:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:33:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:34:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:34:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:35:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:35:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:36:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:36:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:37:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:37:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:38:54 localhost zookeeper: INFO >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:38:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: WARN >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for >> id 5, my id = 4, error = >> Nov 29 19:39:45 localhost zookeeper: WARN >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker >> Nov 29 19:39:45 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times >> Nov 29 19:39:45 localhost zookeeper: WARN >> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using >> channel: for id 5 my id = 4 error = java.net.SocketException: Socket >> closed >> Nov 29 19:39:45 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times >> Nov 29 19:39:45 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 localhost zookeeper: WARN >> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving >> thread >> Nov 29 19:39:47 localhost zookeeper: INFO >> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746] >> - Received connection request /10.1.48.67:58398 >> Nov 29 19:39:47 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:47 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:47 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:47 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:47 localhost zookeeper: INFO >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service >> 'zookeeper-server' on user request >> >> >> And here are logs from one of the followers: >> >> Nov 29 19:32:11 localhost zookeeper: INFO >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating >> client: 0x15fc2e9562e0187 >> Nov 29 19:32:12 localhost zookeeper: WARN >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when >> following the leader >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed >> socket connection for client /10.1.29.33:57934 which had sessionid >> 0x35f3a424dc802a2 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting >> down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting >> down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] - >> FollowerRequestProcessor exited loop! >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - >> shutdown of request processor complete >> Nov 29 19:32:12 localhost zookeeper: INFO >> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop! >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - >> Shutting down >> Nov 29 19:32:12 localhost zookeeper: INFO >> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited! >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New >> election. My id = 5, proposed zxid=0x2b0260b5a8 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:12 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 400 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:12 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 800 >> Nov 29 19:32:12 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:13 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 1600 >> Nov 29 19:32:13 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:15 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 3200 >> Nov 29 19:32:15 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:18 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 6400 >> Nov 29 19:32:18 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:24 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 12800 >> Nov 29 19:32:24 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:32:37 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 25600 >> Nov 29 19:32:37 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:33:03 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 51200 >> Nov 29 19:33:03 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:33:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:33:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:34:42 localhost zookeeper: WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught >> end of stream exception >> Nov 29 19:34:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:34:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:35:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:35:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:36:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:36:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:37:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:37:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:38:54 localhost zookeeper: INFO >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - >> Notification time out: 60000 >> Nov 29 19:38:54 localhost zookeeper: INFO >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my >> state) >> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service >> 'zookeeper-server' on user request >> >>