Using zookeeper 3.4.5 I came across a situation where all the 3 Zookeeper
suddenly stop.
What I see is that NODE1 fails to write on the disk. so it makes sense to
me that NODE1 stops.
But it is unclear why NODE2 and NODE3 would stop running as well, I have a
hard time making sense of the log messages.
Any insight would be greatly appreciated!
see log extracts below:
NODE1:
-- no log for several days before this --
2015-01-04 16:18:22,259 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] -
fsync-ing the write ahead log in SyncThread:1 took 11024ms which will
adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-01-04 16:18:22,380 [myid:1] - WARN
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:23,384 [myid:1] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,492 [myid:1] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:24,060 [myid:1] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
NODE2:
-- no log for several days before this --
2015-01-04 16:18:21,899 [myid:3] - WARN
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:22,760 [myid:3] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,801 [myid:3] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,886 [myid:3] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
NODE3 (leader):
-- no log for several days before this --
2015-01-04 16:18:21,897 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing
connection to peer due to transaction timeout.
2015-01-04 16:18:21,898 [myid:2] - WARN
[LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE
/204.53.107.249:43402 ********
2015-01-04 16:18:21,905 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing
connection to peer due to transaction timeout.
2015-01-04 16:18:21,907 [myid:2] - WARN
[LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE
/204.53.107.247:45953 ********
2015-01-04 16:18:21,918 [myid:2] - WARN
[LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring
unexpected exception
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-01-04 16:18:23,003 [myid:2] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,007 [myid:2] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,115 [myid:2] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
Thanks!
Benjamin