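Hi Benjamin,

Node2 and Node3 stopped because ZooKeeper must have a quorum (a majority) of servers in contact with each other to make progress. In a three-server ensemble that means at least two servers, so the loss of a single server should normally be tolerated. In your logs, however, the leader (the NODE3 extract) closed its connections to both followers at 16:18:21 with "Closing connection to peer due to transaction timeout". From that point the leader no longer had a majority behind it, which would explain why all three servers then report "ZooKeeperServer not running".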
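To make the majority rule concrete, here is a small Python sketch. The function names are made up for illustration and are not part of ZooKeeper; it only models the counting, assuming a plain majority quorum with no weights or observers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, connected: int) -> bool:
    """True if `connected` servers (counting the leader itself) form a majority."""
    return connected >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Three-server ensemble: two servers are enough, one alone is not.
    print(quorum_size(3))         # majority needed out of 3
    print(has_quorum(3, 2))       # leader plus one follower
    print(has_quorum(3, 1))       # leader that has dropped both followers
    print(tolerated_failures(3))  # servers that may fail
```

With three servers, a leader that still has one follower keeps a majority, while a leader that has dropped both followers is left with one server out of three and does not.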
Ibrahim

-----Original Message-----
From: Benjamin Jaton [mailto:[email protected]]
Sent: Wednesday, January 07, 2015 10:34 PM
To: [email protected]
Subject: Failover when one node fails to write on the disk?

Using ZooKeeper 3.4.5, I came across a situation where all 3 ZooKeeper servers suddenly stopped.

What I see is that NODE1 fails to write to the disk, so it makes sense to me that NODE1 stops. But it is unclear why NODE2 and NODE3 would stop running as well; I have a hard time making sense of the log messages.

Any insight would be greatly appreciated! See the log extracts below:

NODE1:
-- no log for several days before this --
2015-01-04 16:18:22,259 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:1 took 11024ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-01-04 16:18:22,380 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:23,384 [myid:1] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,492 [myid:1] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:24,060 [myid:1] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

NODE2:
-- no log for several days before this --
2015-01-04 16:18:21,899 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:22,760 [myid:3] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,801 [myid:3] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,886 [myid:3] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

NODE3 (leader):
-- no log for several days before this --
2015-01-04 16:18:21,897 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
2015-01-04 16:18:21,898 [myid:2] - WARN [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE /204.53.107.249:43402 ********
2015-01-04 16:18:21,905 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
2015-01-04 16:18:21,907 [myid:2] - WARN [LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE /204.53.107.247:45953 ********
2015-01-04 16:18:21,918 [myid:2] - WARN [LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring unexpected exception
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
    at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-01-04 16:18:23,003 [myid:2] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,007 [myid:2] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,115 [myid:2] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

Thanks!
Benjamin
