Hi Benjamin,

NODE2 and NODE3 stop running because ZooKeeper must have a quorum (a 
majority) of servers to make progress. An ensemble needs at least 3 servers to 
tolerate a failure, and with 3 servers the quorum is 2. In your scenario you 
started with three servers, which is fine, but the NODE3 (leader) log shows it 
closing its connections to both peers, so ZooKeeper stops serving requests 
because it lacks a quorum (majority). 
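
As a rough sketch of the majority rule (the class and method names below are 
made up for illustration and are not part of the ZooKeeper API), the check 
amounts to something like this:

```java
public class QuorumSketch {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True when the servers that are still connected form a majority.
    static boolean hasQuorum(int ensembleSize, int connected) {
        return connected >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int ensembleSize = 3; // hypothetical ensemble, matching the three nodes in this thread
        for (int connected = ensembleSize; connected >= 0; connected--) {
            System.out.println(connected + " of " + ensembleSize
                    + " connected, quorum: " + hasQuorum(ensembleSize, connected));
        }
    }
}
```

For a three-server ensemble, quorumSize(3) is 2: hasQuorum(3, 2) is true and 
hasQuorum(3, 1) is false.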

Ibrahim

-----Original Message-----
From: Benjamin Jaton [mailto:[email protected]] 
Sent: Wednesday, January 07, 2015 10:34 PM
To: [email protected]
Subject: Failover when one node fails to write on the disk?

Using ZooKeeper 3.4.5, I came across a situation where all 3 ZooKeeper servers 
suddenly stopped.

What I see is that NODE1 fails to write to the disk, so it makes sense to me 
that NODE1 stops.

But it is unclear why NODE2 and NODE3 would stop running as well; I have a hard 
time making sense of the log messages.

Any insight would be greatly appreciated!

See the log extracts below:

NODE1:

-- no log for several days before this --
2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
fsync-ing the write ahead log in SyncThread:1 took 11024ms which will adversely 
effect operation latency. See the ZooKeeper troubleshooting guide
2015-01-04 16:18:22,380 [myid:1] - WARN
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
following the leader java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
        at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:23,384 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,492 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:24,060 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running


NODE2:

-- no log for several days before this --
2015-01-04 16:18:21,899 [myid:3] - WARN
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
following the leader java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
        at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-01-04 16:18:22,760 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,801 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:22,886 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running


NODE3 (leader):

-- no log for several days before this --
2015-01-04 16:18:21,897 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
connection to peer due to transaction timeout.
2015-01-04 16:18:21,898 [myid:2] - WARN
[LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE
/204.53.107.249:43402 ********
2015-01-04 16:18:21,905 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
connection to peer due to transaction timeout.
2015-01-04 16:18:21,907 [myid:2] - WARN
[LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE
/204.53.107.247:45953 ********
2015-01-04 16:18:21,918 [myid:2] - WARN
[LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring unexpected 
exception java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
        at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
        at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
        at
org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
        at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-01-04 16:18:23,003 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,007 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running
2015-01-04 16:18:23,115 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 
0x0 due to java.io.IOException: ZooKeeperServer not running


Thanks!
Benjamin
