Hi,

I have a ZooKeeper cluster with 5 nodes. Today the leader became unreachable because of a hardware issue, and I then found that all 4 followers had shut down. Here are the logs:
> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>         at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>         at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)

I am confused about why all the followers shut down in this case, which made the whole ZooKeeper cluster unusable for a short period. Shouldn't they have elected a new leader instead?

Thanks!

Regards,
Qian Zhang
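For context, the behaviour the question expects can be illustrated with a toy model. This is a simplified sketch, not ZooKeeper's implementation: the class `QuorumSketch`, the `State` enum and the helpers `quorumSize`/`canElectLeader` are hypothetical. It assumes the standard majority rule (a quorum is more than half of the 5-member ensemble, i.e. 3 servers) and models a follower that loses its leader as tearing down only its follower role and returning to the LOOKING state to join a new election. That reading is suggested by the log, where `Follower.shutdown` is called from `QuorumPeer.run` right after `followLeader` fails, but it should be checked against the ZooKeeper source for the version in use.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not ZooKeeper's real classes) of a 5-node ensemble
 * losing its leader. Followers drop their follower role and go back to
 * LOOKING; a new leader can be elected if a majority of the ensemble is alive.
 */
public class QuorumSketch {

    // Simplified peer states, loosely named after ZooKeeper's LOOKING/FOLLOWING/LEADING.
    enum State { LOOKING, FOLLOWING, LEADING, DOWN }

    // A quorum is a strict majority of the configured ensemble size.
    public static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // An election can only succeed if enough peers are alive to form a quorum.
    public static boolean canElectLeader(int ensembleSize, int livePeers) {
        return livePeers >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int ensembleSize = 5;
        List<State> peers = new ArrayList<>(List.of(
                State.LEADING, State.FOLLOWING, State.FOLLOWING,
                State.FOLLOWING, State.FOLLOWING));

        // The leader's host fails (the hardware issue in the report).
        peers.set(0, State.DOWN);

        // Each follower's read from the leader times out; in this model it
        // tears down its follower role (not the whole process) and starts
        // looking for a new leader.
        for (int i = 0; i < peers.size(); i++) {
            if (peers.get(i) == State.FOLLOWING) {
                peers.set(i, State.LOOKING);
            }
        }

        long live = peers.stream().filter(s -> s != State.DOWN).count();
        System.out.println("quorum size = " + quorumSize(ensembleSize));
        System.out.println("live peers  = " + live);
        System.out.println("can elect   = " + canElectLeader(ensembleSize, (int) live));
    }
}
```

Run with `java QuorumSketch.java` (Java 11+). It prints a quorum size of 3, 4 live peers and `can elect   = true`: in this model, the short unavailability corresponds to the election window, after which one of the 4 remaining servers would lead.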
