Hi Andor,

I am using ZooKeeper release 3.4.10.

I checked the code: if a follower fails to read from the leader (e.g., on a read timeout), it closes the socket; see https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85 for details. Once the socket is closed, the follower's next write fails as well (I guess the same socket is used for both reading and writing). That write failure is treated as a severe unrecoverable error, and the follower is then shut down; see https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95 and https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51 .

So it seems that shutting down the follower when it cannot read from the leader is the designed behavior? If my understanding is wrong, can you please let me know what the intended behavior is in this case?

Thanks!

Regards,
Qian Zhang

On Wed, May 22, 2019 at 8:52 AM Qian Zhang <[email protected]> wrote:

> Anyone has any ideas?
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <[email protected]> wrote:
>
>> Hi,
>>
>> I have a ZooKeeper cluster which has 5 nodes.
>> Today the leader cannot be connected due to a hardware issue, and then I found the 4 followers just shutdown, here is the logs:
>>
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
>>> java.net.SocketTimeoutException: Read timed out
>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
>>> java.net.SocketException: Socket closed
>>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>> java.lang.Exception: shutdown Follower
>>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>
>>
>> I am confused why all followers shutdown in this case which makes the whole ZooKeeper unusable for a short period, shouldn't they elect a new leader instead? Thanks!
>>
>>
>> Regards,
>> Qian Zhang
>>
>
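
P.S. To illustrate the guess at the top of this message (that the read path and the write path share one socket), here is a minimal standalone sketch. It is not ZooKeeper code: the class name SharedSocketDemo, the silent local server standing in for an unreachable leader, and the 200 ms timeout are all made up for illustration. It only shows the plain java.net behavior I believe the logs reflect: a read times out, the socket is closed in response, and a later write on the same socket fails with a SocketException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of ZooKeeper.
public class SharedSocketDemo {

    /** Returns the exceptions seen on the read path and then the write path. */
    public static String run() {
        StringBuilder seen = new StringBuilder();
        InetAddress local = InetAddress.getLoopbackAddress();
        // A server that accepts the TCP handshake but never sends anything,
        // standing in for a leader that has stopped responding.
        try (ServerSocket silentLeader = new ServerSocket(0, 1, local)) {
            Socket sock = new Socket(local, silentLeader.getLocalPort());
            sock.setSoTimeout(200); // read timeout in milliseconds (arbitrary)
            InputStream in = sock.getInputStream();
            OutputStream out = sock.getOutputStream();

            // "Reader" side: the read times out, and the socket is closed in response.
            try {
                in.read();
            } catch (SocketTimeoutException e) {
                seen.append("SocketTimeoutException");
                sock.close();
            }

            // "Writer" side: it still holds a stream of the same, now closed, socket.
            try {
                out.write(1);
                out.flush();
            } catch (SocketException e) {
                seen.append(",SocketException");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If this matches what happens inside the follower, the "Socket closed" error in FollowerRequestProcessor would be a side effect of the earlier read timeout rather than a separate fault, but that is my assumption and not something I have confirmed in the ZooKeeper code.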
