I see, thank you Patrick!

Regards,
Qian Zhang

On Thu, May 23, 2019 at 9:26 AM Qian Zhang <[email protected]> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code: if a follower fails to read from the leader (e.g., on a
> read timeout), it closes the socket; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
> for details. Once the socket is closed, the follower's next write fails
> (I guess the same socket is used here), which is treated as a severe
> unrecoverable error, and the follower is then shut down; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
> and
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems that shutting down the follower when it cannot read from the
> leader is the designed behavior? If my understanding is wrong, can you
> please let me know what the designed behavior is in this case? Thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Wed, May 22, 2019 at 8:52 AM Qian Zhang <[email protected]> wrote:
>
>> Does anyone have any ideas?
>>
>> Regards,
>> Qian Zhang
>>
>>
>> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a ZooKeeper cluster which has 5 nodes. Today the leader became
>>> unreachable due to a hardware issue, and then I found that the 4
>>> followers just shut down. Here are the logs:
>>>
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
>>>> java.net.SocketTimeoutException: Read timed out
>>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
>>>> java.net.SocketException: Socket closed
>>>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>>> java.lang.Exception: shutdown Follower
>>>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>>
>>>
>>> I am confused about why all followers shut down in this case, which makes
>>> the whole ZooKeeper cluster unusable for a short period. Shouldn't they
>>> elect a new leader instead? Thanks!
>>>
>>>
>>> Regards,
>>> Qian Zhang
>>>
>>
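
For anyone reading this thread later: the sequence in the logs above (a read times out, the socket is closed, and a later write on the same socket fails with a SocketException) can be reproduced with plain JDK sockets. The following is only a minimal sketch of that sequence, not ZooKeeper code. The class name ClosedSocketDemo, the method run(), and the 200 ms timeout are made up for illustration, and a silent local listener stands in for the unreachable leader. In ZooKeeper the read and the write happen on different threads; here they run on one thread for simplicity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it is not part of ZooKeeper.
public class ClosedSocketDemo {

    // Returns the simple names of the exceptions observed, in order.
    static List<String> run() throws IOException {
        List<String> seen = new ArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A listener that never sends anything stands in for the unreachable leader.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback)) {
            Socket sock = new Socket(loopback, silentPeer.getLocalPort());
            try {
                sock.setSoTimeout(200); // read timeout in milliseconds (arbitrary value)
                InputStream in = sock.getInputStream();
                OutputStream out = sock.getOutputStream();
                try {
                    in.read(); // blocks until the read timeout expires
                } catch (SocketTimeoutException e) {
                    seen.add(e.getClass().getSimpleName());
                    sock.close(); // the reader gives up and closes the socket
                }
                try {
                    out.write(1); // a later write on the same, now closed, socket
                } catch (SocketException e) {
                    seen.add(e.getClass().getSimpleName());
                }
            } finally {
                sock.close(); // closing an already closed socket is a no-op
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

The sketch only shows why the second exception follows from the first at the socket level. Whether the follower then shuts down or rejoins leader election is decided by the ZooKeeper code linked in the thread, which this example does not model.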
