Hi Qian,

Which version of ZooKeeper are you using? Could you please share the config files and the leader logs too?

It also looks like you're trying to connect with an older client:

>>> Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode

Andor

On Wed, May 22, 2019 at 2:52 AM Qian Zhang <[email protected]> wrote:

> Anyone has any ideas?
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <[email protected]> wrote:
>
> > Hi,
> >
> > I have a ZooKeeper cluster which has 5 nodes. Today the leader cannot be
> > connected due to a hardware issue, and then I found the 4 followers just
> > shutdown, here is the logs:
> >
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> >> java.net.SocketTimeoutException: Read timed out
> >>     at java.net.SocketInputStream.socketRead0(Native Method)
> >>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> >>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> >>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> >>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> >>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
> >> java.net.SocketException: Socket closed
> >>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> >>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
> >>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
> >>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> >> java.lang.Exception: shutdown Follower
> >>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
> >
> >
> > I am confused why all followers shutdown in this case which makes the
> > whole ZooKeeper unusable for a short period, shouldn't they elect a new
> > leader instead? Thanks!
> >
> >
> > Regards,
> > Qian Zhang
> >
>
