Andor, Zk version that i use is zk_version 3.4.5-1392090, built on 09/30/2012 17:52 GMT No Auth or encryption config None my of network graphs showing any dip or unusual pattern thats why i am thinking there may not be any n/w issue. I have those nodes in cloud so checking with them to see if any n/w issue between regions.
Thanks, Ram On Tue, Jul 3, 2018 at 6:29 AM Andor Molnar <an...@cloudera.com.invalid> wrote: > Hi Rammohan, > > Would you please elaborate on the details of your cluster setup? > Which ZooKeeper version do you use? > Do you use authentication / encryption? > Would you please attach config files and log files of other nodes like > leader and followers? > > How did you make sure that there was no network problem at the time when > issue happened? > Would you please attach graphs / diagrams on the network traffic including > latency and bandwidth usage between the affected data centers? > > Regards, > Andor > > > > > On Tue, Jul 3, 2018 at 2:56 PM, rammohan ganapavarapu < > rammohanga...@gmail.com> wrote: > > > Yes I am sure there is no network issues, if leader is busy in GC > followers > > on the same DC would have been shutdown as we right but it wasn't the > case. > > > > On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar <nkal...@cloudera.com.invalid > > > > wrote: > > > > > Hi Ram, > > > > > > Are you sure there were no network error? For me, this looks like it > > could > > > be due to failed heartbeats (as shutdown was called after the timeout). > > > > > > It is also possible the leader was busy (maybe garbage collection > caused > > > pause?) - especially if you store big(ish) chunks of data in ZooKeeper. > > > (There is plan to integrate JVMPauseMonitor to ZooKeeper for this > reason > > > actually). > > > > > > Regards, > > > Norbert > > > > > > On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu < > > > rammohanga...@gmail.com> wrote: > > > > > > > All, > > > > > > > > I have multi data-center ldap cluster setup with other data-center > with > > > all > > > > observers all of sudden all the observer threads went down with the > > > > following message, any idea why they went down? We don't see any > > network > > > > related issues between data-centers. > > > > > > > > > > > > 2018-06-29 05:32:59,036 [myid:222] - WARN > > > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception > > when > > > > observing the leader > > > > java.net.SocketTimeoutException: Read timed out > > > > at java.net.SocketInputStream.socketRead0(Native Method) > > > > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > > > > at java.net.SocketInputStream.read(SocketInputStream.java:170) > > > > at java.net.SocketInputStream.read(SocketInputStream.java:141) > > > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > > > > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > > > > at java.io.DataInputStream.readInt(DataInputStream.java:387) > > > > at org.apache.jute.BinaryInputArchive.readInt( > > BinaryInputArchive.java:63) > > > > at > > > > > > > > > > > org.apache.zookeeper.server.quorum.QuorumPacket. > > deserialize(QuorumPacket.java:83) > > > > at > > > > > > > org.apache.jute.BinaryInputArchive.readRecord( > > BinaryInputArchive.java:108) > > > > at > > > org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) > > > > at > > > > > > > org.apache.zookeeper.server.quorum.Observer.observeLeader( > > Observer.java:75) > > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run( > > QuorumPeer.java:727) > > > > 2018-06-29 05:32:59,244 [myid:222] - INFO > > > > [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown > > > called > > > > java.lang.Exception: shutdown Observer > > > > at > > > org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137) > > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run( > > QuorumPeer.java:731) > > > > > > > > > > > > Thanks, > > > > Ram > > > > > > > > > >