Re: Follower never recovers and keeps saying ZooKeeperServer not running

2018-07-03 Thread Steph van Schalkwyk
I've seen this but can't remember what caused it. What's your zNode size? steph On Tue, Jul 3, 2018, 9:17 PM Benjamin Jaton wrote: > Hello, > > I'm wondering what can cause a ZK follower to check out like this: > > 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.248:40282 > :Lear

Follower never recovers and keeps saying ZooKeeperServer not running

2018-07-03 Thread Benjamin Jaton
Hello, I'm wondering what can cause a ZK follower to check out like this: 2018-07-03T13:43:28,814 [myid:] - ERROR [LearnerHandler-/10.0.0.248:40282 :LearnerHandler@620] - Unexpected exception causing shutdown while sock still open java.io.EOFException: null at java.io.DataInputStream.read

Re: Observer went down with Read timed out exception

2018-07-03 Thread rammohan ganapavarapu
Andor, The ZK version that I use is zk_version 3.4.5-1392090, built on 09/30/2012 17:52 GMT. No auth or encryption config. None of my network graphs show any dip or unusual pattern, which is why I think there may not be any network issue. I have those nodes in the cloud, so I am checking with them to see if any

Re: Observer went down with Read timed out exception

2018-07-03 Thread Andor Molnar
Hi Rammohan, Would you please elaborate on the details of your cluster setup? Which ZooKeeper version do you use? Do you use authentication / encryption? Would you please attach config files and log files of other nodes like leader and followers? How did you make sure that there was no network pr

Re: Observer went down with Read timed out exception

2018-07-03 Thread rammohan ganapavarapu
Yes, I am sure there are no network issues. If the leader were busy in GC, followers in the same DC would have been shut down as well, right? But that wasn't the case. On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar wrote: > Hi Ram, > > Are you sure there were no network errors? For me, this looks like it could > be

Re: Observer went down with Read timed out exception

2018-07-03 Thread Norbert Kalmar
Hi Ram, Are you sure there were no network errors? For me, this looks like it could be due to failed heartbeats (as shutdown was called after the timeout). It is also possible the leader was busy (maybe garbage collection caused a pause?) - especially if you store big(ish) chunks of data in ZooKeepe