I am going to try to provide as much information as possible, but it might
be a bit sparse because I am still trying to get a grip on what exactly I'm
seeing with the C client.

Zookeeper client version: 3.4.5
Zookeeper server version: 3.4.10
5 node zookeeper cluster

The workflow I have is essentially a long-lived process that establishes an
ephemeral node with some data, which is read by some number of other
processes located on separate machines: standard cluster coordination
stuff. The issue I am seeing is that after about 7-9 hours of runtime,
ZooKeeper expires the client session because it has reached the 30 second
timeout. On the client side, I've confirmed there are no calls to the
watcher function or the context supplied to zookeeper_init. The long-lived
process is doing other things during its runtime, but its interaction with
ZooKeeper is only via callback events and a pipe after establishing the
ephemeral node at the beginning.
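
To put the 30 second timeout in perspective, here is a minimal sketch of
the timing budget involved. It assumes the 3.4 C client sends a ping once
the connection has been idle for one third of the session timeout, and
gives up on the connection after two thirds of the timeout without hearing
from the server. That assumption is worth checking against the client
source for the version in use. The names session_budget and budget_for are
hypothetical and not part of the ZooKeeper API.

```c
#include <assert.h>

/* Timing budget derived from a session timeout, in milliseconds.
 * Hypothetical helper type, not part of the ZooKeeper C API. */
struct session_budget {
    int ping_interval_ms; /* idle time after which a ping is due */
    int recv_deadline_ms; /* silence after which the connection is dropped */
};

/* ASSUMPTION: ping after timeout/3 of idle time, and treat the connection
 * as dead after timeout*2/3 without a reply. Verify against your client. */
static struct session_budget budget_for(int session_timeout_ms) {
    struct session_budget b;
    assert(session_timeout_ms > 0);
    b.ping_interval_ms = session_timeout_ms / 3;
    b.recv_deadline_ms = session_timeout_ms * 2 / 3;
    return b;
}
```

Under that assumption, budget_for(30000) gives a ping roughly every 10
seconds, so the client's I/O path would need to stall for well over that
long before the server could see 30 seconds of silence.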

One other datapoint is that I created an event loop that uses the same
client that established the ephemeral node to get the data from the
ephemeral node every 60 seconds and log it. While this event loop is
running I do not observe the client session expiring at all, even after 14
hours of runtime.
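
The periodic read described above can be sketched roughly as follows. This
is not the actual code from the process, only an illustration of its shape.
The read is passed in as a callback so the sketch runs without a server; in
a real program the callback would wrap a synchronous read such as zoo_get()
on the same handle that created the ephemeral node. The names read_node_fn,
stub_read and poll_node, and the payload string, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback: fills buf with the node's data and returns the
 * number of bytes read, or a negative value on failure. */
typedef int (*read_node_fn)(void *ctx, char *buf, int buf_len);

/* Stand-in reader so the sketch runs without a ZooKeeper server. */
static int stub_read(void *ctx, char *buf, int buf_len) {
    const char *payload = "host-a:9000"; /* made-up node data */
    int n = (int)strlen(payload);
    (void)ctx;
    if (n > buf_len) n = buf_len;
    memcpy(buf, payload, (size_t)n);
    return n;
}

/* Reads the node `rounds` times, sleeping interval_sec between reads, and
 * logs each result to stderr. Returns the number of successful reads. */
static int poll_node(read_node_fn read_node, void *ctx,
                     unsigned interval_sec, int rounds) {
    char buf[1024];
    int ok = 0;
    assert(read_node != NULL);
    for (int i = 0; i < rounds; i++) {
        int n = read_node(ctx, buf, (int)sizeof(buf));
        if (n > (int)sizeof(buf)) n = (int)sizeof(buf); /* defensive clamp */
        if (n >= 0) {
            fprintf(stderr, "read %d: %d bytes: %.*s\n", i, n, n, buf);
            ok++;
        } else {
            fprintf(stderr, "read %d failed with code %d\n", i, n);
        }
        if (i + 1 < rounds) sleep(interval_sec);
    }
    return ok;
}
```

With the 60 second interval from above, a call would look like
poll_node(stub_read, NULL, 60, 10), with stub_read swapped for a reader
backed by the real client handle.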

I am not sure how to explain the client disconnecting without any message
to either the callback function or the context. I also am not sure how to
explain this behavior happening after many hours of running without issue.

If anyone has seen something similar, how did you go about fixing it? Also,
any ideas on how to debug this issue would be very helpful.

Thanks!
Andrew Jorgensen
@ajorgensen
