I'll try to provide as much information as possible, but it may be a bit sparse because I'm still trying to understand exactly what I'm seeing with the C client.
ZooKeeper client version: 3.4.5
ZooKeeper server version: 3.4.10
5-node ZooKeeper cluster

The workflow is standard cluster coordination: a long-lived process creates an ephemeral node containing some data, and a number of other processes on separate machines read that data.

The issue is that after about 7-9 hours of runtime, ZooKeeper expires the client session because the 30-second session timeout has been reached. On the client side, I've confirmed that no calls are made to the watcher function, and nothing is delivered to the context, that I supplied to zookeeper_init. The long-lived process does other work during its runtime, but after creating the ephemeral node at startup, its only interaction with ZooKeeper is through callback events and a pipe.

One other data point: I created an event loop that uses the same client that created the ephemeral node to read the node's data every 60 seconds and log it. While this loop is running, I don't see the session expire at all, even after 14 hours of runtime.

I can't explain why the client would be disconnected without any message reaching either the callback function or the context. I also can't explain why this only happens after many hours of running without issue.

If anyone has seen something similar, how did you fix it? Any ideas on how to debug this would also be very helpful.

Thanks!
Andrew Jorgensen
@ajorgensen
