zk-shell has a command to look up the session id and client ip:port that own an ephemeral znode:
https://github.com/rgs1/zk_shell/blob/master/zk_shell/shell.py#L2734

Example:

$ pip3 install zk-shell
$ zk-shell your-server:2181
(CONNECTED [your-server:2181]) /test> create foo bar ephemeral=true
(CONNECTED [your-server:2181]) /test> ephemeral_endpoint foo your-server
0x1504f372ab5cd077 10.1.1.14:54376 your-server:2181

Note: you need to replace your-server with the full list of servers (e.g. server1:2181,server2:2181,...) in the last argument to the ephemeral_endpoint command.

-rgs

On Fri, Oct 30, 2020 at 12:30 PM Paul Summermatter wrote:
> RE: ZooKeeper 3.4.6
>
> All,
>
> I'm trying to troubleshoot a problem and could use some guidance
> from the experts on ZK administration. I have a cluster of applications
> that share work and that create ephemeral nodes representing the work in ZK
> expressly so that, if one application fails, the ephemeral nodes should be
> deleted, and the other apps should be able to pick up the work that is now
> not being completed by the failed instance.
>
> Yesterday evening, one application instance suffered from some
> severe memory pressure and had to run multiple stop the world GC cycles.
> The pauses appear to have triggered a SessionExpiredException in
> org.apache.zookeeper.ClientCnxn$SendThread.run (I correlated multiple
> "Pause Full" statements in the GC logs with the ZK session timeout in the
> application logs). After the timeout, the connection was re-established in
> under 1,000ms, but the ephemeral nodes remained in ZooKeeper, leaving them
> as orphans. We've seen this behavior before and have had to delete the
> nodes manually using the zkCli.sh utility.
>
> In an attempt to troubleshoot this issue, I'm trying to correlate
> the ephemeral owner that is listed on a node when you run the 'get' command
> with the ID of an active session. Basically, I'm trying to understand
> whether ZK thinks there is still an active session associated with the
> ephemeral node in the hopes that that might lead to an explanation for why
> the ZK server didn't seem to recognize the session timeout sensed on the
> client that triggered a new connection and would explain why the ephemeral
> nodes were not deleted as they should have been when the connection dropped.
>
> I've tried the various four letter commands on the server to see
> if any of them output anything that looks like the ephemeral owner ID
> without any success. Any suggestions/guidance would be greatly appreciated.
> Note, right now, upgrading is not an option, but I'm certainly open to that
> if there are known issues with ephemeral nodes in 3.4 that are addressed in
> newer versions.
>
> Regards,
> Paul
