Good catch. I don't think we can do anything automatically to resolve this. However, there is a JIRA pending that would at least let you remove the watch when this does occur: https://issues.apache.org/jira/browse/ZOOKEEPER-442
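To make the failure mode concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the ZooKeeper client: the class WatchLeakSketch, its Listener interface, removeWatch() and the "/lock/x-N" paths are all hypothetical names used only for illustration, and whatever API comes out of the JIRA may look quite different. It only shows that an exists()-style call which registers its watcher even when the node is absent leaves one entry behind per call, and that an explicit removal step is what clears it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a watch table. NOT the real ZooKeeper client. */
final class WatchLeakSketch {

    /** Hypothetical stand-in for a watcher callback. */
    interface Listener {
        void onEvent(String path);
    }

    private final Set<String> nodes = new HashSet<>();                   // nodes that currently exist
    private final Map<String, Set<Listener>> watches = new HashMap<>();  // path -> registered listeners

    void create(String path) {
        if (nodes.add(path)) fire(path);
    }

    void delete(String path) {
        if (nodes.remove(path)) fire(path);
    }

    /** Registers the listener whether or not the node is present, then reports presence. */
    boolean exists(String path, Listener listener) {
        watches.computeIfAbsent(path, p -> new HashSet<>()).add(listener);
        return nodes.contains(path);
    }

    /** Explicit clean-up (assumed operation); returns true if the listener was registered. */
    boolean removeWatch(String path, Listener listener) {
        Set<Listener> set = watches.get(path);
        if (set == null || !set.remove(listener)) return false;
        if (set.isEmpty()) watches.remove(path);
        return true;
    }

    int watchedPathCount() {
        return watches.size();
    }

    /** Watches are one-shot in this model: firing an event drops them. */
    private void fire(String path) {
        Set<Listener> set = watches.remove(path);
        if (set != null) {
            for (Listener l : set) l.onEvent(path);
        }
    }

    /** Checks n paths that are never created; returns how many paths are still watched. */
    static int simulate(int n, boolean cleanUp) {
        WatchLeakSketch zk = new WatchLeakSketch();
        for (int i = 0; i < n; i++) {
            String path = "/lock/x-" + i;      // hypothetical path that never comes into existence
            Listener listener = p -> { };
            boolean present = zk.exists(path, listener);
            if (!present && cleanUp) {
                zk.removeWatch(path, listener); // without this, the entry stays forever
            }
        }
        return zk.watchedPathCount();
    }

    public static void main(String[] args) {
        System.out.println("still watched without clean-up: " + simulate(35, false));
        System.out.println("still watched with clean-up: " + simulate(35, true));
    }
}
```

In this model the entries only go away when an event fires or when removeWatch() is called, so for a path that never exists again the explicit call is the only way out short of ending the session.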
Patrick

On Fri, Dec 2, 2011 at 1:36 PM, Robert Crocombe <[email protected]> wrote:
> Suppose you set an exists() watch on a node, e.g. in Groovy:
>
> def latch = new CountDownLatch(1)
> def stat = zooKeeper.exists("$lockParentNode/$toWatch", [process: { event -> latch.countDown(); log.debug("fired latch on event $event") }, toString: {""}] as Watcher)
> if (stat != null) {
>     // Okay, we've set watch: wait for an event and try again
>     log.debug("Set watch on less than me '$toWatch': blocking until an event occurs which may let us acquire")
>     latch.await()
> } else {
>     // Dang! Person immediately less than us is gone, try again
>     // This is moderately weird unless they were the only ones
>     // less than us and so might have owned the lock and just
>     // released it
>     log.debug("Node '$toWatch' gone when setting watch: trying again to acquire")
> }
>
> Suppose that exists() does return null. It appears to be the case that
> the watch is still registered (both from the evidence below plus a
> cursory examination of the ZooKeeper.java client code). In my case
> "$lockParentNode/$toWatch" is ultimately a sequential ephemeral node
> that will never ever occur again (part of yet another implementation of
> a ZooKeeper lock). Thus, I believe this watch will remain until the
> session that created it is removed, which for us could be months.
> Basically we're leaking a Closure and associated CountDownLatch for each
> time the node to be watched is deleted in the interval between when we
> initially look for it and when exists() returns null. I only noticed it
> when playing with "wchc" as part of trying to understand a lost watch.
>
> 0x233a3c1db310006
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004876
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004234
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004684
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004588
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003118
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003772
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005206
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001876
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004924
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002020
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005170
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006526
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002260
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002920
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004414
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005848
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005278
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005752
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005380
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004360
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004624
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002728
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001846
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004264
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006142
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004660
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005956
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004810
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002428
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003274
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003370
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002398
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003712
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003652
> /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005314
>
> Does this seem like a correct understanding to those with a deeper
> understanding of ZooKeeper internals, and does it seem like a problem
> worth rectifying?
>
> --
> Robert Crocombe
