Hi, I am trying to get a better understanding of ZooKeeper and how it should be used. Let's talk about the lock recipe (http://zookeeper.apache.org/doc/r3.4.6/recipes.html#sc_recipes_Locks).
Consider the following scenario:

- X acquires the lock
- X does some long-running work (longer than the session timeout)
- X gets partitioned away from the quorum while it is doing that work
- after some time (determined by the timeout passed to ZK), Y acquires the lock

In that situation both X and Y are holding the lock (unless X is acting properly).

If I understand the documentation correctly (http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#ch_zkSessions), X would receive a disconnected event in that situation (but not an expired event unless it successfully reconnects). So X should stop the work it is doing until it gets reconnected.

How much time does X have to stop the work it is doing? That is, how long does it take from the disconnected event being delivered to X until the ephemeral node used for the lock expires? Having two clients inside a critical section protected by a lock would not be a good idea.

Regards,
Simon
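
P.S. To make the question concrete, here is a minimal sketch of how I imagine X would have to behave. It is plain Python and does not use any ZooKeeper client: `LockGuard` and `run_steps` are hypothetical names, and the state strings only mirror the Disconnected, SyncConnected and Expired session states from the programmer's guide. It also assumes that the lock is still held if X reconnects before the session expires.

```python
class LockGuard:
    """Tracks whether X may keep working inside the critical section.

    Hypothetical helper for illustration; not part of any ZooKeeper client.
    """

    def __init__(self):
        self._held = False       # True while X believes it owns the lock
        self._connected = False  # True while the session is connected

    def on_lock_acquired(self):
        # The lock can only be acquired over a live connection.
        self._held = True
        self._connected = True

    def on_state(self, state):
        if state == "Disconnected":
            # Session may still be alive, but X cannot know for how long.
            self._connected = False
        elif state == "SyncConnected":
            # Reconnected; work may resume only if the lock was not lost.
            self._connected = True
        elif state == "Expired":
            # Session is gone, so the ephemeral lock node is gone too.
            self._held = False
            self._connected = False

    def may_work(self):
        return self._held and self._connected


def run_steps(guard, steps, events):
    """Run steps in order, checking the guard before each one.

    `events` maps a step index to the session states delivered
    just before that step would run.
    """
    done = []
    for i, step in enumerate(steps):
        for state in events.get(i, []):
            guard.on_state(state)
        if not guard.may_work():
            break  # stop immediately instead of finishing the work
        done.append(step)
    return done
```

The sketch only checks the guard between steps, so a single step that runs longer than the remaining grace period would still overlap with Y. That is exactly why I am asking how long that period is.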
