Hello, My colleagues and I are working with the java lock recipe implementation.
We think we found two bugs in the code: 1) The first one is reported on this jira topic<https://issues.apache.org/jira/browse/ZOOKEEPER-645>. The issue is that the znodes used to control the lock are ordered by sessionID first and then by the sequence number. As earlier connected clients appear to have lower sessionID values than those connected latter, who connects first gets the lock disregarding anyone who has already the lock. We've posted a patch on that jira but it has not yet been reviewed. 2) The other bug is when you try to "unlock". When calling unlock(), either having the lock held or waiting for it, the znode lock is removed. However, if you are not holding the lock, there's still a zookeeper watcher waiting on the next znode with lower sequence number, which is necessary to avoid the heard effect on the recipe implementation. When watcher tells the lock implementation that the watched znode has been removed, the lock recipe calls lock(). What happens then is that a new znode lock is created so the client is again (unwilling) waiting for the lock (or eventually is now holding the lock). If that client doesn't do anything (such as unlocking it over and over until eventually getting the lock and doing an ultimate unlock) there would be a deadlock due to the client having the lock without knowing it. We've come up with a patch for this (it's attached to this email). Our question is: should we post this patch on the same jira topic mentioned on the beginning of this email or should we open a new topic for this issue? Thanks, Andre Esteve http://www.lsd.ic.unicamp.br/mc715-1s2011/index.php/Main_Page (wiki in Portuguese about our (and other's) works using zookeeper)
Index: WriteLock.java =================================================================== --- WriteLock.java (revision 1102068) +++ WriteLock.java (working copy) @@ -152,7 +152,10 @@ LOG.debug("Watcher fired on path: " + event.getPath() + " state: " + event.getState() + " type " + event.getType()); try { - lock(); + // avoid locking when not waiting for it + if (id != null) { + lock(); + } } catch (Exception e) { LOG.warn("Failed to acquire lock: " + e, e); }