This sounds suspiciously like a misuse of a watcher somewhere. Are you doing other things with ZooKeeper? ZooKeeper watches are single threaded. If you have a watcher that blocks, etc. it will prevent other ZooKeeper recipes from functioning. See this Tech Note: https://cwiki.apache.org/confluence/display/CURATOR/TN1
Other than that, I haven't heard of this problem. If you can provide a test case, that would help. -Jordan On Oct 17, 2013, at 7:47 AM, Chris Jeris <[email protected]> wrote: > We have run into a knotty problem with InterProcessMutex, where calls to > .acquire will expend their full timeout and then _succeed_ in acquiring the > lock. This generally happens when a different server process was the last > one to hold the lock, but it does not happen every time that is the case. > > The lock is a single InterProcessMutex object per server process (= server > machine), all on a single persistent Zookeeper node name (the object being > access controlled is a single piece of shared state, whose data is not itself > stored in ZK). The problem arises in the context of a test suite where > requests to our server cluster are issued serially, so there is basically no > competing traffic on the lock, although there is traffic on the ZK cluster > from other applications. The frequency of acquires on this lock is not > excessive (order 1 per second), and we are reasonably certain our client code > is not holding the lock longer than it should. > > The problem does not seem to be sensitive to the exact value of the timeout. > If we set it to 15 seconds, we see lock acquires taking 15 seconds and then > succeeding; if we set it to 60 seconds, we see them taking 60 seconds and > then succeeding. > > Right now we observe the problem with Curator 2.1.0 against both ZK 3.3.6 and > 3.4.5. > > Is this a known or familiar issue? Does it sound like we're doing something > wrong? > > thanks, Chris Jeris > -- > Chris Jeris > [email protected] > freenode/twitter/github: ystael
