I've constructed a simple example, using just the zkpython library with
condition variables, that will deadlock. I've filed a new ticket for it:

https://issues.apache.org/jira/browse/ZOOKEEPER-763

The gdb stack traces look suspiciously like the ones in 591, but sans the
watchers:

https://issues.apache.org/jira/browse/ZOOKEEPER-591

The example attached to the ticket will deadlock in zk 3.3.0 (which has the
fix for 591) and in trunk.

-kapil

On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu <kapil.f...@gmail.com> wrote:

> Hi Folks,
>
> I'm constructing an async api on top of the zookeeper python bindings for
> twisted. The intent was to make a thin wrapper that would wrap the existing
> async api with one that allows for integration with the twisted python event
> loop (http://www.twistedmatrix.com), primarily using the async apis.
>
> One issue I'm running into while developing unit tests: deadlocks occur
> if we attempt to close a handle while there are any outstanding async
> requests (aget, acreate, etc). Normally on close both the io thread and the
> completion thread are terminated and joined. However, with outstanding
> async requests, the completion thread won't be in a joinable state, and we
> effectively hang when the main thread does the join.
>
> I'm curious if this would be considered a bug. afaics the ideal behavior
> on close of a handle would be to effectively clear out any remaining
> callbacks and let the completion thread terminate.
>
> I've tried adding some bookkeeping to the api to guard against closing
> while there is an outstanding completion request, but it's an imperfect
> solution due to the nature of the event loop integration. The problem is that
> the python callback invoked by the completion thread in turn schedules a
> function for the main thread. In twisted the api for this is implemented by
> appending the function to a list attribute on the reactor and then writing a
> byte to a pipe to wake up the main thread.
> If a thread switch to the main thread occurs before the completion thread
> callback returns, the scheduled function runs and the rest of the
> application keeps processing. For the unit tests, the last step is to close
> the connection, which results in a deadlock.
>
> I've included some of the client log and gdb stack traces from a deadlocked
> client process.
>
> thanks,
>
> Kapil
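
For illustration, here is a minimal sketch of the kind of bookkeeping described above, written in plain Python with no zookeeper or twisted calls. The names PendingTracker, wrap, wait_idle and demo are hypothetical and are not part of zkpython. The sketch assumes that the race comes from the outstanding-request count dropping before the completion-thread callback has returned, so it decrements the count only after the callback has finished. A plain thread stands in for the client's completion thread.

```python
import threading


class PendingTracker:
    """Hypothetical helper: counts async requests whose completion
    callbacks have not yet returned."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, callback):
        """Register one outstanding request and return a wrapped callback."""
        with self._cond:
            self._pending += 1

        def wrapper(*args):
            try:
                return callback(*args)
            finally:
                # Decrement only after the user callback has fully returned,
                # so a waiter never sees zero while a callback is mid-flight.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return wrapper

    def wait_idle(self, timeout=None):
        """Block until no callbacks are outstanding; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def demo():
    tracker = PendingTracker()
    results = []
    cb = tracker.wrap(lambda rc, data: results.append((rc, data)))
    # A plain thread stands in for the client's completion thread.
    worker = threading.Thread(target=cb, args=(0, "payload"))
    worker.start()
    # The real handle close would only be attempted after this returns True.
    idle = tracker.wait_idle(timeout=5)
    worker.join()
    return idle, results
```

Blocking the main thread in wait_idle is itself a poor fit for an event loop, which is why the sketch takes a timeout. It also does nothing for a request whose completion never fires, so it narrows the window described above rather than replacing a fix in the bindings.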