Re: avoiding deadlocks on client handle close w/ python/c api
Sure, I'll take a look at it. Thanks

mahadev

On 5/4/10 2:32 PM, "Patrick Hunt" wrote:
> Thanks Kapil, Mahadev perhaps you could take a look at this as well?
>
> Patrick
>
> [quoted earlier messages trimmed; the original message appears in full below]
Re: avoiding deadlocks on client handle close w/ python/c api
Thanks Kapil, Mahadev perhaps you could take a look at this as well?

Patrick

On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
> I've constructed a simple example just using the zkpython library with
> condition variables that will deadlock. I've filed a new ticket for it:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-763
>
> The gdb stack traces look suspiciously like the ones in 591, but sans the
> watchers: https://issues.apache.org/jira/browse/ZOOKEEPER-591
>
> The attached example on the ticket will deadlock in zk 3.3.0 (which has
> the fix for 591) and trunk.
>
> -kapil
>
> [quoted original message trimmed; it appears in full below]
Re: avoiding deadlocks on client handle close w/ python/c api
I've constructed a simple example just using the zkpython library with condition variables that will deadlock. I've filed a new ticket for it:

https://issues.apache.org/jira/browse/ZOOKEEPER-763

The gdb stack traces look suspiciously like the ones in 591, but sans the watchers: https://issues.apache.org/jira/browse/ZOOKEEPER-591

The attached example on the ticket will deadlock in zk 3.3.0 (which has the fix for 591) and trunk.

-kapil

On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu wrote:
> Hi Folks,
>
> I'm constructing an async API on top of the ZooKeeper Python bindings for
> Twisted. The intent was to make a thin wrapper around the existing async
> API that allows for integration with the Twisted Python event loop
> (http://www.twistedmatrix.com), primarily using the async APIs.
>
> One issue I'm running into while developing unit tests: deadlocks occur
> if we attempt to close a handle while there are any outstanding async
> requests (aget, acreate, etc.). Normally on close both the IO thread and
> the completion thread are terminated and joined; however, with
> outstanding async requests, the completion thread won't be in a joinable
> state, and we effectively hang when the main thread does the join.
>
> I'm curious whether this would be considered a bug. AFAICS the ideal
> behavior on close of a handle would be to clear out any remaining
> callbacks and let the completion thread terminate.
>
> I've tried adding some bookkeeping to the API to guard against closing
> while there is an outstanding completion request, but it's an imperfect
> solution due to the nature of the event loop integration. The problem is
> that the Python callback invoked by the completion thread in turn
> schedules a function for the main thread. In Twisted the API for this is
> implemented by appending the function to a list attribute on the reactor
> and then writing a byte to a pipe to wake up the main thread. If a thread
> switch to the main thread occurs before the completion thread callback
> returns, the scheduled function runs and the rest of the application
> keeps processing; the last step for the unit tests is to close the
> connection, which results in a deadlock.
>
> I've included some of the client log and gdb stack traces from a
> deadlocked client process.
>
> thanks,
>
> Kapil
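The scheduling mechanism described in the quoted message (append a callable to a list attribute on the reactor, then write a byte to a pipe so the main thread wakes up) can be modeled in a few lines of plain Python. The names here (`MiniReactor`, `_pending`, `call_from_thread`) are illustrative, not Twisted's actual internals:

```python
import os
import select
import threading

class MiniReactor:
    """Toy model of the wakeup scheme the message describes: a worker
    thread appends a callable to a list on the reactor, then writes a
    byte to a pipe so the main thread's select() returns."""

    def __init__(self):
        self._pending = []              # callables scheduled by other threads
        self._lock = threading.Lock()
        self._read_fd, self._write_fd = os.pipe()

    def call_from_thread(self, fn):
        """Called from a worker (e.g. completion) thread."""
        with self._lock:
            self._pending.append(fn)
        os.write(self._write_fd, b"x")  # wake the main thread

    def run_once(self, timeout=5.0):
        """Main thread: block until woken, then run scheduled callables."""
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        if readable:
            os.read(self._read_fd, 4096)  # drain the wakeup bytes
        with self._lock:
            pending, self._pending = self._pending, []
        for fn in pending:
            fn()

reactor = MiniReactor()
results = []
worker = threading.Thread(
    target=lambda: reactor.call_from_thread(lambda: results.append("ran")))
worker.start()
worker.join()
reactor.run_once()
print(results)  # ['ran']
```

Note the window this model makes visible: as soon as the byte hits the pipe, the main thread may run the scheduled function, and the worker thread is still inside its callback. If that scheduled function turns around and closes the handle, the close joins a completion thread that has not yet returned from the callback.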
avoiding deadlocks on client handle close w/ python/c api
Hi Folks,

I'm constructing an async API on top of the ZooKeeper Python bindings for Twisted. The intent was to make a thin wrapper around the existing async API that allows for integration with the Twisted Python event loop (http://www.twistedmatrix.com), primarily using the async APIs.

One issue I'm running into while developing unit tests: deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc.). Normally on close both the IO thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join.

I'm curious whether this would be considered a bug. AFAICS the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate.

I've tried adding some bookkeeping to the API to guard against closing while there is an outstanding completion request, but it's an imperfect solution due to the nature of the event loop integration. The problem is that the Python callback invoked by the completion thread in turn schedules a function for the main thread. In Twisted the API for this is implemented by appending the function to a list attribute on the reactor and then writing a byte to a pipe to wake up the main thread. If a thread switch to the main thread occurs before the completion thread callback returns, the scheduled function runs and the rest of the application keeps processing; the last step for the unit tests is to close the connection, which results in a deadlock.

I've included some of the client log and gdb stack traces from a deadlocked client process.
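The bookkeeping guard mentioned above could be sketched as a wrapper that counts outstanding completions and has close() wait for them to drain. This is an illustrative model (the `GuardedHandle` name and `submit` API are hypothetical, not zkpython's), and as the message says it remains imperfect: a completion callback may itself schedule the close on the main loop before it returns and decrements the counter.

```python
import threading

class GuardedHandle:
    """Illustrative wrapper: track outstanding async requests and make
    close() wait until their completions have run, instead of joining a
    completion thread that is still inside a callback."""

    def __init__(self):
        self._outstanding = 0
        self._cond = threading.Condition()
        self._closed = False

    def submit(self, do_async, callback):
        """do_async is any function that takes a completion callback,
        e.g. a lambda wrapping an async zookeeper call."""
        with self._cond:
            if self._closed:
                raise RuntimeError("handle closed")
            self._outstanding += 1

        def completion(*args):
            try:
                callback(*args)
            finally:
                with self._cond:
                    self._outstanding -= 1
                    self._cond.notify_all()

        do_async(completion)

    def close(self, timeout=5.0):
        """Wait for outstanding completions to drain before closing.
        Returns False if the wait timed out (still unsafe to close)."""
        with self._cond:
            drained = self._cond.wait_for(
                lambda: self._outstanding == 0, timeout)
            self._closed = True
            return drained
```

Usage: `h.submit(lambda cb: threading.Thread(target=lambda: cb(0)).start(), on_result)` followed by `h.close()`. The flaw the message points out maps onto this model directly: if `callback` schedules the close on the main loop, the close can run while `completion` is still on the stack and the counter has not yet been decremented.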
thanks, Kapil

This is all against zookeeper 3.3.0.

Log Trace From a Deadlock

("access"-prefixed lines in the log are from a wrapper around the zookeeper module which prints the name of any zookeeper module attributes that are accessed by the client.)

access set_debug_level
access LOG_LEVEL_DEBUG
access init
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@662: Client environment:host.name=kapil-laptop
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-21-generic
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@671: Client environment:os.version=#32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@679: Client environment:user.name=(null)
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@687: Client environment:user.home=/home/kapil
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@log_env@699: Client environment:user.dir=/home/kapil/projects/txzookeeper/trunk/_trial_temp
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=3 watcher=0xfc2ee0 sessionId=0 sessionPasswd= context=0xa64c898 flags=0
2010-05-01 12:55:50,861:15428(0xb77b86c0):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-01 12:55:50,861:15428(0xb7464b70):ZOO_DEBUG@do_io@279: started IO thread
2010-05-01 12:55:50,862:15428(0xb6c63b70):ZOO_DEBUG@do_completion@326: started completion thread
2010-05-01 12:55:50,862:15428(0xb7464b70):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
2010-05-01 12:55:50,913:15428(0xb7464b70):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x12853ee006c0005, negotiated timeout=3
2010-05-01 12:55:50,913:15428(0xb7464b70):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-01 12:55:50,913:15428(0xb6c63b70):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
access CONNECTED_STATE
access EPHEMERAL
access acreate
2010-05-01 12:55:50,913:15428(0xb77b86c0):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4bdc24d7 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 12:55:50,922:15428(0xb7464b70):ZOO_DEBUG@zookeeper_process@1980: Queueing asynchronous response
2010-05-01 12:55:50,922:15428(0xb6c63b70):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x4bdc24d7 rc=0
access NODEEXISTS
access aget
2010-05-01 12:55:50,922:15428(0xb77b86c0):ZOO_DEBUG@zoo_awget@2400: Sending request xid=0x4bdc24d8 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 12:55:50,923:15428(0xb7464b70):ZOO_DEBUG@zookeeper_process@1980: Queueing asynchronous response
2010-05-01 12:55:50,923:15428(0xb6c63b70):ZOO_DEBUG@process_completions@1772: Calling COMPLETION_DATA for xid=0x4bdc24d8 rc=0
access close
2010-05-01 12:55:50,924:15428(0xb746
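The hang described in the thread can be modeled in pure Python, assuming the reported behavior: the completion thread is blocked inside a user callback (here, waiting on an event that is never set), so a plain join() from the main thread, as happens on handle close, would never return. Joining with a timeout makes the non-joinable state observable without actually deadlocking:

```python
import threading

# Model of the reported deadlock: the "completion thread" is executing a
# user callback that blocks, waiting on something only the main thread
# could provide, while the main thread (as a close would) tries to join it.

main_may_proceed = threading.Event()

def completion_callback():
    # Blocks until the event is set -- which, in the deadlocked program,
    # never happens because the main thread is stuck in join().
    main_may_proceed.wait()

completion_thread = threading.Thread(target=completion_callback, daemon=True)
completion_thread.start()

# A real close would join with no timeout and hang; with a timeout we can
# observe that the thread never became joinable.
completion_thread.join(timeout=0.2)
still_alive_after_join = completion_thread.is_alive()
print(still_alive_after_join)  # True: the completion thread is not joinable

# Unblock the callback so this demo process can exit cleanly.
main_may_proceed.set()
completion_thread.join()
```

This matches the ideal behavior Kapil proposes: if close dropped the remaining queued callbacks (the equivalent of setting the event here) before joining, the completion thread could terminate and the join would return.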