[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864488#action_12864488
 ] 

Henry Robinson commented on ZOOKEEPER-763:
------------------------------------------

Kapil - 

Thanks! Adding that sleep helped me understand what was going on. 

pyzoo_close holds the GIL while it blocks inside zookeeper_close, waiting for 
the completion thread to finish. However, if a completion callback is still 
running in Python and has been pre-empted by the main thread, which then calls 
pyzoo_close, the callback can never get the GIL back to finish executing, so 
the completion thread blocks forever. The fix is simple: release the GIL for 
the duration of the zookeeper_close call and reacquire it immediately 
afterwards. There are even handy macros for this:

Py_BEGIN_ALLOW_THREADS
ret = zookeeper_close(zhandles[zkhid]);
Py_END_ALLOW_THREADS
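
To illustrate the pattern outside of zkpython, here is a minimal pure-Python 
sketch. It is only a simulation: an ordinary threading.Lock stands in for the 
GIL, a worker thread stands in for the completion thread, and the names 
close_handle, gil and completion_thread are made up for this example. A join 
timeout is used so the "deadlock" case returns instead of hanging.

```python
import threading


def close_handle(release_gil: bool, timeout: float = 0.5) -> bool:
    """Return True if the simulated completion thread could be joined."""
    gil = threading.Lock()           # stand-in for the interpreter lock
    in_callback = threading.Event()  # set once the callback has started

    def completion_thread():
        in_callback.set()            # callback started, then "pre-empted"
        with gil:                    # needs the lock back to finish the callback
            pass

    gil.acquire()                    # main thread holds the lock, as pyzoo_close does
    worker = threading.Thread(target=completion_thread, daemon=True)
    worker.start()
    in_callback.wait()

    if release_gil:
        gil.release()                # analogous to Py_BEGIN_ALLOW_THREADS
        worker.join(timeout)         # analogous to zookeeper_close joining the thread
        gil.acquire()                # analogous to Py_END_ALLOW_THREADS
    else:
        worker.join(timeout)         # would block forever without the timeout

    joined = not worker.is_alive()
    gil.release()                    # let a stuck worker finish so the demo exits cleanly
    worker.join()
    return joined
```

Calling close_handle(release_gil=False) reports that the join did not complete 
while the lock was held, whereas close_handle(release_gil=True) lets the worker 
finish its callback first, so the join succeeds.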

The same issue will affect any part of zkpython where a call into the C client 
blocks on work that has to be completed in another Python thread. In practice, 
I think this means work done in callbacks. I'll audit the code to see if any 
other API calls are affected. A patch for this issue will follow shortly. 
Kapil, I'd be very grateful if you could help us by testing it. 

cheers,
Henry

> Deadlock on close w/ zkpython / c client
> ----------------------------------------
>
>                 Key: ZOOKEEPER-763
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.0
>         Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>            Reporter: Kapil Thangavelu
>            Assignee: Mahadev konar
>             Fix For: 3.4.0
>
>         Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
>
>
> deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc). Normally on close both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> afaics the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> i've tried adding some bookkeeping within a python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> a simple example to reproduce the deadlock is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
