[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864871#action_12864871 ]

Patrick Hunt commented on ZOOKEEPER-763:

For some reason I got confused on the 3.3 branch (may not have been up to date); the main patch applies to both just fine. Fixed this in svn.

Deadlock on close w/ zkpython / c client

    Key: ZOOKEEPER-763
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
    Project: Zookeeper
    Issue Type: Bug
    Components: contrib-bindings
    Affects Versions: 3.3.0
    Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
    Reporter: Kapil Thangavelu
    Assignee: Henry Robinson
    Fix For: 3.3.1, 3.4.0
    Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch, ZOOKEEPER-763.patch

Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined. However, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join.

afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate.

I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution: even after the python callback is executed, there is still a window for deadlock before the completion thread finishes the callback.

A simple example to reproduce the deadlock is attached.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
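The bookkeeping workaround mentioned in the issue description is not shown anywhere in the thread. A minimal sketch of what such a guard might look like follows. The class `PendingCalls` and its methods `wrap` and `wait_idle` are hypothetical names, not part of zkpython, and the sketch assumes each completion callback is invoked exactly once. As the reporter notes, a guard of this kind is imperfect: the count reaches zero when the Python callback returns, which is still before the completion thread has finished with it.

```python
import threading


class PendingCalls:
    """Hypothetical client-side counter of outstanding async completions."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, callback):
        # Count the request as outstanding as soon as it is wrapped.
        with self._cond:
            self._pending += 1

        def wrapped(*args):
            try:
                return callback(*args)
            finally:
                # The Python callback has returned; the C completion
                # thread may still be unwinding after this point.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return wrapped

    def wait_idle(self, timeout=None):
        # Returns True once nothing is outstanding, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)
```

A client would pass `calls.wrap(my_completion)` wherever it currently passes a completion callback, and call `calls.wait_idle()` before closing the handle.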
[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864429#action_12864429 ]

Henry Robinson commented on ZOOKEEPER-763:

Hi Kapil -

As seems to be the norm for me this week, I'm struggling to reproduce :)

It does seem like your python script explicitly waits for a completion to be called before closing a handle. Is this enough to leave an outstanding completion on the queue?

Can you capture the stacktrace for the completion thread? I think it must be getting stuck in process_completions, but it would be very valuable to know where. If it's stuck on the callback into zkpython, then that means the deadlock is in the python bindings and not solely in C-land.

cheers,
Henry

Deadlock on close w/ zkpython / c client

    Key: ZOOKEEPER-763
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
    Project: Zookeeper
    Issue Type: Bug
    Components: c client, contrib-bindings
    Affects Versions: 3.3.0
    Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
    Reporter: Kapil Thangavelu
    Assignee: Mahadev konar
    Fix For: 3.4.0
    Attachments: deadlock.py, stack-trace-deadlock.txt
[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864488#action_12864488 ]

Henry Robinson commented on ZOOKEEPER-763:

Kapil -

Thanks! Adding that sleep helped me understand what was going on.

pyzoo_close has the GIL but blocks inside zookeeper_close, waiting for the completion thread to finish. However, if a completion is still inside Python, but has been pre-empted by the main thread which calls pyzoo_close, the completion can't get the GIL back to finish up executing, blocking the completions_thread for ever more.

The fix is simple - relinquish the GIL during the zookeeper_close call, and then reacquire it straight after. There are even handy macros to do this:

    Py_BEGIN_ALLOW_THREADS
    ret = zookeeper_close(zhandles[zkhid]);
    Py_END_ALLOW_THREADS

This same issue will affect any part of zkpython where a call to the C client is blocked on some work being completed in another Python thread - in practice, I think this means from callbacks. I'll audit the code to see if any other API calls are affected.

Patch to fix this issue is following shortly - Kapil, I'd be very grateful if you could help us by testing it.

cheers,
Henry

Deadlock on close w/ zkpython / c client

    Key: ZOOKEEPER-763
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
    Project: Zookeeper
    Issue Type: Bug
    Components: c client, contrib-bindings
    Affects Versions: 3.3.0
    Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
    Reporter: Kapil Thangavelu
    Assignee: Mahadev konar
    Fix For: 3.4.0
    Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
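The shape of the deadlock and of the fix described in the comment above can be illustrated in pure Python. The following is only an analogy under stated assumptions: an ordinary `threading.Lock` stands in for the GIL, a Python thread stands in for the C client's completion thread, and every function name here is hypothetical. It is not the zkpython patch. A join timeout is used so that the buggy variant reports a hang instead of blocking forever.

```python
import threading


def completion_thread(lock, close_called):
    # Stand-in for a completion callback that was pre-empted mid-flight:
    # it resumes only once close is underway, then needs the lock to finish.
    close_called.wait()
    with lock:
        pass


def close_holding_lock(lock, worker, close_called, timeout):
    # Buggy shape: block on the join while still holding the lock
    # that the worker needs in order to finish.
    with lock:
        close_called.set()
        worker.join(timeout)
        return not worker.is_alive()


def close_releasing_lock(lock, worker, close_called, timeout):
    # Fixed shape: drop the lock around the blocking join and
    # reacquire it straight after.
    with lock:
        close_called.set()
        lock.release()        # plays the role of Py_BEGIN_ALLOW_THREADS
        try:
            worker.join(timeout)
        finally:
            lock.acquire()    # plays the role of Py_END_ALLOW_THREADS
        return not worker.is_alive()


def run(close_fn, timeout=0.5):
    # Returns True if the worker finished while close was in progress.
    lock = threading.Lock()
    close_called = threading.Event()
    worker = threading.Thread(
        target=completion_thread, args=(lock, close_called), daemon=True
    )
    worker.start()
    finished = close_fn(lock, worker, close_called, timeout)
    worker.join()  # the lock is free again here, so the worker can exit
    return finished


if __name__ == "__main__":
    print("holding lock during join:  ", run(close_holding_lock))
    print("releasing lock during join:", run(close_releasing_lock))
```

With the lock held across the join, the worker cannot make progress and the join times out. With the lock released around the join, the worker finishes and the join returns promptly.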
[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864523#action_12864523 ]

Kapil Thangavelu commented on ZOOKEEPER-763:

works for me on a couple of different zkpython apps (ubuntu lucid), thanks!

Deadlock on close w/ zkpython / c client

    Key: ZOOKEEPER-763
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
    Project: Zookeeper
    Issue Type: Bug
    Components: contrib-bindings
    Affects Versions: 3.3.0
    Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
    Reporter: Kapil Thangavelu
    Assignee: Henry Robinson
    Fix For: 3.3.1, 3.4.0
    Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, ZOOKEEPER-763.patch, ZOOKEEPER-763.patch