[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-06 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864871#action_12864871
 ] 

Patrick Hunt commented on ZOOKEEPER-763:


For some reason I got confused about the 3.3 branch (it may not have been up 
to date); the main patch applies cleanly to both. Fixed this in svn.

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch, ZOOKEEPER-763.patch


 deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close both the 
 io thread and the completion thread are terminated and joined; however, with 
 outstanding async requests, the completion thread won't be in a joinable 
 state, and we effectively hang when the main thread does the join.
 afaics the ideal behavior on close of a handle would be to clear out any 
 remaining callbacks and let the completion thread terminate.
 i've tried adding some bookkeeping within a python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 a simple example to reproduce the deadlock is attached.
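The reporter's suggested close behavior (clear out remaining callbacks so the completion thread can always be joined) can be sketched with a mutex-protected queue and a `closing` flag. This is a minimal pthreads model under stated assumptions: `completion_queue`, `queue_start`, `queue_push`, `queue_close` and `noop_completion` are hypothetical names, not the ZooKeeper C client's real internals, and silently dropping pending callbacks is only one possible policy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical completion queue; NOT the ZooKeeper C client's internals. */
#define QCAP 16

typedef void (*completion_fn)(int rc);

struct completion_queue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    completion_fn items[QCAP];
    int head, count;      /* ring buffer of pending callbacks */
    bool closing;         /* set once by queue_close() */
    int ran, dropped;     /* outcome counters, readable after queue_close() */
    pthread_t thread;
};

/* Example callback; a real client would pass the request's result code. */
static void noop_completion(int rc) { (void)rc; }

static void *completion_loop(void *arg) {
    struct completion_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->closing)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->closing) {
            /* Clear out remaining callbacks instead of running them. */
            q->dropped += q->count;
            q->count = 0;
            break;
        }
        completion_fn fn = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        q->ran++;
        /* Run the callback without holding the queue lock. */
        pthread_mutex_unlock(&q->lock);
        fn(0);
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

void queue_start(struct completion_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->count = q->ran = q->dropped = 0;
    q->closing = false;
    int rc = pthread_create(&q->thread, NULL, completion_loop, q);
    assert(rc == 0);
    (void)rc;
}

/* Returns false if the queue is full or already closing. */
bool queue_push(struct completion_queue *q, completion_fn fn) {
    pthread_mutex_lock(&q->lock);
    bool ok = !q->closing && q->count < QCAP;
    if (ok) {
        q->items[(q->head + q->count) % QCAP] = fn;
        q->count++;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Signal shutdown and join. The join returns as long as no callback
   blocks forever, because pending work is dropped, not waited on. */
void queue_close(struct completion_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closing = true;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->lock);
}
```

Usage: call `queue_start(&q)`, enqueue callbacks with `queue_push`, then `queue_close(&q)`; after the join, `q.ran + q.dropped` equals the number of callbacks pushed. Note that a callback which itself blocks indefinitely (for example on a lock the closing thread holds) would still keep the join from returning, so this policy alone does not rule out every hang.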

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864429#action_12864429
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Hi Kapil - 

As seems to be the norm for me this week, I'm struggling to reproduce :) It 
does seem like your python script explicitly waits for a completion to be 
called before closing a handle. Is this enough to leave an outstanding 
completion on the queue?

Can you capture the stacktrace for the completion thread? I think it must be 
getting stuck in process_completions but it would be very valuable to know 
where - if it's stuck on the callback into zkpython then that means the 
deadlock is in the python bindings and not solely in C-land.

cheers,
Henry

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
 Fix For: 3.4.0

 Attachments: deadlock.py, stack-trace-deadlock.txt





[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864488#action_12864488
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Kapil - 

Thanks! Adding that sleep helped me understand what was going on. 

pyzoo_close holds the GIL but blocks inside zookeeper_close, waiting for the 
completion thread to finish. However, if a completion is still inside Python 
but has been pre-empted by the main thread, which then calls pyzoo_close, the 
completion can't get the GIL back to finish executing, blocking the 
completion thread forever. The fix is simple: relinquish the GIL during the 
zookeeper_close call, and then reacquire it straight after. There are even 
handy macros to do this:

Py_BEGIN_ALLOW_THREADS
ret = zookeeper_close(zhandles[zkhid]);
Py_END_ALLOW_THREADS
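The lock ordering behind this hang can be modeled in plain C with pthreads. In this sketch a mutex named `gil` stands in for the Python GIL, `blocking_close()` stands in for zookeeper_close joining the completion thread, and `close_handle()` stands in for pyzoo_close; all of these are hypothetical stand-ins, not CPython or zkpython code. Unlocking the mutex around the blocking call plays roughly the role of the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS pair.

```c
#include <assert.h>
#include <pthread.h>

/* Model only: `gil` stands in for the Python GIL. */
static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;
static int callbacks_finished = 0;   /* protected by gil */

/* Completion thread: must take the "GIL" to finish a Python callback. */
static void *completion_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&gil);
    callbacks_finished++;
    pthread_mutex_unlock(&gil);
    return NULL;
}

/* Stand-in for zookeeper_close(): waits for the completion thread. */
static void blocking_close(pthread_t completion) {
    pthread_join(completion, NULL);
}

/* Stand-in for pyzoo_close(); the caller holds `gil` on entry and exit. */
static void close_handle(pthread_t completion) {
    pthread_mutex_unlock(&gil);   /* roughly Py_BEGIN_ALLOW_THREADS */
    blocking_close(completion);
    pthread_mutex_lock(&gil);     /* roughly Py_END_ALLOW_THREADS */
}

/* Main thread holds the "GIL" while a completion is pending, then closes. */
int run_close_scenario(void) {
    pthread_t completion;
    pthread_mutex_lock(&gil);
    int rc = pthread_create(&completion, NULL, completion_thread, NULL);
    assert(rc == 0);
    (void)rc;
    close_handle(completion);
    int done = callbacks_finished;
    pthread_mutex_unlock(&gil);
    return done;
}
```

`run_close_scenario()` returns 1 because the completion thread can take the lock once `close_handle()` releases it. Moving the `pthread_mutex_unlock(&gil)` below `blocking_close()` reproduces the hang: the main thread waits in `pthread_join` while the completion thread waits for `gil`.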

The same issue will affect any part of zkpython where a call into the C client 
blocks on work being completed in another Python thread; in practice, I think 
this means work done in callbacks. I'll audit the code to see whether any other 
API calls are affected. A patch to fix this issue will follow shortly. Kapil, 
I'd be very grateful if you could help us by testing it. 

cheers,
Henry

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Mahadev konar
 Fix For: 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt





[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Kapil Thangavelu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864523#action_12864523
 ] 

Kapil Thangavelu commented on ZOOKEEPER-763:


works for me on a couple of different zkpython apps (ubuntu lucid), thanks! 
