[ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674452#action_12674452 ]
Chris Darroch commented on ZOOKEEPER-320:
-----------------------------------------

I should perhaps clarify that the reason one might want to unconditionally
wait on the auth completion after calling zoo_add_auth() is to provide
one's own sync version of that function.  The various sync functions such as
zoo_wget(), zoo_set2(), etc. are just wrappers of the async versions followed
by an unconditional wait on the relevant completion.  Similarly, in a context
which must be exclusively sync-only, zoo_add_auth() needs to be followed by a
wait on its completion.

> call auth completion in free_completions()
> ------------------------------------------
>
>                 Key: ZOOKEEPER-320
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>            Reporter: Chris Darroch
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch
>
>
> If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo") the
> ZooKeeper server will mark their session expired and close the connection.
> However, by then the C client has already returned ZOK to the caller,
> immediately after queuing the new auth data to be sent.
> If the client then waits for their auth completion function to be called,
> they can wait forever, as no session event is ever delivered to that
> completion function.  All other completion functions are notified of session
> events by free_completions(), which is called by cleanup_bufs() in
> handle_error() in handle_socket_error_msg().
> In actual fact, what can happen (about 50% of the time, for me) is that the
> next call by the IO thread to flush_send_queue() calls send() from within
> send_buffer(), and receives a SIGPIPE signal during this send() call.
> Because the ZooKeeper C API is a library, it properly does not catch that
> signal.  If the user's code is not catching that signal either, they
> experience an abort caused by an untrapped signal.  If they are ignoring the
> signal -- which is common in the context I'm working in, the Apache httpd
> server -- then flush_send_queue()'s error return code is EPIPE, which is
> logged by handle_socket_error_msg(), and all non-auth completion functions
> are notified of a session event.  However, if the caller is waiting for their
> auth completion function, they wait forever while the IO thread tries
> repeatedly to reconnect and is rejected by the server as having an expired
> session.
> So, first of all, it would be useful to document in the C API portion of the
> programmer's guide that trapping or ignoring SIGPIPE is important, as this
> signal may be generated by the C API.
> Next, the two attached patches call the auth completion function, if any, in
> free_completions(), which fixes this problem for me.  The second attached
> patch includes auth lock/unlock functions, as per ZOOKEEPER-319.