call auth completion in free_completions()
------------------------------------------

                 Key: ZOOKEEPER-320
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
             Project: Zookeeper
          Issue Type: Bug
          Components: c client
    Affects Versions: 3.1.0, 3.0.1, 3.0.0
            Reporter: Chris Darroch
             Fix For: 3.1.1, 3.2.0


If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo"), the 
ZooKeeper server will mark the session expired and close the connection.  
By then, however, the C client has already returned ZOK to the caller, 
because zoo_add_auth() returns as soon as the new auth data is queued to be sent.

If the client then waits for their auth completion function to be called, they 
can wait forever, as no session event is ever delivered to that completion 
function.  All other completion functions are notified of session events by 
free_completions(), which is called by cleanup_bufs() in handle_error() in 
handle_socket_error_msg().

In actual fact, what can happen (about 50% of the time, for me) is that the 
next call by the IO thread to flush_send_queue() calls send() from within 
send_buffer(), and receives a SIGPIPE signal during this send() call.  Because 
the ZooKeeper C API is a library, it properly does not catch that signal.  If 
the user's code is not catching that signal either, they experience an abort 
caused by an untrapped signal.  If they are ignoring the signal -- which is 
common in the context I'm working in, the Apache httpd server -- then 
flush_send_queue()'s error return code is EPIPE, which is logged by 
handle_socket_error_msg(), and all non-auth completion functions are notified 
of a session event.  However, if the caller is waiting for their auth 
completion function, they wait forever while the IO thread tries repeatedly to 
reconnect and is rejected by the server as having an expired session.

So, first of all, it would be useful to document in the C API portion of the 
programmer's guide that trapping or ignoring SIGPIPE is important, as this 
signal may be generated by the C API.

Next, the two attached patches call the auth completion function, if any, in 
free_completions(), which fixes this problem for me.  The second attached patch 
also includes the auth lock/unlock functions, as per ZOOKEEPER-319.
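
The idea behind the patches can be sketched roughly as follows.  This is not 
the patch itself: struct sketch_auth, sketch_notify_auth() and 
sketch_on_auth() are hypothetical stand-ins for the client's internal auth 
state, and only the callback signature (int rc, const void *data) is meant to 
mirror the completion passed to zoo_add_auth().  Locking is omitted here.

```c
#include <stddef.h>

/* Same shape as the completion callback given to zoo_add_auth(). */
typedef void (*sketch_void_completion_t)(int rc, const void *data);

/* Hypothetical stand-in for the pending auth state kept by the client. */
struct sketch_auth {
    sketch_void_completion_t completion;  /* NULL if none is pending */
    const void *data;                     /* opaque caller context */
};

/* Deliver a session event to the pending auth completion, if any.
 * The pointer is cleared before the call so that repeated cleanup
 * passes invoke the callback at most once. */
static void sketch_notify_auth(struct sketch_auth *auth, int reason)
{
    sketch_void_completion_t cb;

    if (auth == NULL || auth->completion == NULL)
        return;
    cb = auth->completion;
    auth->completion = NULL;
    cb(reason, auth->data);
}

/* Example completion for illustration: records what it was told. */
static int sketch_last_rc = 0;
static int sketch_calls = 0;

static void sketch_on_auth(int rc, const void *data)
{
    (void)data;
    sketch_last_rc = rc;
    sketch_calls++;
}
```

In this sketch the cleanup path would call sketch_notify_auth() with the same 
session-event code it hands to the other completions, so a caller blocked on 
its auth completion is woken instead of waiting forever.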

