Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Jakub Hrozek Tue, 22 Nov 2011 06:03:49 -0800

On Wed, Nov 09, 2011 at 03:31:42PM +0100, Pavel Zuna wrote:
> On 11/01/2011 02:58 PM, Jakub Hrozek wrote:
> >On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote:
> >>On 10/31/2011 04:29 PM, Jakub Hrozek wrote:
> >>>On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:
> >>>>On 10/25/2011 01:08 PM, Pavel Zuna wrote:
> >>>>>On 10/21/2011 06:44 PM, Stephen Gallagher wrote:
> >>>>>>On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:
> >>>>>>>Base on the second proposal:
> >>>>>>>
> >>>>>>>https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld
> >>>>>>>
> >>>>>>>There is some old SIGCHLD handling code in 
> >>>>>>>src/providers/child_common.[ch], that
> >>>>>>>should probably go away if this gets accepted. There was also a naming 
> >>>>>>>conflict
> >>>>>>>with the sss_child_ctx structure. This structure is only used 
> >>>>>>>internally by
> >>>>>>>functions defined src/providers/child_common.c. I renamed the original 
> >>>>>>>structure
> >>>>>>>to sss_child_ctx_old for now.
> >>>>>>
> >>>>>>
> >>>>>>Nack (but a good start)
> >>>>>>
> >>>>>>sss_child_destructor() cannot return the result of hash_delete(). Talloc
> >>>>>>destructors may only return 0 on success or -1 on failure. However,
> >>>>>>because destructor failure results in cancelling the free(), we should
> >>>>>>just log an error here and return 0.
> >>>>>>
> >>>>>>sss_child_register:
> >>>>>>Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want
> >>>>>>to leave child_ctx untouched until successful completion. Also, please
> >>>>>>don't use the variable name "tmp". It's not descriptive. Call it
> >>>>>>"child".
> >>>>>>
> >>>>>>Instead of allocating *child_ctx at the beginning, allocate "child" on
> >>>>>>the mem_ctx.
> >>>>>>
> >>>>>>child = talloc_zero(mem_ctx, struct sss_child_ctx);
> >>>>>>
> >>>>>>Then, only at the end of execution, right before returning EOK, do
> >>>>>>*child_ctx = child;
> >>>>>>
> >>>
> >>>In sss_child_register():
> >>>
> >>>+    error = hash_enter(sigchld_ctx->children,&key,&value);
> >>>+    if (error != HASH_SUCCESS&&   error != HASH_ERROR_KEY_NOT_FOUND) {
> >>>
> >>>I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would
> >>>it mean?) and even if it did, why ignore it?
> >>>
> >>>>>>
> >>>>>>sss_child_invoke_cb():
> >>>>>>I'm wary of having the callback invocation free the child_ctx. It needs
> >>>>>>to be documented well (so that consumers know they must not free it once
> >>>>>>the callback has fired or they'll get a double-free).
> >>>
> >>>This still needs fixing.
> >>>
> >>
> >>Forgot about that, sorry. Replaced the talloc_free with removing
> >>child_ctx from the hash table only.
> >>
> >>>>>>
> >>>>>>sss_child_handler():
> >>>>>>You need to check whether waitpid() returned an error and handle that
> >>>>>>appropriately. waitpid() will return -1 on error and set errno. If the
> >>>>>>errno value is EINTR, you should retry (because waitpid() was
> >>>>>>interrupted by a signal at the time). Other errors should generate a
> >>>>>>DEBUG message.
> >>>
> >>>I think you misunderstood how the error handling should be done:
> >>>
> >>>+        if (pid == -1) {
> >>>+            if (errno == ECHILD) {
> >>>+                DEBUG(0, ("waitpid failed with ECHILD\n"));
> >>>+            } else if (errno == EINVAL) {
> >>>+                DEBUG(0, ("waitpid failed with EINVAL\n"));
> >>>+            }
> >>>+            return;
> >>>+        }
> >>>
> >>>Instead of having a DEBUG() call per retval, it would be better to inform 
> >>>user
> >>>on why the child process exited, similarly to how child_sig_handler() does 
> >>>it
> >>>now.
> >>>
> >>
> >>I don't agree. The system is designed to inform providers that a
> >>child has exited by invoking their callbacks. It's up to the
> >>provider to decide what to do with the result and inform the user if
> >>appropriate. This would also make sense only in the case of a
> >>successful call (pid>= 0).
> >
> >OK, then why the separate branches for ECHILD and EINVAL? Isn't it
> >easier to just print errno and strerror(errno) ?
> >
> 
> Yeah, it's definitely better. Improved in current patch. :)
> 
> >>
> >>In case of EINTR, we need to repeat the call to waitpid, because it
> >>was interrupted by a signal. In other cases (ECHILD and EINVAL) the
> >>call is going to always fail, so there's won't be any callbacks to
> >>invoke.
> >
> >Yup.
> >
> >>
> >>>>>>
> >>>>>
> >>>>>Ok, I'm done with all of this. It was pretty straight forward, but...
> >>>>>
> >>>>>>
> >>>>>>Please separate any changes you make to the existing sss_child_ctx into
> >>>>>>its own patch. I don't like just renaming it to _old, either. Please
> >>>>>>find a more appropriate way to replace this old object (preferably by
> >>>>>>converting it to use your new mechanism, also in a new patch).
> >>>>>>
> >>>>>
> >>>>>... this, I'm trying to find a way to make the old code use the new 
> >>>>>mechanism. I
> >>>>>think it would be better to remove the old one completely and replace it 
> >>>>>with
> >>>>>the new one everywhere where it's used.
> >>>>>
> >>>>>Pavel
> >>>>
> >>>>New patch with the above mentioned issues fixed is attached to this email.
> >>>>
> >>>>We agreed that we're going to replace all uses of the old system by
> >>>>the new. Expect about 3 patches to follow on this one - one for each
> >>>>provider using the old system.
> >>>>
> >>>>Pavel
> >>>
> >>
> >>Thanks for taking the time to review this. New patch attached.
> >>
> >>Pavel
> >
> >Two more issues (sorry for not catching all this at once):
> >
> >Please don't just use DEBUG(0) for all the messages. DEBUG(0) is
> >supposed to be used for fatal errors that cause the provider to quit
> >completely.
> >
> >See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html
> >for explanation of the log levels. In general, this patch is going to be
> >included in 1.7 and onwards -- please use the new debug levels (see
> >src/util/util.h) defined as macros instead of integers.
> >
> 
> Ok.
> 
> >The second issue is that I'm getting a segfault when logging in via
> >Kerberos, the backtrace is pointing to tevent_common_check_signal().
> 
> Finally found what the problem was. I suspect there might be a bug in tevent.
> 
> The SEGFAULT was generated from tevent_common_check_signal as you
> pointed out. It happened when copying a portion of siginfo that my
> signal handler wasn't using.
> 
> Simply adding the SA_SIGINFO flag to tevent_add_signal solved the issue.
> 
> Updated patch attached.
> 
> Pavel


The patch is OK now, I just needed to use a 3-way merge to get it applied -
probably some patches landed in master since it was submitted for review.

Please ask Simo about the flag, this is too low-level tevent knowledge
for me.

I'll ack the patch once you know what was the tevent problem.

_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Reply via email to