On Wed, Nov 09, 2011 at 03:31:42PM +0100, Pavel Zuna wrote: > On 11/01/2011 02:58 PM, Jakub Hrozek wrote: > >On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote: > >>On 10/31/2011 04:29 PM, Jakub Hrozek wrote: > >>>On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote: > >>>>On 10/25/2011 01:08 PM, Pavel Zuna wrote: > >>>>>On 10/21/2011 06:44 PM, Stephen Gallagher wrote: > >>>>>>On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote: > >>>>>>>Base on the second proposal: > >>>>>>> > >>>>>>>https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld > >>>>>>> > >>>>>>>There is some old SIGCHLD handling code in > >>>>>>>src/providers/child_common.[ch], that > >>>>>>>should probably go away if this gets accepted. There was also a naming > >>>>>>>conflict > >>>>>>>with the sss_child_ctx structure. This structure is only used > >>>>>>>internally by > >>>>>>>functions defined src/providers/child_common.c. I renamed the original > >>>>>>>structure > >>>>>>>to sss_child_ctx_old for now. > >>>>>> > >>>>>> > >>>>>>Nack (but a good start) > >>>>>> > >>>>>>sss_child_destructor() cannot return the result of hash_delete(). Talloc > >>>>>>destructors may only return 0 on success or -1 on failure. However, > >>>>>>because destructor failure results in cancelling the free(), we should > >>>>>>just log an error here and return 0. > >>>>>> > >>>>>>sss_child_register: > >>>>>>Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want > >>>>>>to leave child_ctx untouched until successful completion. Also, please > >>>>>>don't use the variable name "tmp". It's not descriptive. Call it > >>>>>>"child". > >>>>>> > >>>>>>Instead of allocating *child_ctx at the beginning, allocate "child" on > >>>>>>the mem_ctx. > >>>>>> > >>>>>>child = talloc_zero(mem_ctx, struct sss_child_ctx); > >>>>>> > >>>>>>Then, only at the end of execution, right before returning EOK, do > >>>>>>*child_ctx = child; > >>>>>> > >>> > >>>In sss_child_register(): > >>> > >>>+ error = hash_enter(sigchld_ctx->children,&key,&value); > >>>+ if (error != HASH_SUCCESS&& error != HASH_ERROR_KEY_NOT_FOUND) { > >>> > >>>I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would > >>>it mean?) and even if it did, why ignore it? > >>> > >>>>>> > >>>>>>sss_child_invoke_cb(): > >>>>>>I'm wary of having the callback invocation free the child_ctx. It needs > >>>>>>to be documented well (so that consumers know they must not free it once > >>>>>>the callback has fired or they'll get a double-free). > >>> > >>>This still needs fixing. > >>> > >> > >>Forgot about that, sorry. Replaced the talloc_free with removing > >>child_ctx from the hash table only. > >> > >>>>>> > >>>>>>sss_child_handler(): > >>>>>>You need to check whether waitpid() returned an error and handle that > >>>>>>appropriately. waitpid() will return -1 on error and set errno. If the > >>>>>>errno value is EINTR, you should retry (because waitpid() was > >>>>>>interrupted by a signal at the time). Other errors should generate a > >>>>>>DEBUG message. > >>> > >>>I think you misunderstood how the error handling should be done: > >>> > >>>+ if (pid == -1) { > >>>+ if (errno == ECHILD) { > >>>+ DEBUG(0, ("waitpid failed with ECHILD\n")); > >>>+ } else if (errno == EINVAL) { > >>>+ DEBUG(0, ("waitpid failed with EINVAL\n")); > >>>+ } > >>>+ return; > >>>+ } > >>> > >>>Instead of having a DEBUG() call per retval, it would be better to inform > >>>user > >>>on why the child process exited, similarly to how child_sig_handler() does > >>>it > >>>now. > >>> > >> > >>I don't agree. The system is designed to inform providers that a > >>child has exited by invoking their callbacks. It's up to the > >>provider to decide what to do with the result and inform the user if > >>appropriate. This would also make sense only in the case of a > >>successful call (pid>= 0). > > > >OK, then why the separate branches for ECHILD and EINVAL? Isn't it > >easier to just print errno and strerror(errno) ? > > > > Yeah, it's definitely better. Improved in current patch. :) > > >> > >>In case of EINTR, we need to repeat the call to waitpid, because it > >>was interrupted by a signal. In other cases (ECHILD and EINVAL) the > >>call is going to always fail, so there's won't be any callbacks to > >>invoke. > > > >Yup. > > > >> > >>>>>> > >>>>> > >>>>>Ok, I'm done with all of this. It was pretty straight forward, but... > >>>>> > >>>>>> > >>>>>>Please separate any changes you make to the existing sss_child_ctx into > >>>>>>its own patch. I don't like just renaming it to _old, either. Please > >>>>>>find a more appropriate way to replace this old object (preferably by > >>>>>>converting it to use your new mechanism, also in a new patch). > >>>>>> > >>>>> > >>>>>... this, I'm trying to find a way to make the old code use the new > >>>>>mechanism. I > >>>>>think it would be better to remove the old one completely and replace it > >>>>>with > >>>>>the new one everywhere where it's used. > >>>>> > >>>>>Pavel > >>>> > >>>>New patch with the above mentioned issues fixed is attached to this email. > >>>> > >>>>We agreed that we're going to replace all uses of the old system by > >>>>the new. Expect about 3 patches to follow on this one - one for each > >>>>provider using the old system. > >>>> > >>>>Pavel > >>> > >> > >>Thanks for taking the time to review this. New patch attached. > >> > >>Pavel > > > >Two more issues (sorry for not catching all this at once): > > > >Please don't just use DEBUG(0) for all the messages. DEBUG(0) is > >supposed to be used for fatal errors that cause the provider to quit > >completely. > > > >See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html > >for explanation of the log levels. In general, this patch is going to be > >included in 1.7 and onwards -- please use the new debug levels (see > >src/util/util.h) defined as macros instead of integers. > > > > Ok. > > >The second issue is that I'm getting a segfault when logging in via > >Kerberos, the backtrace is pointing to tevent_common_check_signal(). > > Finally found what the problem was. I suspect there might be a bug in tevent. > > The SEGFAULT was generated from tevent_common_check_signal as you > pointed out. It happened when copying a portion of siginfo that my > signal handler wasn't using. > > Simply adding the SA_SIGINFO flag to tevent_add_signal solved the issue. > > Updated patch attached. > > Pavel
The patch is OK now, I just needed to use a 3-way merge to get it applied - probably some patches landed in master since it was submitted for review. Please ask Simo about the flag, this is too low-level tevent knowledge for me. I'll ack the patch once you know what was the tevent problem. _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel