There is at least one fundamental bug in start.c's signal_handler, as should be fixed by the below. However, this alone did not fix it for me, so more is wrong. There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by using the waitpid loop as below.
>From 7333b77759794f5420ea6898494073a28cac445f Mon Sep 17 00:00:00 2001 From: Serge Hallyn <serge.hal...@ubuntu.com> Date: Wed, 3 Jul 2013 14:13:08 -0500 Subject: [PATCH 1/1] start.c:signal_handler: fix wrong assumption about sigchld The monitor process adds a signalfd to the set of fds it watches with epoll. It calls signal_handler() when a signal is found in the sigfd. That returns 1 if the container init was found to be the exiting process. The flaw in reasoning (pointed out by Andy Whitcroft - thanks!) here is that if two children have exited, we assume we will get two sigchilds - in fact we may only get one. So when we get one, we need to reap all the children we have and check if any is the container init. Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> --- src/lxc/start.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/lxc/start.c b/src/lxc/start.c index 8c8af9c..3fa50ad 100644 --- a/src/lxc/start.c +++ b/src/lxc/start.c @@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data, struct lxc_epoll_descr *descr) { struct signalfd_siginfo siginfo; - int ret; + int ret, status, retval = 0; pid_t *pid = data; ret = read(fd, &siginfo, sizeof(siginfo)); @@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data, return 0; } - /* more robustness, protect ourself from a SIGCHLD sent - * by a process different from the container init + /* + * wait for any and all children which are ours which are reapable, + * since if >1 children have exited, we'll only get one sigchld. */ - if (siginfo.ssi_pid != *pid) { - WARN("invalid pid for SIGCHLD"); - return 0; - } + while ((ret = waitpid(-1, &status, WNOHANG)) > 0) + if (ret == *pid) + retval = 1; // we reaped the container init DEBUG("container init process exited"); - return 1; + return retval; } int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state) -- 1.8.1.2 ** Changed in: lxc (Ubuntu) Status: Confirmed => Triaged -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to lxc in Ubuntu. https://bugs.launchpad.net/bugs/1168526 Title: race condition causing lxc to not detect container init process exit To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/+subscriptions -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs