There is at least one fundamental bug in start.c's signal_handler, as should
be fixed by the below.  However, this alone did not fix it for me, so more is 
wrong.  There is a minimal testcase at 
http://people.canonical.com/~serge/signalfd.c which originally reproduced this 
bug, then was fixed by using the waitpid loop as below.

>From 7333b77759794f5420ea6898494073a28cac445f Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hal...@ubuntu.com>
Date: Wed, 3 Jul 2013 14:13:08 -0500
Subject: [PATCH 1/1] start.c:signal_handler: fix wrong assumption about
 sigchld

The monitor process adds a signalfd to the set of fds it watches
with epoll.  It calls signal_handler() when a signal is found in
the sigfd.  That returns 1 if the container init was found to
be the exiting process.

The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one.  So when we get one,
we need to reap all the children we have and check if any is the
container init.

Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
---
 src/lxc/start.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8c8af9c..3fa50ad 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data,
                           struct lxc_epoll_descr *descr)
 {
        struct signalfd_siginfo siginfo;
-       int ret;
+       int ret, status, retval = 0;
        pid_t *pid = data;
 
        ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
                return 0;
        }
 
-       /* more robustness, protect ourself from a SIGCHLD sent
-        * by a process different from the container init
+       /*
+        * wait for any and all children which are ours which are reapable,
+        * since if >1 children have exited, we'll only get one sigchld.
         */
-       if (siginfo.ssi_pid != *pid) {
-               WARN("invalid pid for SIGCHLD");
-               return 0;
-       }
+       while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+               if (ret == *pid)
+                       retval = 1; // we reaped the container init
 
        DEBUG("container init process exited");
-       return 1;
+       return retval;
 }
 
 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t 
state)
-- 
1.8.1.2


** Changed in: lxc (Ubuntu)
       Status: Confirmed => Triaged

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1168526

Title:
  race condition causing lxc to not detect container init process exit

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs

Reply via email to