Scott, that issue is well known, but it is not my problem. In fact, as you
can see in [4] of the original post, I did all that and it didn't help. I
invested some time and finally found the exact cause.

Anyone who is still encountering this (which is likely with the current
stable vserver patch which hasn't changed for years) or anyone trying to
run upstart within a chroot will encounter the following:

1. If /sbin/init is called from a user process, it won't have pid 1. The
common way to react (sysvinit does it too) is to replace the process with
telinit, since the user probably wants to switch the runlevel using "init 3"
(what they really want is "telinit 3").
If upstart is started in a chroot, init's pid is always > 1, so you can't
start upstart's /sbin/init in there. It will always execv() telinit and fail,
either with a "wrong usage" error (the runlevel is missing) or with the
above-mentioned "telinit: Failed to connect to socket /com/ubuntu/upstart:
Connection refused", because no /sbin/init is listening on the D-Bus socket.

You can fix that for the chroot by patching the upstart source and
replacing the (pid > 1) condition with something like (pid > 1 &&
have_no_runlevel_in_my_args_so_this_is_not_to_be_intended_as_telinit).
In addition you need to fix some upstart jobs that signal pid 1 using
"kill -SIG... 1" directly. For that to work inside the chroot you would
need something like "killall -SIG... init", since init won't be pid 1.
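
To make the idea concrete, here is a rough sketch of how such a decision
could look. It is not upstart's actual code: the function names are made up
for illustration, and the set of accepted runlevel characters (0-6, S/s,
Q/q, U/u) is an assumption. The sketch hands over to telinit only when the
process is not pid 1 and a runlevel-like argument is present.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: true if arg is a single character that looks like
 * a runlevel argument (assumed set: 0-6, S/s, Q/q, U/u). */
static bool looks_like_runlevel(const char *arg)
{
    return arg != NULL && arg[0] != '\0' && arg[1] == '\0'
        && strchr("0123456SsQqUu", arg[0]) != NULL;
}

/* Hypothetical replacement for a bare (pid > 1) test: only treat the
 * invocation as "telinit" if we are not pid 1 AND a runlevel was given. */
static bool should_exec_telinit(pid_t pid, int argc, char *const argv[])
{
    if (pid == 1)
        return false;               /* we really are init */

    for (int i = 1; i < argc; i++)
        if (looks_like_runlevel(argv[i]))
            return true;            /* e.g. "init 3" typed by a user */

    return false;                   /* no runlevel: start up as init */
}
```

With this, "init 3" from a shell would still be forwarded to telinit, while
a bare /sbin/init started inside a chroot would go on to run as init.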

2. Now here comes the difference with Linux-VServer. If you use the "plain"
init style (see original post), the /sbin/init process within the guest will
end up with pid 1. So everything should be fine, right?
No, because upstart's /sbin/init is linked with NPTL, in contrast to
SysV-Init's /sbin/init (which was used for all prior Ubuntu versions), which
is NOT.

# SysV:
$ ldd /sbin/init
        libc.so.6 => /lib/libc.so.6 (0x00002b5756108000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b5755edf000)

# Upstart:
$ ldd /sbin/init
        linux-vdso.so.1 =>  (0x00007fff8dcd0000)
        libdbus-1.so.3 => /lib/libdbus-1.so.3 (0x00007faab9a19000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007faab97fd000)
        librt.so.1 => /lib/librt.so.1 (0x00007faab95f5000)
        libc.so.6 => /lib/libc.so.6 (0x00007faab9286000)
        /lib64/ld-linux-x86-64.so.2 (0x00007faab9c58000)

OK, fine. Where's the problem with -lpthread? Upstart's init checks for
pid 1 using getpid(2). There is a known bug; quoting its man page:

Since glibc version 2.3.4, the glibc wrapper function for getpid()
caches PIDs, so as to avoid additional system calls when a process
calls getpid() repeatedly. Normally this caching is invisible, but its
correct operation relies on support in the wrapper functions for
fork(2), vfork(2), and clone(2): if an application bypasses the glibc
wrappers for these system calls by using syscall(2), then a call to
getpid() in the child will return the wrong value (to be precise: it
will return the PID of the parent process). See also clone(2) for
discussion of a case where getpid() may return the wrong value even when
invoking clone(2) via the glibc wrapper function.

This is the exact bug causing Upstart's /sbin/init within a Linux-
VServer guest to *always* replace itself with telinit. Calling getpid(2)
will NOT return the real pid, but the cached one, which is the pid of
the util-vserver startup script.

Here is a demonstration, using the attached program as /sbin/init:

$ gcc -lpthread -o /sbin/init init.c  # (this is how upstart's init is linked)
$ vserver foo start
ppid = 20523, pid = 20524
syscall getpid = 1

Notice the wrong "pid = 20524" above coming from getpid(2), which is a
cached pid. The syscall gave us the correct pid, which is 1!

Now the same linked without NPTL:

$ gcc -o /sbin/init init.c  # (this is how sysvinit's init is linked)
$ vserver foo start
ppid = 20278, pid = 1
syscall getpid = 1

Everything is fine!
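
The attachment itself is not reproduced in this message. A minimal sketch of
what such a test could look like follows; the function names are
hypothetical and only the output format is taken from the runs above. It
reports getppid(), the glibc getpid() wrapper, and the raw system call side
by side.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* PID as reported by the glibc wrapper; on affected glibc versions this
 * may be a cached value. */
static pid_t wrapper_pid(void)
{
    return getpid();
}

/* PID obtained by invoking the system call directly, bypassing any
 * caching done in the wrapper. */
static pid_t kernel_pid(void)
{
    return (pid_t) syscall(SYS_getpid);
}

/* Format the two-line report into buf; returns what snprintf() returns. */
static int format_pid_report(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "ppid = %ld, pid = %ld\nsyscall getpid = %ld\n",
                    (long) getppid(), (long) wrapper_pid(),
                    (long) kernel_pid());
}
```

Calling format_pid_report() from main() and writing the buffer to stdout
gives a stand-in /sbin/init for this kind of comparison. On an unaffected
system both values are expected to agree.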

So here are your options to fix it:

1. Fix upstart and replace the getpid(2) call with syscall(SYS_getpid) to get
the _real_ process ID.
or
2. Upgrade to a development version of the Linux-VServer kernel patch against a
newer kernel. Even with the exact same host and guest system (same glibc, same
/sbin/init binary), a newer kernel yields the correct result for getpid(2).
I don't have the time to investigate this any further, and the changes to
clone(2) and the whole namespace system between kernel versions 2.6.22 and
2.6.26+ are immense. Something in between has fixed this bug.

Current Linux-VServer stable kernel is: 2.6.22.19-vs2.2.0.7 -- BROKEN with 
GLIBC 2.10.1-0ubuntu15
Development Linux-VServer kernel: 2.6.31.6-vs2.3.0.36.24 -- WORKS fine, 
expected result for getpid(2).

With either fix in place, the /sbin/init provided by upstart will work and
everything is fine.

-- 
upstart incompatible with linux-vserver
https://bugs.launchpad.net/bugs/482292
