This bug was fixed in the package lxc - 1.0.0~alpha1-0ubuntu2
---
lxc (1.0.0~alpha1-0ubuntu2) saucy; urgency=low
* Add allow-stderr to autopkgtest restrictions, as the Ubuntu template
uses policy-rc.d to disable some daemons and that causes a message to
be printed on stderr
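For reference, that restriction lives in the package's debian/tests/control file; a minimal sketch (the test name and dependency here are illustrative, not the actual lxc file) looks like:

```
Tests: lxc-smoke
Depends: lxc
Restrictions: allow-stderr
```

Without allow-stderr, autopkgtest treats any output on stderr as a test failure, which is why the policy-rc.d messages broke the run.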
The following kernel patch fixes it for me; will send to lkml:
diff --git a/debian.master/changelog b/debian.master/changelog
index f8f7a35a..081e666 100644
--- a/debian.master/changelog
+++ b/debian.master/changelog
@@ -1,3 +1,9 @@
+linux (3.11.0-4.9debug1) saucy; urgency=low
+
+ * debug 1
+
Separately, a patch has been committed to upstream lxc to eliminate any
chance of a race stopping the lxc monitor from seeing the container init
exit. Note that this doesn't stop the kernel bug from happening.
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed
** Changed in:
This bug was introduced between v3.7 and v3.8, by commit:
af4b8a83add95ef40716401395b44a1b579965f4 pidns: Wait in
zap_pid_ns_processes until pid_ns->nr_hashed == 1
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags added: bot-stop-nagging
Ok, for the record I cannot reproduce this on precise - I've tried with
both an 8 core fast precise host, and a 4 core precise kvm guest - but
using the same lxc version from the daily ubuntu-lxc ppa.
So it appears to be a kernel regression between 3.2 and 3.8.
There is at least one fundamental bug in start.c's signal_handler, which
should be fixed by the patch below. However, this alone did not fix it
for me, so more is
wrong. There is a minimal testcase at
http://people.canonical.com/~serge/signalfd.c which originally reproduced this
bug, then was fixed by
Ok - thanks - I was able to reproduce this in a raring VM with 4 cores.
I thought this might be a tiny race window between us blocking signals
and creating the signalfd, so I reversed those (a patch which I may yet
send upstream) - but that didn't solve the issue.
It does seem like this must be a kernel bug in epoll+signalfd (or a hard
to spot misuse thereof in lxc).
When I instrument the signal_handler which is executed when epoll_wait
returns a signalfd event, I do get a sigchld for the very first task
which is spawned (a test to see if kernel supports
Hey Serge, let me know if that repro worked for you or when you're
planning to give it a try. I'm keeping the VM image around in case you
need it.
What's odd is that I can't even reproduce it with the daily ppa build,
which doesn't have the workaround which is in the ubuntu package.
Did you
Quoting Pavel Bennett (launch...@pavelbennett.com):
Hey Serge, were you able to get a reliable repro for this? I have a
No, I wasn't (assuming you mean with our workaround in place in the
ubuntu package).
In comment #8 you said node.js was able to reproduce this. Regarding
that,
1. Does that
I can't try Saucy right now, but the repro instructions with kernel
versions are in the original post and in #2.
We've tried node v0.11.2 as well on Raring and got the repro.
Repro summary:
Install any of the above kernels, such as the one with the Raring installer,
then install lxc from apt.
Quoting Pavel Bennett (launch...@pavelbennett.com):
I can't try Saucy right now, but the repro instructions with kernel
versions are in the original post and in #2.
We've tried node v0.11.2 as well on Raring and got the repro.
Repro summary:
Install any of the above kernels, such as the
Sure, run these inside the container:
git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install
Then the binary will be at /usr/local/bin/node
It's v0.11.3-pre, but should still repro.
Quoting Pavel Bennett (launch...@pavelbennett.com):
Sure, run these inside the container:
git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install
Then the binary will be at /usr/local/bin/node
It's v0.11.3-pre, but should still repro.
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I
was able to run the steps outlined above 50 times with no issues. What
was I missing? Concurrency! I rebooted the VM after adding 1 more core,
and... bingo! Zombies on the 3rd try.
The VM disk image I have here should be
Hey Serge, were you able to get a reliable repro for this? I have a
reason to upgrade to Raring, and this seems to be the only blocker.
We've reproduced the issue with the stock Linux Mint 15.
** Changed in: lxc (Ubuntu)
Status: Incomplete => Confirmed
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1168526
Title:
race condition causing lxc to not detect container init process exit
I've also tried it with a C++ app very similar to yours and was unable
to repro. There is something about having node.js as the init process
running a script that calls process.exit(0). The init process (node
v0.11.0) does exit, as ps faux shows it as a zombie and a child of
lxc-start.
I went back to kernel
The newest kernel I've tested on is 3.8.0-17-generic. I'll need to set
up a system with the daily upstream build and re-test.
** Changed in: lxc (Ubuntu)
Importance: Undecided => Medium
** Changed in: lxc (Ubuntu)
Assignee: (unassigned) => Serge Hallyn (serge-hallyn)
No, I cannot reproduce this with the latest upstream kernel build
(3.9.0-030900rc7-generic #201304171402)
What I did:
sudo lxc-create -t ubuntu -n r1
cat > exit0.c << EOF
#include <stdlib.h>
int main() {
    exit(0);
}
EOF
make exit0
sudo cp exit0 /var/lib/lxc/r0/rootfs/bin/
sudo lxc-start -n r1 --
Quoting Pavel Bennett (launch...@pavelbennett.com):
Btw, that queueing mode would simply mean not calling epoll_wait until
the pid is available. This shouldn't require managing a queue ourselves.
Can you think of anything that this would break?
No - I had wanted to do this originally, but
Btw, that queueing mode would simply mean not calling epoll_wait until
the pid is available. This shouldn't require managing a queue ourselves.
Can you think of anything that this would break?
Or we could go with the patch you've written, although I haven't looked
into why the problem appears to
I should add that these 'forwarded signal 2' lines are due to me
pressing Ctrl+C and are not actually relevant.
Have you been able to repro this bug on kernel 3.8.6?
I'm thinking about how to fix this, as lxc_spawn is what gets the pid
that lxc_poll needs to listen for SIGCHLD from the correct
Precisely which version of lxc were you using?
I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built
from source) while on kernel 3.7.9-030709-generic and haven't yet run
into this issue (I assume that's the patch you mentioned). However, when
I update to kernel