The last one I'm aware of that did not exhibit this issue was 3.5.0-27.
I wish I had a simpler repro though, since on our system it takes 10-15
hours of heavy processing to hit the uninterruptible sleeps.
Could it be tracked by looking at the state of the OS? Every new
lxc-start ends up hanging
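One way to check the OS state for this (a minimal sketch, assuming procps `ps`) is to list tasks stuck in uninterruptible sleep, state D, along with the kernel function they are blocked in:

```shell
# Print the header plus any task whose state starts with D
# (uninterruptible sleep); WCHAN shows where in the kernel it is stuck.
ps -eo pid,stat,wchan:32,cmd | awk 'NR == 1 || $2 ~ /^D/'
```

A hung lxc-start should show up here with STAT `D` and never leave the list.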
** Tags added: kernel-bug-exists-upstream
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1196295
Title:
lxc-start enters uninterruptible sleep
To manage notifications about this bug
Managed to repro with v3.10-saucy last night.
What do you guys suspect it could be?
I'm keeping the server in this state for now if you'd like me to gather
some data.
Changing to Confirmed as per instructions in comment #7
** Changed in: linux
Status: Incomplete => Confirmed
Reproduced even with lxc-stop.
dmesg:
[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
This should help.
kern.log:
Jul 2 05:41:32 server1 kernel: [136565.201601] device vethbJ4JsM left promiscuous mode
Jul 2 05:41:32 server1 kernel: [136565.201603] vmbr: port 5(vethbJ4JsM) entered disabled state
Jul 2 05:41:38 server1 kernel: [136570.551496] vmbr: port 2(veth49SiBX) entered
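The recurring refcount message can be checked for from a shell (a simple sketch; reading the ring buffer may need root on some setups, hence the stderr redirect):

```shell
# Count how many times the stuck-refcount message has been logged;
# while lo is pinned, the line repeats roughly every 10 seconds.
dmesg 2>/dev/null \
    | grep -cF 'unregister_netdevice: waiting for lo to become free'
```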
apport information
** Tags added: apport-collected
** Description changed:
After running and terminating around 6000 containers overnight,
something happened on my box that is affecting every new LXC container I
try to start. The DEBUG log file looks like:
lxc-start
apport information
** Attachment added: HookError_generic.txt
https://bugs.launchpad.net/bugs/1196295/+attachment/3722561/+files/HookError_generic.txt
apport information
** Attachment added: HookError_cloud_archive.txt
https://bugs.launchpad.net/bugs/1196295/+attachment/3722560/+files/HookError_cloud_archive.txt
apport information
** Attachment added: HookError_ubuntu.txt
https://bugs.launchpad.net/bugs/1196295/+attachment/3722564/+files/HookError_ubuntu.txt
apport information
** Attachment added: HookError_source_lxc.txt
https://bugs.launchpad.net/bugs/1196295/+attachment/3722563/+files/HookError_source_lxc.txt
apport information
** Attachment added: HookError_source_linux.txt
https://bugs.launchpad.net/bugs/1196295/+attachment/3722562/+files/HookError_source_linux.txt
Looks like apport was missing some module to gather what it wanted. Let
me know if this info would be valuable and I can re-run it.
** Changed in: lxc (Ubuntu)
Status: Incomplete => Confirmed
Attaching a better apport file after installing the missing dependency.
I will hide the ones from earlier as this will contain the same data and
more.
** Attachment added: apport.lxc.txt
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1196295/+attachment/3722586/+files/apport.lxc.txt
Public bug reported:
After running and terminating around 6000 containers overnight,
something happened on my box that is affecting every new LXC container I
try to start. The DEBUG log file looks like:
lxc-start 1372615570.399 WARN lxc_start - inherited fd 9
lxc-start
Also, in dmesg:
[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54556.103535] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54566.333609] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54576.563664]
Some basic environment details. I can post more if requested.
Ubuntu Server 13.04 64-bit
$ uname -r
3.8.0-25-generic
$ dpkg -l | grep lxc
ii  liblxc0    0.9.0-0ubuntu3.3    amd64    Linux Containers userspace tools (library)
ii lxc
Hey Serge, let me know if that repro worked for you or when you're
planning to give it a try. I'm keeping the VM image around in case you
need it.
What's odd is that I can't even reproduce it with the daily ppa build,
which doesn't have the workaround which is in the ubuntu package.
Did you
I can't try Saucy right now, but the repro instructions with kernel
versions are in the original post and in #2.
We've tried node v0.11.2 as well on Raring and got the repro.
Repro summary:
Install any of the above kernels, such as the one with the Raring installer,
then install lxc from apt.
Sure, run these inside the container:
git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install
Then the binary will be at /usr/local/bin/node
It's v0.11.3-pre, but should still repro.
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I
was able to run the steps outlined above 50 times with no issues. What
was I missing? Concurrency! I rebooted the VM after adding 1 more core,
and... bingo! Zombies on the 3rd try.
The VM disk image I have here should be
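The overnight workload can be sketched roughly like so (a hypothetical driver, not the exact commands used in this report: the container names, round count, and parallelism of 4 are illustrative, and it assumes containers stress-1 through stress-4 are already defined with the one-line node init, so each exits immediately):

```shell
#!/bin/sh
# Hypothetical stress driver: launch short-lived containers in parallel
# to exercise network-namespace teardown. 1500 rounds x 4 containers
# approximates the ~6000 starts reported above.
for round in $(seq 1 1500); do
    for slot in 1 2 3 4; do
        lxc-start -n "stress-$slot" -o "/tmp/stress-$round-$slot.log" &
    done
    wait    # finish the whole batch before launching the next one
done
```

The `wait` between batches keeps the concurrency bounded while still overlapping container teardown, which is what the extra core made possible in the VM.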
Hey Serge, were you able to get a reliable repro for this? I have a
reason to upgrade to Raring, and this seems to be the only blocker.
We've reproduced the issue with the stock Linux Mint 15.
I've also tried it with a C++ app very similar to yours and was unable
to repro. There is something about having node.js as the init process
running a one-line process.exit(0); script. The init process (node v0.11.0) does
exit, as ps faux shows it as a zombie and a child of lxc-start.
I went back to kernel
Btw, that queueing mode would simply mean not calling epoll_wait until
the pid is available. This shouldn't require managing a queue ourselves.
Can you think of anything that this would break?
Or we could go with the patch you've written, although I haven't looked
into why the problem appears to
I should add that these "forwarded signal 2" lines are due to me
pressing Ctrl+C and are not actually relevant.
Have you been able to repro this bug on kernel 3.8.6?
I'm thinking about how to fix this, since lxc_spawn is what gets the pid
that lxc_poll needs to listen for SIGCHLD from the correct
Public bug reported:
For the purpose of the repro, my lxc init process is node.js v0.11.0
(built from source) with a single line:
process.exit(0);
When running it in lxc, sometimes lxc doesn't exit. lxc-start remains a
parent of a defunct node process without reaping it or exiting.
I've made a
Precisely which version of lxc were you using?
I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built
from source) while on kernel 3.7.9-030709-generic and haven't yet run
into this issue (I assume that's the patch you mentioned). However, when
I update to kernel