[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-04-12 Thread Mike Malone
Since no one developed a reproducible test case it is, unfortunately, difficult to say whether this bug is resolved. We moved to a 2.6.35 series kernel and stopped seeing this particular problem (although we have seen other problems that are eerily similar). All of the information to reproduce the

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-01 Thread Mike Malone
Matt, That's definitely in line with what we're seeing. Not to beat a dead horse, and I am not a kernel hacker (so I apologize for any naivety), etc... but that's basically what led me to believe the CLOCK_PROCESS_CPUTIME_ID timers, top issues, and other CPU time monitoring may be relevant here.
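
For anyone wanting to sanity-check CPU-time timers on a suspect node, here is a minimal sketch. It is not the test program from this thread. Python's standard library does not expose `timer_create`, so it uses `ITIMER_PROF` (which also counts down in process CPU time) as a stand-in for a `CLOCK_PROCESS_CPUTIME_ID` timer. The function name and the default values are made up for illustration.

```python
import signal
import time


def count_cpu_timer_signals(cpu_seconds=0.5, interval=0.05):
    """Burn about cpu_seconds of process CPU time and count SIGPROF deliveries.

    Returns (signals_seen, signals_expected). ITIMER_PROF is an assumption
    here: it stands in for a CLOCK_PROCESS_CPUTIME_ID POSIX timer.
    """
    ticks = 0

    def on_tick(signum, frame):
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        # Busy-wait until the process CPU clock has advanced far enough.
        while time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - start < cpu_seconds:
            pass
    finally:
        # Disarm the timer before restoring the old handler.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        if previous is not None:
            signal.signal(signal.SIGPROF, previous)
    used = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - start
    return ticks, used / interval


if __name__ == "__main__":
    seen, expected = count_cpu_timer_signals()
    print(f"signals seen: {seen}, roughly expected: {expected:.1f}")
```

A count far below the expected value would be consistent with dropped timer signals, though a single run on a busy host proves little either way.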

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-01 Thread Mike Malone
Hey Matt, I ran it once while the system was idle: https://gist.github.com/fb35566354afc442bf2d And then again in a tight loop with a 50ms sleep while I locked the system up, in the hopes of catching something interesting just before or during the period when the system was locked: https://gist.g

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Mike Malone
Starting irqbalance (without rebooting) on a node in a state where fork() will hang does not help.
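
To check whether fork() is stalling on a given node, something like the following could be used. It is a rough sketch under stated assumptions, not the libctest program discussed in this thread: it only times fork() plus waitpid() from user space, and the function name and sample count are invented for illustration.

```python
import os
import time


def fork_latencies(samples=20):
    """Return wall-clock durations (seconds) of fork() + waitpid() round trips."""
    durations = []
    for _ in range(samples):
        start = time.monotonic()
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running any cleanup handlers.
            os._exit(0)
        os.waitpid(pid, 0)
        durations.append(time.monotonic() - start)
    return durations


if __name__ == "__main__":
    results = fork_latencies()
    print(f"max: {max(results) * 1000:.2f} ms, "
          f"mean: {sum(results) / len(results) * 1000:.2f} ms")
```

On a healthy node each round trip should normally take on the order of milliseconds or less; multi-second outliers would be the interesting case.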

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-30 Thread Mike Malone
Matt, Still no way to reproduce deterministically. I've just been running variations of the test I posted above, writing to /dev/null and setting timer signals. At some point the tests/system start hanging. irqbalance is not running on any of the test instances that are hanging (it appears to be

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
Also, potentially related, the 2.6.32 kernels seem to sometimes drop timer signals for CLOCK_PROCESS_CPUTIME_ID and continue to report unusual process CPU times in system monitoring tools like top and via the proc filesystem. I can reproduce the signal behavior with this program: https://gist.githu

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
Fat fingered that kernel version. Should be 2.6.32-311-ec2.

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
I am able to reproduce this behavior on an instance running kernel version 2.3.32-311 using ami-f8f40591. As with the older kernel, new instances don't immediately exhibit symptoms.

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Mike Malone
We're still working on a way to repro this on a new instance. Meanwhile, we're moving forward with testing on the newer 2.6.32 kernel and on Maverick's 2.6.35 kernel. One observation we have made is that if we run the libctest in a loop (`while :; do ./libctest; done`) on 2.6.32 it will eventually

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Mike Malone
We are running ami-fd4aa494 with 2.6.32-305-ec2 in us-east. I'll see what I can do about setting up a couple nodes with the more recent 2.6.32 kernel build and report back. We've already started running a few Maverick instances with 2.6.35-24-virtual, and so far they appear to be more stable. Unfo

[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Mike Malone
The node we were working on this morning was:

vendor_id  : GenuineIntel
cpu family : 6
model      : 26
model name : Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
stepping   : 5
cpu MHz    : 2666.760
cache size : 8192 KB