I spent a good part of today looking into this. It feels like the
problem may lie in the process scheduler. When I pin the CPU-burning
process to CPU0 (through "taskset -pc 0 $pid_printed_by_a_out") and pin
a bash shell to CPU0 as well, the bash process fails to wake after
sleeping (i.e., it's runnable, but CFS isn't giving it time). I've seen
the bash process start to be scheduled again after around 3 minutes, and
I've also seen it just sit there.
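
A rough way to put a number on the wake delay is sketched below. This is
not the procedure used above, only an illustration: the function names
pin_to_cpu and wake_delays are made up, and the 0.1 s sleep and 20
samples are arbitrary choices. The idea is to run it while the
CPU-burning process is already pinned to CPU0 and see how late each
sleep returns.

```python
import os
import time


def pin_to_cpu(cpu=0):
    """Pin the calling process to one CPU; return True on success."""
    # os.sched_setaffinity is Linux-only, so guard for other platforms.
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means "this process"
        return True
    except OSError:
        # e.g. the requested CPU is not in this process's allowed set
        return False


def wake_delays(sleep_s=0.1, samples=20):
    """Return how late (in seconds) each sleep returned."""
    delays = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(sleep_s)
        # Elapsed time beyond the requested sleep is time spent
        # runnable but not yet scheduled (plus timer slack).
        delays.append(time.monotonic() - start - sleep_s)
    return delays


if __name__ == "__main__":
    pinned = pin_to_cpu(0)
    delays = wake_delays()
    print("pinned to CPU0:", pinned)
    print("worst wake delay: %.3f s" % max(delays))
```

On a healthy scheduler the worst delay should stay small; delays of
seconds or minutes would match the behaviour described above.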

Every time I've seen a scheduler debug trace (triggered via "echo w >
/proc/sysrq-trigger"), there have been other runnable processes on the
spinning CPU that don't seem to be getting scheduled at all.

I've not been able to reproduce this problem on the kernel used in the
Amazon Linux AMI (currently 2.6.34.7). This is in line with other users'
observations (http://twitter.com/#!/synack/status/30415380321140737).

I think that Canonical might need to look into what (if any) changes
they've made to CFS in the 10.04 kernel tree. It's also possible that
improvements have been made in CFS between 2.6.32 and 2.6.34 that
account for better performance.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is a direct subscriber.
https://bugs.launchpad.net/bugs/708920

Title:
  Strange 'fork/clone' blocking behavior under high cpu usage on EC2
