new ticket opened at
https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/889012
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/708920
Title:
Strange 'fork/clone' blocking behavior under
Reading back through the previous comments I think this should be
handled as a separate issue. The original issue caused processes not to
be forked while I understand this seems to be rather load/system time
related but not as fatal as the main issue of this report. Could you
please open a new ticke
I want to report that the CPU time bug is still present in the following
kernel
2.6.32-317-ec2 on AWS instance m1.large
with CPU
model name : Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
I logged the output of ps every minute and I can now show the bug
happening, as you can see the CPU
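A rough sketch (not from the thread) of how per-minute ps samples could be checked for bogus CPU times. It assumes the symptom shows up as a process's accumulated CPU time growing faster than wall-clock time allows, which the truncated report above does not confirm. The function names and the sample numbers are made up for illustration.

```python
def parse_cputime(text):
    """Convert a ps-style TIME value ([DD-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    seconds = 0
    for field in text.split(":"):
        seconds = seconds * 60 + int(field)
    return days * 86400 + seconds


def implausible_jumps(samples, max_cpus=1):
    """Return indexes of samples whose CPU time changed implausibly.

    samples is a list of (wall_clock_seconds, cpu_seconds) pairs in time
    order. A sample is flagged when CPU time went backwards or grew by more
    than the elapsed wall time multiplied by max_cpus (an assumed upper
    bound on how many CPUs the process can use at once).
    """
    flagged = []
    for index in range(1, len(samples)):
        wall_delta = samples[index][0] - samples[index - 1][0]
        cpu_delta = samples[index][1] - samples[index - 1][1]
        if cpu_delta < 0 or cpu_delta > wall_delta * max_cpus:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Hypothetical samples taken one minute apart.
    samples = [(0, parse_cputime("00:00:10")),
               (60, parse_cputime("00:00:20")),
               (120, parse_cputime("00:08:20"))]
    print(implausible_jumps(samples))
```

The wall-clock bound is deliberately crude; a multi-threaded process needs max_cpus raised to the number of CPUs it can run on.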
Based on the latest comments (unfortunately this bug was not referenced
when committing the change), I am closing this bug as fixed. If anybody
thinks it is still a problem, feel free to re-open it (or open a new report).
Thanks.
** Changed in: linux-ec2 (Ubuntu)
Status: New => Fix Released
--
FWIW, I upgraded to 2.6.32-316-ec2 #31 on Ubuntu 10.04 running on an
E5507 and I have not run into this issue on any of the boxes I'm running
since the upgrade.
Ah, good to know (#62). Next time we'll use the new instances.
By the way, it's running non-stop since last week.
Great job!
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is a direct subscriber.
https://bugs.launchpad.net/bugs/708920
Title:
Strange 'fork/
Jan, hopefully we got the fix for this in the next official update. It
currently is waiting to clear proposed but was delayed by findings in some of
the changes used by all kernels.
But thanks for testing the preview kernel. That is built from the bigger set of
changes which I would like to merg
Hi Stefan,
we experienced the kernel lockups under heavy load on Ubuntu 10.04
running the standard 2.6.32-314-ec2 kernel from the ami-3202f25b
(us-east-1). The underlying hardware is an Intel(R) Xeon(R) CPU E5507 @
2.27GHz w/ 4MB cache.
I have installed the
linux-image-2.6.32-317-ec2_2.6.32-317.
Eduardo,
Just fyi, any recent (this year) lucid image will already use the pv-grub
kernels by default.
Never mind, I found out how to do that, i.e. launch new instances using AMI
(ami-40db2229) with matching AKI (aki-427d952b).
It's all running now... hopeful that it won't hang anymore.
Thank you!
Hey, Stefan,
We're having instances hanging throughout the day
(https://forums.aws.amazon.com/thread.jspa?threadID=68692) and it looks
like it may be related to this.
Looking at your comment (#53) I wanted to apply the changes you
mentioned, but I need some help on how to install them. I'm famili
Which would be exactly the bug Robert was pointing to. Hrm, it sounds
more and more sensible to pick the fix up sooner rather than later as the full
changeset is harder to get approved.
Robert,
Can you confirm what CPU is running on that node? The Librato guys have
apparently narrowed it down to only happening on the Intel E5507. See
this blog post for more:
https://silverline.librato.com/blog/main/EC2_Users_Should_be_Cautious_When_Booting_Ubuntu_10_04_AMIs
I checked one of our
Hi Robert,
thanks for the update and really good to hear. The bug you mentioned
might be one of the possible reasons. There have also been changes that
make timer and IPI interrupts per-CPU and switch the rest to
fasteoi instead of level (moving around masking code as well). Having
irqs use the
Hi Stefan,
Just an update. That same node has run flawlessly under high load since
my last post, no lockups. What are your thoughts on the claim that this
bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/727459 is the
cause of the problem?
B.
Robert, that really sounds like good news. To answer the question what exactly
improved things could get a bit difficult. It also depends a bit what exactly
the last stock ec2 kernel was you are comparing against. I had been doing a
first round of changes which Jeremy has been testing (from what
Stefan,
I've been testing your kernels and stability seems much improved. It's
only been a few days but the server has been under quite a lot of load
without an issue, with the stock ec2 kernel this was definitely not the
case.
Cautious optimism, but I'd love to isolate which of the patches was t
This is a general request for some testing help for Ubuntu 10.04 (Lucid) on
ec2. If there are other problem reports about strange process cpu times or
hangs without any message it would be good to pass this on.
Trying to pull in a wide range of upstream fixes, I have prepared linux-ec2
packages
Since no one developed a reproducible test case it is, unfortunately,
difficult to say whether this bug is resolved. We moved to a 2.6.35
series kernel and stopped seeing this particular problem (although we
have seen other problems that are eerily similar). All of the
information to reproduce the
The main problem here is the question whether there still is a problem.
There has been a first round of cleanup patches going into the ec2
kernels with 2.6.32-314.27. The last update/response I got was from
Jeremy and he seemed at least not to see the blocking behavior with
those. So as long as the
This ticket has been idle for a month, anyone have a resolution yet? :(
Stefan,
The test machine did not exhibit the fork/clone behavior in this bug. The
wrong cpu times were pretty bad though and perhaps unrelated to this bug. The
experiment lasted 3 days on a loaded 4096meg cache large instance. That
*should* have been long enough but with this bug it's tough to
Hi Jeremy, I have not heard back, so I am wondering whether the
hang/crashing occurred after all?
Stefan,
I have some boxes that seem to reproduce this behaviour rather frequently.
This is great news since this bug is so hard to reproduce.
I just launched with the new testing kernels you provided (aki-9ab546f3
x86_64) on a server that has 4096meg cache (the bad behaving size).
Unfortunatel
Well maybe the kernel I posted. Kind of hard to say without anybody
testing it. ;-) The problem is that Lucid and Maverick kernels for EC2
are very different approaches. Maverick already has enough Xen code in
the mainline kernel that only the configuration is different to the
normal kernel. Lucid
Same as Alex - we were having this issue with our production servers on
2.6.32-309-ec2 kernel.
After upgrading to Maverick (2.6.35-24-virtual) issue is gone. So far (5 days)
we haven't met this bug.
I would also like to know of a kernel version that doesn't have this issue anymore.
We are now seeing this on a whole bunch of production machines all
running 10.04, 2.6.32-312-ec2. Has anyone confirmed a kernel (old or
new) that doesn't have this bug?
We've just upgraded one machine to 10.10 to see if it fixes the issue.
We haven't tried Stefan's kernels, but may if 10.10 doesn
Hi guys,
I've been following this bug very closely and I just had it happen to me
on an Instance running Maverick. Kernel version was 2.6.35-22-virtual
and I realize a newer -25- is out. Will see if I can repro with that as
well. This was a classic example of the bug, it happened during a large
While there is not yet a patch that I would suspect of being the one
fixing this issue, there were a larger number of changes that had been
done upstream but missed the Lucid ec2 kernel because it is using xen
specific copies of x86 files. Some of them will have no effect as
compile options cause t
I tried the approach from comment #30 and also comment #37 with
ami-fa01f193 in us-east. This runs a 2.6.32-312.24 kernel currently in
proposed and was running on hw that showed 6M cache in cpuinfo. Neither
method caused the scheduler bug for me.
I will have a look at the source code and see
The commit mentioned in comment #39 is included in the
Ubuntu-2.6.32-311.23 EC2 kernel.
Mike,
You bring up a good point about CFS' need for good process time
accounting. I think that this upstream patch may fix a lot of problems:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8a22b9996b001c88f2bfb54c6de6a05fc39e177a
This patch is in 2.6.34.7, may not
Matt,
That's definitely in line with what we're seeing. Not to beat a dead
horse, and I am not a kernel hacker (so I apologize for any naivety),
etc... but that's basically what led me to believe the
CLOCK_PROCESS_CPUTIME_ID timers, top issues, and other CPU time
monitoring may be relevant here.
I've done a lot of looking at this today. It feels like the problem may
lie in the process scheduler. When I pin the CPU burning process to CPU0
(through "taskset -pc 0 $pid_printed_by_a_out"), and pin a bash shell
also to CPU0, I see failure of the bash process to wake after sleeping
(i.e., it's r
Hey Matt,
I ran it once while the system was idle:
https://gist.github.com/fb35566354afc442bf2d
And then again in a tight loop with a 50ms sleep while I locked the
system up, in the hopes of catching something interesting just before or
during the period when the system was locked:
https://gist.g
If anyone has a machine that they can get into the hanging state (with
fork() blocking), can you run "echo w > /proc/sysrq-trigger" as root
and post the results?
The first one I rebuilt, as I thought it might have been related to a
dud EBS volume, but the current one is i-55a92d39. Mail me at
a...@swapoff.org if you want to take this offline.
Alec,
Do you have instance IDs from your hanging instances?
@Matt,
No I don't see anything particularly unusual in the logs at all :\
Also, as suggested by Gavin, I rebooted my EC2 instance until I got a
machine with CPU model 23 and have not had a repeat of the issue.
Jordan,
Do you see this behavior at boot, or only after your instance has been
up and running for a while?
I see this in a particularly annoying way.
apt-get when run from another tool (like puppet) will have /dev/null for
stdin. apt-get foolishly select(2)s on stdin which results in 100% cpu
usage (stdin is always ready when it is /dev/null). This 100% cpu
qualifies for the fork-blocking, and since ap
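A small sketch (not from the thread) of the select(2) behaviour described above: a descriptor opened on /dev/null is reported as readable immediately, and reading it returns end-of-file, so a loop that selects on it without handling EOF never blocks. The helper names are hypothetical, and this says nothing about apt-get's actual code.

```python
import os
import select


def null_is_readable(timeout=0.0):
    """Return True if select(2) reports /dev/null as ready for reading."""
    fd = os.open("/dev/null", os.O_RDONLY)
    try:
        readable, _, _ = select.select([fd], [], [], timeout)
        return fd in readable
    finally:
        os.close(fd)


def null_read_is_eof():
    """Return True if reading from /dev/null yields end-of-file at once."""
    fd = os.open("/dev/null", os.O_RDONLY)
    try:
        return os.read(fd, 1) == b""
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("select reports ready:", null_is_readable())
    print("read returns EOF:", null_read_is_eof())
```

A caller that treats "ready" as "input available" and selects again after the empty read will spin at full CPU, which matches the symptom reported here.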
Starting irqbalance (without rebooting) on a node in a state where
fork() will hang does not help.
Alec,
Do any hung task kernel stack traces get emitted during your hangs?
I am running Eclipse on EC2 over NX and this bug surfaces quite quickly,
in the form of total system hangs for .5 to 5 minutes at a time.
If you have an instance in a state where fork() will hang if you spin a
CPU, it would be a good experiment to see if irqbalance helps at all.
Matt,
Still no way to reproduce deterministically. I've just been running
variations of the test I posted above, writing to /dev/null and setting
timer signals. At some point the tests/system start hanging.
irqbalance is not running on any of the test instances that are hanging
(it appears to be
Hi Mike,
Let's focus on the fork() hangs in this bug. It's true that the two
could be related, but the symptoms don't quite line up.
You say you can reproduce the behavior on 2.3.32-311. Do you have a
procedure for getting an instance into the broken state, so you can then
cause fork() hangs with
Also, potentially related, the 2.6.32 kernels seem to sometimes drop
timer signals for CLOCK_PROCESS_CPUTIME_ID and continue to report
unusual process CPU times in system monitoring tools like top and via
the proc filesystem. I can reproduce the signal behavior with this
program: https://gist.githu
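A rough sanity check (not the program from the gist, which is cut off above) for the CPU-time half of this report. It burns CPU for a short interval and compares what CLOCK_PROCESS_CPUTIME_ID says against the monotonic wall clock. It assumes a single-threaded process, so reported CPU time should never exceed elapsed wall time by more than clock granularity. It does not exercise timer signals.

```python
import time


def burn_and_compare(seconds=0.2):
    """Spin for `seconds` and return (cpu_seconds_used, wall_seconds_elapsed)."""
    wall_start = time.monotonic()
    cpu_start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
    deadline = wall_start + seconds
    while time.monotonic() < deadline:
        pass  # busy loop, deliberately burning CPU
    cpu_used = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - cpu_start
    wall_used = time.monotonic() - wall_start
    return cpu_used, wall_used


if __name__ == "__main__":
    cpu_used, wall_used = burn_and_compare()
    print("cpu %.3fs, wall %.3fs" % (cpu_used, wall_used))
    if cpu_used > wall_used + 0.05:
        print("CPU time exceeds wall time; accounting looks suspicious")
```

The 0.05 second slack is an arbitrary allowance for clock granularity, not a value taken from the thread.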
Fat fingered that kernel version. Should be 2.6.32-311-ec2.
I am able to reproduce this behavior on an instance running kernel
version 2.3.32-311 using ami-f8f40591. As with the older kernel, new
instances don't immediately exhibit symptoms.
We're still working on a way to repro this on a new instance. Meanwhile,
we're moving forward with testing on the newer 2.6.32 kernel and on
Maverick's 2.6.35 kernel.
One observation we have made is that if we run the libctest in a loop
(`while :; do ./libctest; done`) on 2.6.32 it will eventually
We've booted up a couple instances using the new kernel and are
attempting to break them. As Mike pointed out, it is hard for us to
prove a negative; this is something that we can reliably reproduce once
we get the machine to do it once, but before that it is inconsistent.
We are running ami-fd4aa494 with 2.6.32-305-ec2 in us-east. I'll see
what I can do about setting up a couple nodes with the more recent
2.6.32 kernel build and report back.
We've already started running a few Maverick instances with
2.6.35-24-virtual, and so far they appear to be more stable.
Unfo
ami-fd4aa494 is one of our official images: us-east-1
ubuntu-lucid-10.04-amd64-server-20100427.1
That said, it's very old. The build there (unless changed) is running kernel
linux-image-2.6.32-305-ec2 2.6.32-305.9
The most recent lucid amd64 is:
ami-f8f40591 ubuntu-lucid-10.04-amd64-server-20
Can someone confirm whether this is an issue with a recent version of the
kernel? Unsuccessfully tried the sample program with 2.6.32-311-ec2.
Mike, can you click on the "affects me" for this bug?
The node we were working on this morning was:
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
stepping: 5
cpu MHz : 2666.760
cache size : 8192 KB
Gavin,
Can you reproduce the issue at will? I'm struggling to find a way to
reproduce the issue on a freshly booted instance.
Bad architecture:
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
stepping: 5
cpu MHz : 2266.746
cache size : 4096 KB
Good architecture:
vendor_id : GenuineIntel
cpu family :
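A minimal sketch (not from the thread) for checking whether an instance landed on the CPU reported as "bad" above. It parses /proc/cpuinfo-style text and matches on the E5507 model name and the 4096 KB cache size. Both match criteria are taken from the reports in this thread and only reflect a correlation people observed, not a confirmed cause. The function names are hypothetical.

```python
def parse_cpuinfo(text):
    """Parse the first processor stanza of /proc/cpuinfo-style text into a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first stanza
            continue
        key, separator, value = line.partition(":")
        if separator:
            info[key.strip()] = value.strip()
    return info


def matches_reported_cpu(info):
    """Return True for the E5507 / 4096 KB cache combination reported here."""
    return ("E5507" in info.get("model name", "")
            and info.get("cache size") == "4096 KB")


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        cpu = parse_cpuinfo(handle.read())
    print(cpu.get("model name"), "|", cpu.get("cache size"))
    print("matches reported CPU:", matches_reported_cpu(cpu))
```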
Correction, we see the problem with the 4M cache CPUs.
I work at a company that has had very similar issues.
One of the suspected theories (because this seems to strike us even
within the JVM where we aren't forking) is that it's a problem between
the hypervisor and the hardware.
What kind of hardware were you running when you found this problem? We
The following kernel stack was captured on a system in "fork() hangs"
state via "echo t > /proc/sysrq-trigger". The code for libctest is here:
https://gist.github.com/2d2b78987ea451c2edd6
<6>[853486.204130] libctest R running task0 13658 1417
0x
<4>[853486.204132] fff
Some discussions on this are at http://twitter.com/#!/mjmalone
Video posted by
http://twitter.com/#!/jordansissel/status/30421571315175425
Argh never mind, had my alphabet backwards. It was a 10.04 LTS aka Lucid
node.
Quick note: it is not just Lucid; seeing similar behavior in Karmic. In
fact, Matt, the node we looked at today was a Karmic node.
Attaching /proc/slabinfo from a system that can be used to cause fork()
hangs.
** Attachment added: "/proc/slabinfo from a sick instance"
https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+attachment/1811279/+files/slabinfo.txt
** Visibility changed to: Public
It seems that this reproduction case only happens after the system has
been used for some unknown amount of time. At that point, fork() hangs
can be triggered at will. If the instance is rebooted, the test case no
longer causes hangs.
On a system in this condition, sometimes hung task traces are seen:
kernel: [65098.694112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
kernel: [65098.694117] cron D 880001885380 0 21248 569 0x
kernel: [65098.694121] 880772e25d20 0282
This is a transcription of the test program from the youtube video:
#include
#include
#include
int main(int argc, char **argv) {
    int children = 0;
    int status;
    int i = 0;
    if (argc < 2) {
        printf("Usage: %s
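The transcription above is cut off, so the original program cannot be completed from this page. As a stand-in, here is a rough sketch of the same kind of test, written in Python: it forks a number of children that exit immediately, waits for each, and reports how long every fork-and-wait round trip took. This is an assumption about the general shape of such a test, not the program from the video, and the function name is made up.

```python
import os
import sys
import time


def timed_forks(count):
    """Fork `count` short-lived children; return each round-trip time in seconds."""
    durations = []
    for _ in range(count):
        start = time.monotonic()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once without running cleanup handlers
        _, status = os.waitpid(pid, 0)
        durations.append(time.monotonic() - start)
        if status != 0:
            raise RuntimeError("child %d exited with status %d" % (pid, status))
    return durations


if __name__ == "__main__":
    children = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    for number, seconds in enumerate(timed_forks(children)):
        print("fork %d took %.4fs" % (number, seconds))
```

On a healthy system each round trip should take a few milliseconds at most. Going by the reports in this thread, an affected instance would show stalls while another process is spinning a CPU, so the sketch is only meaningful when run alongside a CPU-bound process.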
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/708920
Title:
Strange 'fork/clone' blocking behavior under high cpu usage on EC2
--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lis
71 matches