[Bug 1178707] Re: Kernel Panics - ec2

2019-10-03 Thread Po-Hsu Lin
Closing this bug with Won't fix as this kernel / release is no longer supported. Please feel free to open a new bug report if you're still experiencing this on a newer release (Bionic 18.04.3 / Disco 19.04) Thanks! ** Changed in: linux (Ubuntu) Status: Confirmed => Won't Fix -- You recei

[Bug 1178707] Re: Kernel Panics - ec2

2014-05-10 Thread justin
Hi, We just completed an upgrade to precise across our instances and it looks like the issue is still persisting on kernel 3.2.0-61-virtual. Only have seen this on Amazon's m1.large instances so far. I've attached a new stack trace ** Attachment added: "precise_crash.txt" https://bugs.launchp

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-30 Thread Stefan Bader
I am sorry, I unfortunately got distracted by trying to finish some feature for the next release. And I must admit right now I have no good idea how to proceed. The pages that got dumped at least to me show no pattern that points to a certain process. You might be in a better position there sinc

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-29 Thread justin
We've had a few more panics, but the page has been empty a few of the times it was printed out. Is it helpful to post any more traces, or is there any other information that would be useful to gather for debugging?

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-10 Thread Stefan Bader
Ah yeah. Well, maybe it is not the only way to make it happen, but a rather successful one. I would really love to be able to find anything that allows me to reproduce the problem on a local host. So I grasp at any straw that looks promising.

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-09 Thread justin
We're using https://github.com/ariya/phantomjs/tree/1.7. The recent traces are just from machines that are running phantomJS; we have been seeing crashes on other servers without phantomjs, but I only have the kernel you compiled for us running on those servers since they crash the most frequent

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-09 Thread Stefan Bader
So the first one did not show some immediately obvious hint. And I think the lockup that was posted in comment #52 is a completely different issue (also wondering about the kernel version in there, is that a mainline kernel?). Anyway, that rather seems to be a bug which I thought we had a patch

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-09 Thread justin
Just saw a crash on Kernel 3.2.46 Here's attached console output ** Attachment added: "kernel_3_2.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3730533/+files/kernel_3_2.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-08 Thread justin
One crash from this weekend ** Attachment added: "linkworker01_bad_page_20130707.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3729233/+files/linkworker01_bad_page_20130707.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-08 Thread justin
And another crash from this weekend. There was a third but the memory page that it dumped out contains some non-public information so I can't post it here ** Attachment added: "linkworker02_bad_page_20130706.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3729234

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-03 Thread justin
Yea that documentation is a part of paramiko which is imported in a shared python library that some code on this particular server uses (but does not make use of). PhantomJS is rolled on our own but it's also not installed or running on other instances where we've seen this issue. Hopefully (odd t

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-03 Thread Stefan Bader
Hm, so that middle part looks a bit like Python documentation. Could it be part of phantomjs? Btw, for Lucid/10.04, how is phantomjs obtained? At least it is not a separate package as of Precise/12.04 and later. I wonder whether any part of that (or something else which is added to the

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-02 Thread justin
Ok got some more information now. ** Attachment added: "linkworker01_bad_page_20130702.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3722260/+files/linkworker01_bad_page_20130702.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-01 Thread Stefan Bader
Thanks and sorry, yeah the dump would be on the console if I had not messed up the conversion between the reported struct page and the memory I try to read from. So what you saw is basically the function trying to dump crashing because it accesses the wrong place. I hope I got it right this time an

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-01 Thread justin
Is it supposed to dump the contents to the console? Had 2 crashes this weekend, attached are the stack traces but I don't really see anything different. ** Attachment added: "linkworker03_crash.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3720432/+files/linkwo

[Bug 1178707] Re: Kernel Panics - ec2

2013-07-01 Thread justin
debug kernel stack trace ** Attachment added: "linkworker01_crash.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3720433/+files/linkworker01_crash.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-28 Thread Stefan Bader
That is unfortunate news. Right now I can only think of a kind of desperate approach. I added a 64bit dbg1 kernel to the same location as from comment #41. That one hopefully works (I was not really able to test it). If it works as expected it will dump the memory contents of the page that appears bad on the free

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-25 Thread justin
Unfortunately just saw a panic on those newer kernels as well -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1178707 Title: Kernel Panics - ec2 To manage notifications about this bug go to: https://

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-21 Thread Stefan Bader
Not really anything substantial, but recently there was a new upstream stable release for 2.6.32 which had some mm updates and also a few places claiming to fix memory leaks. As it is still unclear what causes the problems it would be good to install that updated kernel into at least one affected

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-20 Thread justin
Also not sure if this is helpful, but here's the output of "sysctl -a" ** Attachment added: "sysctl.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3708175/+files/sysctl.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-19 Thread Stefan Bader
I tested this on a local Lucid PVM and tracing was available there. Maybe debugfs is not mounted by default on EC2? For 2.6.32 it was anon_vma_unlink. But it probably does not matter that much which kernel. It is more to get a feeling for how much relative activity processes do. I guess I need to do a bit more
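The debugfs question above can be checked from the contents of /proc/mounts. The following is a minimal sketch, not something posted in the bug; the function names are hypothetical, and it only assumes the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options, ...).

```python
def debugfs_mountpoints(mounts_text: str):
    """Return the mount points whose filesystem type is debugfs."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def read_mounts(path="/proc/mounts"):
    """Read the current mount table (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    found = debugfs_mountpoints(read_mounts())
    if found:
        print("debugfs mounted at:", ", ".join(found))
    else:
        print("debugfs is not mounted")
```

If nothing is found, debugfs is conventionally mounted with `mount -t debugfs none /sys/kernel/debug`, after which the ftrace control files appear under the `tracing` directory there, provided the kernel was built with tracing support.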

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-18 Thread justin
It doesn't look like dynamic ftracing is available in the 2.6.32 kernels we are running, only in the 3.x kernels. I assume you meant the unlink_anon_vmas function? There's a lot of output so it's really hard to discern much from it. We have phantomjs running on one of the servers experiencing the cra

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-18 Thread Stefan Bader
The irqbalance problem on Xen.org sounds like the daemon crashing (which is not the case here). In the Redhat bug report it feels like people use crash when they mean hang. I remember there were some requests about backporting interrupt related patches. But due to the differences in the EC2 kernels

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-18 Thread justin
Stefan, http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=430 , granted that is very old and https://bugzilla.redhat.com/show_bug.cgi?id=550724#c81 I also found this related bug that seems to be having similar crashes to ours reported by an Amazon engineer https://bugs.launchpad.net/ubuntu/

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-18 Thread Stefan Bader
Maybe you got pointers to those reports about irqbalance? I am not really sure what could be monitored to find more information. I went back and looked at all the bad page error messages and one thing that all of them seem to have in common is that there is a page->mapping set which has bit 0 set. And
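For context on the bit 0 observation: in the Linux kernel the low bit of page->mapping is the PAGE_MAPPING_ANON flag, which marks the pointer as referring to an anon_vma (anonymous memory) rather than a file's address_space. A minimal sketch of decoding such a value from a dump is shown below; the function name and the sample value are hypothetical and not taken from the attached traces.

```python
# Low bit of page->mapping; the kernel names this flag PAGE_MAPPING_ANON.
PAGE_MAPPING_ANON = 0x1


def decode_mapping(mapping: int) -> dict:
    """Split a raw page->mapping value into its flag bit and pointer part."""
    is_anon = bool(mapping & PAGE_MAPPING_ANON)
    pointer = mapping & ~PAGE_MAPPING_ANON  # clear the flag bit
    return {"anon": is_anon, "pointer": hex(pointer)}


# Hypothetical value, for illustration only.
print(decode_mapping(0xFFFF8800DEADBE01))
```

A value with the flag set therefore suggests the page was last used for anonymous memory, which fits with the later interest in anon_vma_unlink / unlink_anon_vmas.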

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-17 Thread justin
Yea I had read some bug reports about instability with irqbalance on Xen, but I'm just grasping at straws. The software and configurations are identical on the m1.large and m2.xlarge for this class of servers. Are there any particular values I could graph and start monitoring to see if network IO

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-17 Thread Stefan Bader
Hard to imagine how dynamically pinning IRQ handlers to certain CPUs would make a difference. But who knows. If the description of instance types is correct the main differences between the two instance types would be that m2 has more memory (7.5GB / 17.1 GB) but has only one 420GB virtual drive, w

[Bug 1178707] Re: Kernel Panics - ec2

2013-06-17 Thread justin
@Stefan, One interesting thing is we are seeing the crashes on m1.larges of a certain server type, but that same type running on m2.xlarge has not seen any crashes. Seeing same network and IO patterns in both cases but no crashes on the larger instance type. I disabled irqbalancer on one group

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-23 Thread Stefan Bader
That "kernel BUG" probably does not mean that much. Given there seems to be at least one (but likely more) page on the free list which is not really released, this will result in more and more fallout. Is it possible to elaborate more on disk and network setup (at least anything that differs to a

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-16 Thread justin
Here's another backtrace from today. This occurred on a c1.medium, but the backtrace actually contained a mention of "kernel BUG" ** Attachment added: "i-deb173a1_20130516.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3678351/+files/i-deb173a1_20130516.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread justin
One common trait these instances share is that they are heavy on network I/O. Instances of larger sizes with the same network I/O seem to be stable. Some have sustained bandwidth of 4MB/sec in/out with packet rates of up to 30k/sec

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread Stefan Bader
Oh, right, I forgot that the version string came later. But since the symptom is distributed over such a variety of availability zones and even different instance types, it seems rather unlikely to be related to something on the host. Unfortunately the kernel messages that are seen only tell us so

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread justin
Adding an additional backtrace from this morning on Kernel 3 ** Attachment added: "kernel_3_20131505.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3677408/+files/kernel_3_20131505.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread justin
No NFS is involved. All the mounts are ephemeral storage. Instance types seem to be isolated to m1.large and c1.xlarge so far. We have the same configuration running on m2.xlarge that we have for some m1.larges and have not seen crashes there (but I wouldn't rule it out since we didn't start diggin

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread Stefan Bader
And just for reference (I had the feeling there was something similar before) bug 1007082 has a comment #36 that claims this was there related to fsc on NFS. Is that involved here as well?

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-15 Thread Stefan Bader
Looking at the dmesg snippets of the various kernels there seem to be multiple pages that have that bad page state. The locations seem random (maybe visualizing may yield some pattern). It happens the same with the ec2 kernel and the virtual flavour which actually are very different in the Xen c
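One way to look for a pattern across the bad page reports is to pull the process name and page frame number out of each report header. The sketch below assumes the header format the kernel prints ("BUG: Bad page state in process NAME  pfn:HEX"); the function names and the sample log lines are hypothetical and are not taken from the attached console output. The `\S+` match is a simplification that would truncate a process name containing spaces.

```python
import re
from collections import Counter

# Header line of a bad-page report, e.g.
# "BUG: Bad page state in process nginx  pfn:1a2b3"
BAD_PAGE_RE = re.compile(
    r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-fA-F]+)"
)


def collect_bad_pages(log_text: str):
    """Return a (process, pfn) tuple for every bad-page report in log_text."""
    return [
        (m.group(1), int(m.group(2), 16))
        for m in BAD_PAGE_RE.finditer(log_text)
    ]


def summarize(log_text: str):
    """Count reports per process and return the sorted list of pfns."""
    pages = collect_bad_pages(log_text)
    by_process = Counter(proc for proc, _ in pages)
    pfns = sorted(pfn for _, pfn in pages)
    return by_process, pfns


# Illustrative input only.
SAMPLE = """\
[ 1234.5] BUG: Bad page state in process nginx  pfn:1a2b3
[ 1300.1] BUG: Bad page state in process python  pfn:0ff10
[ 1301.9] BUG: Bad page state in process nginx  pfn:7c001
"""

by_process, pfns = summarize(SAMPLE)
print(by_process)
print([hex(p) for p in pfns])
```

The sorted pfn list can then be plotted or binned by address range to see whether the bad pages cluster anywhere.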

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-14 Thread justin
@Joseph is there any additional information I can provide to help the debugging?

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
I've only tested this on 10.04 images. It would be a bit difficult to try on a newer release given software dependencies we have currently

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread Joseph Salisbury
Does this only happen on the 10.04 images? Have you also tested other releases? ** Changed in: linux (Ubuntu) Importance: Undecided => High ** Tags added: kernel-da-key

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
apport information ** Tags added: apport-collected ** Description changed: Kernel Versions affected: 2.6.32-346-ec2 #51-Ubuntu 2.6.32-309-ec2 #18-Ubuntu SMP 3.0.0-32-virtual #51~lucid1-Ubuntu

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
The instances' workloads range from:
- an nginx proxy server that just proxies connections to different backends running an in-memory database (avg cpu: 30%)
- a server running an in-house in-memory database, taking connections from the nginx proxy servers (avg cpu: 20%)
- queue worker servers avg cpu:

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "console_output_kernel_3-b.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672123/+files/console_output_kernel_3-b.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "console_output_kernel_2_6_32_309.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672119/+files/console_output_kernel_2_6_32_309.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "console_output_kernel_3.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672116/+files/console_output_kernel_3.txt

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "apport.linux-image-3.0.0-32-virtual.y8mP6P.apport" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672113/+files/apport.linux-image-3.0.0-32-virtual.y8mP6P.apport

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "apport.linux-image-2.6.32-351-ec2.xiDdHy.apport" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672110/+files/apport.linux-image-2.6.32-351-ec2.xiDdHy.apport

[Bug 1178707] Re: Kernel Panics - ec2

2013-05-10 Thread justin
** Attachment added: "apport.linux-image-2.6.32-346-ec2.iIBV_c.apport" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1178707/+attachment/3672107/+files/apport.linux-image-2.6.32-346-ec2.iIBV_c.apport