Re: How to go about debuging a system lockup?

2006-11-17 Krzysztof Halasa
"Jesper Juhl" <[EMAIL PROTECTED]> writes:
> Or just try a few random older 2.6 kernels like 2.6.14, 2.6.9,
> 2.6.whatever (of course it needs to be a version that git knows
> about).
One can also do "bisect" manually, works with all kernels.
--
Krzysztof Halasa
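Bisecting by hand over released kernels amounts to a binary search over an ordered list of versions. A minimal Python sketch, assuming the oldest listed version is known to pass and the newest is known to fail (the thread notes that no known-good version had been found yet, and this method needs one); is_bad is a hypothetical callback standing in for "build, boot, and run the traffic test on this version":

```python
# Minimal sketch of a manual bisect over released kernel versions.
# Assumptions: versions are ordered oldest to newest, versions[0] passes,
# versions[-1] fails, and is_bad() is a hypothetical callback that builds,
# boots and tests one version.
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is True.

    Invariant: versions[good] passes and versions[bad] fails.
    """
    if len(versions) < 2:
        raise ValueError("need at least a known-good and a known-bad version")
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid    # lockup reproduced: culprit is at or before mid
        else:
            good = mid   # mid survives the test: culprit is after mid
    return versions[bad]


if __name__ == "__main__":
    releases = ["2.6.9", "2.6.12", "2.6.14", "2.6.16", "2.6.18", "2.6.19"]
    # Simulated outcome only: pretend the lockup first appears in 2.6.16.
    bad_from = releases.index("2.6.16")
    tested = []

    def fake_test(version: str) -> bool:
        tested.append(version)
        return releases.index(version) >= bad_from

    print(first_bad_version(releases, fake_test), "after testing", tested)
```

Each test halves the remaining range, so six releases need at most three build-and-boot cycles. git bisect runs the same search over individual commits instead of releases.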

Re: How to go about debuging a system lockup?

2006-11-17 Stefan Richter
Lennart Sorensen wrote:
> OK, I have now tried connecting with firescope to just follow the dmesg
> buffer across firewire. Works great, until the system hangs, then
> firescope reports that it couldn't perform the read. I wonder what part
> of the system has to lock up for the firewire card to

Re: How to go about debuging a system lockup?

2006-11-17 Lennart Sorensen
On Fri, Nov 17, 2006 at 09:29:28AM -0500, Lennart Sorensen wrote:
> Wow, that looks really neat. I will have to go read up on that tool.
OK, I have now tried connecting with firescope to just follow the dmesg buffer across firewire. Works great, until the system hangs, then firescope reports
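Following dmesg from another machine means repeatedly reading the kernel's circular log buffer and printing only what is new since the last read. A minimal sketch of that bookkeeping, assuming the buffer length is a power of two and that the reader can fetch both the raw buffer and a monotonically increasing write counter (called log_end here). How Firescope itself does this is not described in the thread, and the remote memory read is left to the caller:

```python
# Sketch of following a circular printk-style log buffer from outside.
# Assumptions (not from the thread): the buffer length is a power of two
# and the kernel keeps a monotonically increasing write counter, log_end;
# fetching the raw buffer over the wire is the caller's job.
from typing import Optional


class LogFollower:
    def __init__(self, buf_len: int) -> None:
        if buf_len <= 0 or buf_len & (buf_len - 1):
            raise ValueError("buffer length must be a power of two")
        self.buf_len = buf_len
        self.last_end: Optional[int] = None  # counter seen at previous poll
        self.lost = 0  # bytes overwritten before they could be read

    def poll(self, snapshot: bytes, log_end: int) -> bytes:
        """Return the bytes written since the previous poll."""
        if len(snapshot) != self.buf_len:
            raise ValueError("snapshot size does not match buffer length")
        if self.last_end is None:
            # First poll: take whatever is still held in the buffer.
            start = max(0, log_end - self.buf_len)
        else:
            start = self.last_end
        if log_end - start > self.buf_len:
            # The writer lapped us; the oldest unread bytes are gone.
            self.lost += log_end - start - self.buf_len
            start = log_end - self.buf_len
        mask = self.buf_len - 1
        new = bytes(snapshot[i & mask] for i in range(start, log_end))
        self.last_end = log_end
        return new


if __name__ == "__main__":
    buf = bytearray(8)  # tiny simulated ring buffer
    end = 0             # simulated write counter
    follower = LogFollower(len(buf))
    for chunk in (b"abc", b"defgh", b"ij"):
        for byte in chunk:
            buf[end & 7] = byte
            end += 1
        print(follower.poll(bytes(buf), end))
```

When the reads start failing, as they did here, the output of the last successful poll() is the last console text the remote side will ever see; anything logged between that poll and the hang is not captured.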

Re: How to go about debuging a system lockup?

2006-11-17 Lennart Sorensen
On Fri, Nov 17, 2006 at 02:43:36PM +0100, Stefan Richter wrote:
> If the PCI bus itself isn't brought down, you could debug from remote
> using Benjamin Herrenschmidt's Firescope on the remote node and a
> FireWire card in the test machine. Once the ohci1394 driver was loaded,
> the FireWire

Re: How to go about debuging a system lockup?

2006-11-17 Stefan Richter
Lennart Sorensen wrote:
> On Thu, Nov 16, 2006 at 04:01:03PM -0600, Protasevich, Natalie wrote:
>> There are some port 80 cards that you can buy: ...
> Hmm, one of those on the PCI bus might work. Or perhaps the parallel
> port will. Of course if the problem is that somehow the PCI bus is
> locked up,
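A port 80 card displays the last byte written to I/O port 0x80, so writing a checkpoint code at each stage of the test shows how far it got before the hang. A minimal Python sketch, assuming x86 Linux, root privileges, and the /dev/port device (where a byte written at file offset 0x80 becomes an outb to that port). The stage names and codes are hypothetical, and simulated_port_device() exists only for dry runs without hardware:

```python
# Sketch: write checkpoint codes to I/O port 0x80 so a POST ("port 80")
# card shows the last stage reached before a hang.
# Assumptions: x86 Linux, root, and /dev/port, where writing one byte at
# file offset 0x80 performs an outb() to port 0x80. Stage names and codes
# are hypothetical.
import io
from typing import BinaryIO

PORT_80 = 0x80

# Hypothetical checkpoint codes for the four-port traffic test.
STAGES = {
    "start": 0x01,
    "ports_up": 0x02,
    "traffic_running": 0x03,
    "traffic_stopped": 0x04,
}


def post_checkpoint(port_dev: BinaryIO, code: int) -> None:
    """Write a single byte to I/O port 0x80 through a /dev/port-style file."""
    if not 0 <= code <= 0xFF:
        raise ValueError("a port 80 card displays one byte")
    port_dev.seek(PORT_80)
    port_dev.write(bytes([code]))
    port_dev.flush()


def simulated_port_device() -> io.BytesIO:
    """In-memory stand-in for /dev/port, for dry runs without hardware."""
    return io.BytesIO(bytes(0x100))
```

On the real board the device would be opened unbuffered, e.g. open("/dev/port", "r+b", buffering=0), and passed to post_checkpoint(). Checkpoints written from user space only locate the hang within the test script; narrowing it down inside the pcnet32 driver would need equivalent outb(code, 0x80) calls in kernel code.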

Re: How to go about debuging a system lockup?

2006-11-16 Lennart Sorensen
On Thu, Nov 16, 2006 at 04:01:03PM -0600, Protasevich, Natalie wrote:
> If you can't drop into kdb, or SysRq doesn't work, then your interrupts are
> disabled. It used to be (with older systems anyway) that an NMI button was
> on the system, so one could send an NMI and make the handler print a
> trace. Newer
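Since an unresponsive SysRq is being taken as evidence that interrupts are disabled, it is worth confirming that SysRq was actually enabled before the test started. A minimal Python sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ and /proc mounted; proc_root is a parameter only so the logic can be tried on a scratch directory instead of the real /proc:

```python
# Sketch: enable magic SysRq before the stress test and send a diagnostic
# SysRq command from software.
# Assumptions: CONFIG_MAGIC_SYSRQ=y and /proc mounted; proc_root exists
# only so the functions can be exercised on a scratch directory.
from pathlib import Path

# Read-only diagnostic keys (show registers, task list, memory info);
# anything else is rejected so a typo cannot reboot or crash the box.
DIAG_KEYS = {"p": "registers", "t": "tasks", "m": "memory"}


def enable_sysrq(proc_root: str = "/proc") -> None:
    """Enable all SysRq functions, like: echo 1 > /proc/sys/kernel/sysrq"""
    Path(proc_root, "sys", "kernel", "sysrq").write_text("1\n")


def send_sysrq(key: str, proc_root: str = "/proc") -> None:
    """Trigger one SysRq command, like: echo t > /proc/sysrq-trigger"""
    if key not in DIAG_KEYS:
        raise ValueError(f"unexpected SysRq key {key!r}")
    Path(proc_root, "sysrq-trigger").write_text(key)
```

If send_sysrq("t") dumps the task list during normal operation but SysRq from the keyboard or a serial break gets no reaction once the box hangs, that is consistent with Natalie's reading that interrupts are off.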

RE: How to go about debuging a system lockup?

2006-11-16 Protasevich, Natalie
> I don't know of a good version yet. So far I don't know if there ever
> was one. This could even be a bug in the PCI hardware, or the way the
> BIOS on this system on a board configured the PCI controller. Maybe I
> should go back and try a 2.4 kernel.
>
> > Hope some of that helps :)
> >

Re: How to go about debuging a system lockup?

2006-11-16 Jesper Juhl
On 16/11/06, Lennart Sorensen <[EMAIL PROTECTED]> wrote:
On Thu, Nov 16, 2006 at 09:49:06PM +0100, Jesper Juhl wrote:
...
> - You could also try kdb (http://oss.sgi.com/projects/kdb/) or kgdb
> (http://kgdb.linsyssoft.com/). That might help you pinpoint the
> failure.
Can I run that remotely

Re: How to go about debuging a system lockup?

2006-11-16 Lennart Sorensen
On Thu, Nov 16, 2006 at 09:49:06PM +0100, Jesper Juhl wrote:
> Well, I have a few ideas that are hopefully useful.
>
> - If you have not done so already, then go into the "Kernel Hacking"
> section of the kernel configuration and enable some (all?) of the
> debug options and see if that produces
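Checking which "Kernel Hacking" options a given build actually has enabled can be scripted against its .config file. A minimal Python sketch; the option list is an assumption (a few 2.6-era options relevant to hangs and locking bugs), so the names should be checked against the Kconfig of the kernel actually being built:

```python
# Sketch: report which suggested debug options are not enabled in a
# kernel .config. The SUGGESTED list is an assumption, not taken from
# the thread; verify each name against the kernel's own Kconfig.
import re
from typing import List, Sequence

SUGGESTED = [
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_MAGIC_SYSRQ",
    "CONFIG_DETECT_SOFTLOCKUP",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_SPINLOCK_SLEEP",
    "CONFIG_DEBUG_SLAB",
]

# Matches enabled options such as "CONFIG_FOO=y" or "CONFIG_FOO=m";
# disabled ones appear as "# CONFIG_FOO is not set" and do not match.
_ENABLED = re.compile(r"^(CONFIG_\w+)=(y|m)$")


def missing_debug_options(config_text: str,
                          wanted: Sequence[str] = SUGGESTED) -> List[str]:
    """Return the options from wanted that are not set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return [opt for opt in wanted if opt not in enabled]
```

For example, missing_debug_options(open(".config").read()) lists the suggested options that are still unset in the current build tree.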

Re: How to go about debuging a system lockup?

2006-11-16 Jesper Juhl
On 16/11/06, Lennart Sorensen <[EMAIL PROTECTED]> wrote:
We have a router with a Geode SC1200 CPU, with 4 AMD 972 Ethernet ports (pcnet32) behind a PLX 6152 PCI-PCI bridge, which quite regularly locks up completely if we try to do simultaneous traffic on all 4 ports (our test case sends data from

How to go about debuging a system lockup?

2006-11-16 Lennart Sorensen
We have a router with a Geode SC1200 CPU, with 4 AMD 972 Ethernet ports (pcnet32) behind a PLX 6152 PCI-PCI bridge, which quite regularly locks up completely if we try to do simultaneous traffic on all 4 ports (our test case sends data from port 1 to port 2 and back, and from port 3 to port 4 and
