Re: Possibly SATA related freeze killed networking and RAID

2007-12-10 Thread Thiemo Nagel
Hello, I think I'm experiencing the same problem: 09:16:34 : NETDEV WATCHDOG: eth0: transmit timed out 09:16:34 : eth0: Got tx_timeout. irq: 09:16:34 : eth0: Ring at 37e5 09:16:34 : eth0: Dumping tx registers 09:16:34 : 0: 00ff 0003 025003ca

Re: Possibly SATA related freeze killed networking and RAID

2007-12-10 Thread noah
2007/11/21, noah <[EMAIL PROTECTED]>: > 2007/11/21, Alan Cox <[EMAIL PROTECTED]>: > > > I've had other freezes before but this was the first time I was able > > > to see what was actually going on. > > > IRQ 21 appears to be shared between sata_nv and ethernet. > > > > > > Does this mean my

Re: Possibly SATA related freeze killed networking and RAID

2007-12-10 Thread noah
2007/11/21, noah <[EMAIL PROTECTED]>: > 2007/11/21, Alan Cox <[EMAIL PROTECTED]>: > > > I've had other freezes before but this was the first time I was able > > > to see what was actually going on. > > > IRQ 21 appears to be shared between sata_nv and ethernet. > > > > > > Does this mean my hardware/BIOS is broken

Re: Possibly SATA related freeze killed networking and RAID

2007-12-03 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Surprise, surprise. There's no way to tell whether the controller >> raised interrupt or not if command is not in progress. As I said >> before, there's no IRQ pending bit. While processing commands, you can >> tell by looking at other status registers

Re: Possibly SATA related freeze killed networking and RAID

2007-12-03 Thread Phillip Susi
Tejun Heo wrote: > Surprise, surprise. There's no way to tell whether the controller > raised interrupt or not if command is not in progress. As I said > before, there's no IRQ pending bit. While processing commands, you can > tell by looking at other status registers but when there's nothing in

Re: Possibly SATA related freeze killed networking and RAID

2007-12-03 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Surprise, surprise. There's no way to tell whether the controller >> raised interrupt or not if command is not in progress. As I said >> before, there's no IRQ pending bit. While processing commands, you can >> tell by looking at other status registers but when

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Because SFF ATA controller don't have IRQ pending bit. You don't know >> whether IRQ is raised or not. Plus, accessing the status register which >> clears pending IRQ can be very slow on PATA machines. It has to go >> through the PCI and ATA bus and

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Pavel Machek
On Fri 2007-11-30 10:00:55, Mark Lord wrote: > Pavel Machek wrote: >> On Fri 2007-11-30 13:13:44, Alan Cox wrote: >>>> Why does a single spurious interrupt cause it to be shut down? I can >>> It doesn't. >>>> see if the interrupt is stuck on and keeps interrupting constantly, but >>>> if

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Phillip Susi
Tejun Heo wrote: > Because SFF ATA controller don't have IRQ pending bit. You don't know > whether IRQ is raised or not. Plus, accessing the status register which > clears pending IRQ can be very slow on PATA machines. It has to go > through the PCI and ATA bus and come back. So, unconditionally

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Mark Lord
Pavel Machek wrote: > On Fri 2007-11-30 13:13:44, Alan Cox wrote: >>> Why does a single spurious interrupt cause it to be shut down? I can >> It doesn't. >>> see if the interrupt is stuck on and keeps interrupting constantly, but >>> if it's just the occasional spurious interrupt, why not just ignore it

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Pavel Machek
On Fri 2007-11-30 13:13:44, Alan Cox wrote: > > Why does a single spurious interrupt cause it to be shut down? I can > > It doesn't. > > > see if the interrupt is stuck on and keeps interrupting constantly, but > > if it's just the occasional spurious interrupt, why not just ignore it > >

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Alan Cox
> Why does a single spurious interrupt cause it to be shut down? I can It doesn't. > see if the interrupt is stuck on and keeps interrupting constantly, but > if it's just the occasional spurious interrupt, why not just ignore it > and move on? The interrupt is usually level triggered so it
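
The point that a single spurious interrupt does not get the line shut down can be illustrated with a small simulation. This is only a sketch: the `IrqLine` class and its names are hypothetical, and the window of 100,000 interrupts with a limit of 99,900 unhandled ones is an assumption modelled on how the kernel's spurious-interrupt accounting is commonly described, not something stated in this thread.

```python
class IrqLine:
    """Toy model of spurious-interrupt accounting for one IRQ line.

    The window and limit are assumed figures, not taken from this thread.
    """

    def __init__(self, window=100_000, unhandled_limit=99_900):
        self.window = window
        self.unhandled_limit = unhandled_limit
        self.count = 0          # interrupts seen in the current window
        self.unhandled = 0      # of those, how many no handler claimed
        self.disabled = False   # True once "nobody cared" would be reported

    def note_interrupt(self, handled):
        """Record one interrupt; `handled` says whether any driver claimed it."""
        if self.disabled:
            return
        self.count += 1
        if not handled:
            self.unhandled += 1
        if self.count < self.window:
            return
        # End of window: disable only if almost every interrupt went unclaimed.
        if self.unhandled > self.unhandled_limit:
            self.disabled = True
        self.count = 0
        self.unhandled = 0


# Occasional spurious interrupts (1 in 1000) never trip the limit.
occasional = IrqLine()
for i in range(200_000):
    occasional.note_interrupt(handled=(i % 1000 != 0))

# A stuck, level-triggered line that nobody acknowledges does.
stuck = IrqLine()
for _ in range(100_000):
    stuck.note_interrupt(handled=False)
```

Under these assumptions `occasional.disabled` stays false while `stuck.disabled` becomes true, which matches the distinction drawn above between an occasional stray interrupt and a line that keeps interrupting constantly.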

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Because SFF ATA controller don't have IRQ pending bit. You don't know >> whether IRQ is raised or not. Plus, accessing the status register which >> clears pending IRQ can be very slow on PATA machines. It has to go >> through the PCI and ATA bus and come back.

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Pavel Machek
On Fri 2007-11-30 10:00:55, Mark Lord wrote: > Pavel Machek wrote: >> On Fri 2007-11-30 13:13:44, Alan Cox wrote: >>>> Why does a single spurious interrupt cause it to be shut down? I can >>> It doesn't. >>>> see if the interrupt is stuck on and keeps interrupting constantly, but >>>> if it's just the

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Pavel Machek
On Fri 2007-11-30 13:13:44, Alan Cox wrote: > > Why does a single spurious interrupt cause it to be shut down? I can > > It doesn't. > > > see if the interrupt is stuck on and keeps interrupting constantly, but > > if it's just the occasional spurious interrupt, why not just ignore it > > and move on?

Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Alan Cox
> Why does a single spurious interrupt cause it to be shut down? I can It doesn't. > see if the interrupt is stuck on and keeps interrupting constantly, but > if it's just the occasional spurious interrupt, why not just ignore it > and move on? The interrupt is usually level triggered so it

Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Robert Hancock
Phillip Susi wrote: > Tejun Heo wrote: >> Agreed. Nobody cared on ATA controllers is usually very effective at >> taking the whole machine down. Is there any reason why we don't turn on >> irqpoll on turned off IRQs automatically? > > Why does a single spurious interrupt cause it to be shut down? I can

Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Agreed. Nobody cared on ATA controllers is usually very effective at >> taking the whole machine down. Is there any reason why we don't turn on >> irqpoll on turned off IRQs automatically? > > Why does a single spurious interrupt cause it to be shut

Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Phillip Susi
Tejun Heo wrote: > Agreed. Nobody cared on ATA controllers is usually very effective at > taking the whole machine down. Is there any reason why we don't turn on > irqpoll on turned off IRQs automatically? Why does a single spurious interrupt cause it to be shut down? I can see if the interrupt

Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Tejun Heo
Phillip Susi wrote: > Tejun Heo wrote: >> Agreed. Nobody cared on ATA controllers is usually very effective at >> taking the whole machine down. Is there any reason why we don't turn on >> irqpoll on turned off IRQs automatically? > > Why does a single spurious interrupt cause it to be shut down? I can

Re: Possibly SATA related freeze killed networking and RAID

2007-11-27 Thread Tejun Heo
Pavel Machek wrote: > Hi! > >>> kernel: [734344.717844] irq 21: nobody cared (try booting with the >>> "irqpoll" option) >>> kernel: [734344.717866] >> Your machine decided to emit interrupt 21 without an apparent reason. >> Whatever caused that made the kernel shut down IRQ 21 at which point

Re: Possibly SATA related freeze killed networking and RAID

2007-11-27 Thread Tejun Heo
Pavel Machek wrote: > Hi! > >>> kernel: [734344.717844] irq 21: nobody cared (try booting with the >>> "irqpoll" option) >>> kernel: [734344.717866] >> Your machine decided to emit interrupt 21 without an apparent reason. >> Whatever caused that made the kernel shut down IRQ 21 at which point the >> disk drives

Re: Possibly SATA related freeze killed networking and RAID

2007-11-26 Thread Pavel Machek
Hi! > > kernel: [734344.717844] irq 21: nobody cared (try booting with the > > "irqpoll" option) > > kernel: [734344.717866] > > Your machine decided to emit interrupt 21 without an apparent reason. > Whatever caused that made the kernel shut down IRQ 21 at which point the > disk drives on

Re: Possibly SATA related freeze killed networking and RAID

2007-11-26 Thread Pavel Machek
Hi! > > kernel: [734344.717844] irq 21: nobody cared (try booting with the > > "irqpoll" option) > > kernel: [734344.717866] > > Your machine decided to emit interrupt 21 without an apparent reason. > Whatever caused that made the kernel shut down IRQ 21 at which point the > disk drives on that IRQ were

Re: Possibly SATA related freeze killed networking and RAID

2007-11-21 Thread noah
2007/11/21, Alan Cox <[EMAIL PROTECTED]>: > > I've had other freezes before but this was the first time I was able > > to see what was actually going on. > > IRQ 21 appears to be shared between sata_nv and ethernet. > > > > Does this mean my hardware/BIOS is broken somehow? > > Not necessarily.

Re: Possibly SATA related freeze killed networking and RAID

2007-11-21 Thread noah
2007/11/21, Alan Cox <[EMAIL PROTECTED]>: > > I've had other freezes before but this was the first time I was able > > to see what was actually going on. > > IRQ 21 appears to be shared between sata_nv and ethernet. > > > > Does this mean my hardware/BIOS is broken somehow? > > Not necessarily. It could be a bug

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread Alan Cox
> I've had other freezes before but this was the first time I was able > to see what was actually going on. > IRQ 21 appears to be shared between sata_nv and ethernet. > > Does this mean my hardware/BIOS is broken somehow? Not necessarily. It could be a bug in one of the drivers using IRQ 21
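
For anyone wanting to check which drivers share an interrupt line on their own box, a rough sketch follows. It assumes the usual `/proc/interrupts` layout, where handlers registered on the same line appear as a comma-separated list at the end of the row, and that handler names contain no spaces. The `shared_irqs` helper and the `SAMPLE` text are hypothetical and for illustration only; the sample is not output from the machine discussed here.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [handler names]} for rows listing 2+ handlers."""
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # CPU header row, or named rows such as NMI/LOC
        parts = [p.strip() for p in rest.split(",") if p.strip()]
        if len(parts) < 2:
            continue  # a single handler: the line is not shared
        head = parts[0].split()
        if not head:
            continue
        # The first handler is the last whitespace-separated token before
        # the first comma; everything earlier is counts and controller type.
        shared[int(label)] = [head[-1]] + parts[1:]
    return shared


# Hypothetical sample in /proc/interrupts style (illustrative values only).
SAMPLE = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
 21:     734344     102938   IO-APIC-fasteoi   sata_nv, eth0
 22:       5120          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0
"""

result = shared_irqs(SAMPLE)
```

On a live system the same function could be fed the contents of `/proc/interrupts`, for example `shared_irqs(open("/proc/interrupts").read())`.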

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread noah
2007/11/20, Alan Cox <[EMAIL PROTECTED]>: > > kernel: [734344.717844] irq 21: nobody cared (try booting with the > > "irqpoll" option) > > kernel: [734344.717866] > > Your machine decided to emit interrupt 21 without an apparent reason. > Whatever caused that made the kernel shut down IRQ 21 at

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread Alan Cox
> kernel: [734344.717844] irq 21: nobody cared (try booting with the > "irqpoll" option) > kernel: [734344.717866] Your machine decided to emit interrupt 21 without an apparent reason. Whatever caused that made the kernel shut down IRQ 21 at which point the disk drives on that IRQ were no

Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread noah
I just had a strange freeze that killed networking and made software RAID fail two of my hard disks. There are a bunch of messages from the kernel which I extracted from the system log after reboot at the end of this mail. I hit power off in pure paranoia after the box froze, and then started to

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread Alan Cox
> kernel: [734344.717844] irq 21: nobody cared (try booting with the > "irqpoll" option) > kernel: [734344.717866] Your machine decided to emit interrupt 21 without an apparent reason. Whatever caused that made the kernel shut down IRQ 21 at which point the disk drives on that IRQ were no longer

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread noah
2007/11/20, Alan Cox <[EMAIL PROTECTED]>: > > kernel: [734344.717844] irq 21: nobody cared (try booting with the > > "irqpoll" option) > > kernel: [734344.717866] > > Your machine decided to emit interrupt 21 without an apparent reason. > Whatever caused that made the kernel shut down IRQ 21 at which point

Re: Possibly SATA related freeze killed networking and RAID

2007-11-20 Thread Alan Cox
> I've had other freezes before but this was the first time I was able > to see what was actually going on. > IRQ 21 appears to be shared between sata_nv and ethernet. > > Does this mean my hardware/BIOS is broken somehow? Not necessarily. It could be a bug in one of the drivers using IRQ 21 (sata_nv