Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-09 Thread Martin Wilck
Vivek Goyal wrote: > Did you also check IRR bits on LAPIC. May be some interrupt is already > being served and your new interrupts has been queued on LAPIC and IRR bit > on LAPIC is set? That's it. Whenever the IO-APIC IRR bit is set, I see the LAPIC IRR bit set, too. I never see any ISR bits set.

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-09 Thread Vivek Goyal
On Wed, Aug 08, 2007 at 08:15:32PM +0200, Martin Wilck wrote: > Vivek Goyal wrote: > > > But the issue here seems to be that LAPIC state got clear but IRR bit > > at IOAPIC bit is not cleared because IOAPIC vector information was deleted > > in first kernel and now upon receiving EOI, it does not

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Eric W. Biederman
Martin Wilck <[EMAIL PROTECTED]> writes: > Vivek Goyal wrote: > >> Got this oops while testing your patch when I did >> "echo c > /proc/sysrq-trigger" > > That's bad :-( > > ... >> Unable to handle kernel NULL pointer dereference at RIP: > [<>] >> <IRQ> [8025afd0]

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Eric W. Biederman wrote: >>> irqpoll is working now. >> Yes. I'd give a lot to know what went wrong when I tried that in April. >> It'd have saved me many hours of work if I had discovered this workaround >> before. > > Yes that is odd. Got it. At that time I was using "noirqdebug" because we had a FW

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Eric W. Biederman wrote: >> I think a lot would be gained if disable_IO_APIC() would just mask the IRQs >> (like the function in my patch does), and perhaps fix the dest ID, instead of >> totally clearing the registers. > > Even masked we still won't see the EOI, because then we are in i8259 > mode.

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Vivek Goyal wrote: > But the issue here seems to be that LAPIC state got clear but IRR bit > at IOAPIC bit is not cleared because IOAPIC vector information was deleted > in first kernel and now upon receiving EOI, it does not know this EOI belongs > to which vector. I am making another experiment

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Vivek Goyal wrote: > Got this oops while testing your patch when I did > "echo c > /proc/sysrq-trigger" That's bad :-( ... > Unable to handle kernel NULL pointer dereference at RIP: [<>] > <IRQ> [8025afd0] handle_edge_irq+0x5c/0x127 > [8020e1ae] do_IRQ+0xf1/0x15f > []

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Eric W. Biederman
Martin Wilck <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: > >> Ok. Later in the thread it sounds like you have retried this and >> irqpoll is working now. > > Yes. I'd give a lot to know what went wrong when I tried that in April. > It'd have saved me many hours of work if I had

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Eric W. Biederman wrote: > Ok. Later in the thread it sounds like you have retried this and > irqpoll is working now. Yes. I'd give a lot to know what went wrong when I tried that in April. It'd have saved me many hours of work if I had discovered this workaround before. >>> Have you done any

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Eric W. Biederman
Martin Wilck <[EMAIL PROTECTED]> writes: > Hello Eric, > >> How bad is it if you just run with irqpoll in the kdump kernel? >> If running with irqpoll is usable that is probably preferable >> to putting in a hardware work around we can survive without. > > Yes, I tried that. No effect. Ok.

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Vivek Goyal
On Wed, Aug 08, 2007 at 10:06:13AM -0400, Chip Coldwell wrote: > On Wed, 8 Aug 2007, Vivek Goyal wrote: > > > On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote: > > > > > > Can you explain how, on the front side bus, the IO-APIC knows whether > > > a CPU has accepted the INT message?

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Chip Coldwell
On Wed, 8 Aug 2007, Vivek Goyal wrote: > On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote: > > > > Can you explain how, on the front side bus, the IO-APIC knows whether > > a CPU has accepted the INT message? There is no response > > to the INT message on the bus, except for the EOI

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Hi Vivek, >>> How bad is it if you just run with irqpoll in the kdump kernel? >>> If running with irqpoll is usable that is probably preferable >>> to putting in a hardware work around we can survive without. >> Yes, I tried that. No effect. >> > > Martin, at least irqpoll should have worked. I am assuming your

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Vivek Goyal
On Mon, Aug 06, 2007 at 05:08:05PM +0200, Martin Wilck wrote: > PATCH/RFC: [kdump] fix APIC shutdown sequence > > This patch fixes a problem that we have encountered > with kdump under high I/O load on some machines. > The machines showing the errors have an Intel ICH7 > chip set with a 6702PXH PCI

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Vivek Goyal
On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote: [..] > > Such a situation has never been observed in the "good" case. > So, we do have some evidence, not just bare speculation. > > >> 2. The crashing CPU itself disables its local APIC > >>before the IO-APIC, leaving a short time window

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Vivek Goyal
On Wed, Aug 08, 2007 at 11:03:17AM +0200, Martin Wilck wrote: > Hello Eric, > > > How bad is it if you just run with irqpoll in the kdump kernel? > > If running with irqpoll is usable that is probably preferable > > to putting in a hardware work around we can survive without. > > Yes, I tried

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Hello Eric, > How bad is it if you just run with irqpoll in the kdump kernel? > If running with irqpoll is usable that is probably preferable > to putting in a hardware work around we can survive without. Yes, I tried that. No effect. > Have you done any looking at moving where the kernel

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Martin Wilck
Hello Andrew, thanks a lot for looking at this patch. > Please feed the diff through scripts/checkpatch.pl. It finds a lot of issues. I should have read the latest version of SubmittingPatches :-( I didn't expect you to pick up this patch so quickly, if I did I'd have cleaned it more

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Eric W. Biederman
A couple of questions. How bad is it if you just run with irqpoll in the kdump kernel? If running with irqpoll is usable that is probably preferable to putting in a hardware work around we can survive without. Have you done any looking at moving where the kernel initializes io_apics? One of

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Andrew Morton
On Mon, 06 Aug 2007 17:08:05 +0200 Martin Wilck <[EMAIL PROTECTED]> wrote: > PATCH/RFC: [kdump] fix APIC shutdown sequence > > This patch fixes a problem that we have encountered > with kdump under high I/O load on some machines. > The machines showing the errors have an

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Chip Coldwell
On Tue, 7 Aug 2007, Vivek Goyal wrote: > On Mon, Aug 06, 2007 at 05:08:05PM +0200, Martin Wilck wrote: > > > 1. If, under SMP, the IO-APIC logical destination field is > >set by the IRQ balancing code to one of the "other" > >CPUs (i.e. not the crashing_cpu), and an IRQ arrives > >on

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Martin Wilck
Hello Vivek, thank you very much for looking at this problem, and for your comments. >> The error is caused by IRQs arriving while the APIC >> subsystem is deactivated in machine_crash_shutdown(). >> >> Apparently, the IO-APIC gets stuck if it sends an IRQ >> message to a Local APIC and never

Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Vivek Goyal
On Mon, Aug 06, 2007 at 05:08:05PM +0200, Martin Wilck wrote: > PATCH/RFC: [kdump] fix APIC shutdown sequence > > This patch fixes a problem that we have encountered > with kdump under high I/O load on some machines. > The machines showing the errors have an Intel ICH7 > chip set with a 6702PXH PCI

PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-06 Thread Martin Wilck
PATCH/RFC: [kdump] fix APIC shutdown sequence This patch fixes a problem that we have encountered with kdump under high I/O load on some machines. The machines showing the errors have an Intel ICH7 chip set with a 6702PXH PCI Express-to-PCI Bridge (8086:032c) containing an IO-APIC. The bug
