Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jarek Poplawski
On Fri, Aug 10, 2007 at 12:43:43PM +0200, Marcin Ślusarz wrote: > 2007/8/10, Jarek Poplawski <[EMAIL PROTECTED]>: > > (..) > > I think, there is this one possible for your testing yet?: > > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend > > Date: Wed, 8 Aug 2007 13:00:37 +020

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Marcin Ślusarz
2007/8/10, Jarek Poplawski <[EMAIL PROTECTED]>: > (..) > I think, there is this one possible for your testing yet?: > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend > Date: Wed, 8 Aug 2007 13:00:37 +0200 I think I already tested this patch, but this thread is sooo big and I c

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Ingo Molnar
* Jarek Poplawski <[EMAIL PROTECTED]> wrote: > All correct! There was also checked a possibility it can be not hw > itself, but wrong way of handling after hw (acking too late). This was > false idea (or bad implementation), so it looks like hw vs lapic > problem. i think the problem is that

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jarek Poplawski
On Fri, Aug 10, 2007 at 11:08:33AM +0200, Ingo Molnar wrote: > > * Jarek Poplawski <[EMAIL PROTECTED]> wrote: > > > On 10-08-2007 10:05, Thomas Gleixner wrote: > > ... > > > But suppressing the resend is not fixing the driver problem. The > > > problem can show up with spurious interrupts and wi

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Ingo Molnar
* Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On 10-08-2007 10:05, Thomas Gleixner wrote: > ... > > But suppressing the resend is not fixing the driver problem. The > > problem can show up with spurious interrupts and with interrupts on > > a shared PCI interrupt line at any time. It just migh

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jarek Poplawski
On Fri, Aug 10, 2007 at 10:48:41AM +0200, Ingo Molnar wrote: > > * Jarek Poplawski <[EMAIL PROTECTED]> wrote: > > > On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote: > > ... > > > I was still testing on -rc2: > > > Subject: [patch] genirq: temporary fix for level-triggered IR

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Ingo Molnar
* Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote: > ... > > I was still testing on -rc2: > > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend > > Date: Wed, 8 Aug 2007 13:00:37 +0200 > > > > For me after 1da

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jean-Baptiste Vignaud
> For me it's enough too but Thomas seems to doubt. > > You've written earlier that you've 2.6.23-rc1 with HARDIRQS_SW_RESEND > prepared too. So, if this is not a great problem maybe you could try > this first. Tomorrow Thomas may send something, so this 100HZ could > wait yet, I hope? Ok, i'll t

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jarek Poplawski
On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote: ... > I was still testing on -rc2: > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend > Date: Wed, 8 Aug 2007 13:00:37 +0200 > > For me after 1day 20hours, the network is still up, with more than 1To > of n

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jean-Baptiste Vignaud
> So, we still have to wait for the exact explanation... > > Thanks very much Marcin! > > I think, there is this one possible for your testing yet?: > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend > Date: Wed, 8 Aug 2007 13:00:37 +0200 > > If it's not a great problem it w

Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-10 Thread Jarek Poplawski
On Fri, Aug 10, 2007 at 08:33:27AM +0200, Marcin Ślusarz wrote: > 2007/8/9, Jarek Poplawski <[EMAIL PROTECTED]>: ... > > diff -Nurp 2.6.23-rc1-/kernel/irq/chip.c 2.6.23-rc1/kernel/irq/chip.c > > --- 2.6.23-rc1-/kernel/irq/chip.c 2007-07-09 01:32:17.0 +0200 > > +++ 2.6.23-rc1/kernel/ir

Re: [RFC] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-09 Thread Jarek Poplawski
On Thu, Aug 09, 2007 at 06:04:34PM +0200, Andi Kleen wrote: > Jarek Poplawski <[EMAIL PROTECTED]> writes: > > > It seems, we can start to think about some preferred solutions, > > already. Here are some of my preliminary conclusions and suggestions. > > > > The problem of timeouts with some 'olde

[RFC] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-09 Thread Jarek Poplawski
It seems, we can start to think about some preferred solutions, already. Here are some of my preliminary conclusions and suggestions. The problem of timeouts with some 'older' network cards seems to hit mainly x86_64 arch, and after diagnosing and testing (still being done) it's caused by resend

[patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-09 Thread Jarek Poplawski
On Wed, Aug 08, 2007 at 01:42:43PM +0200, Jarek Poplawski wrote: > Read below please: > > On Wed, Aug 08, 2007 at 01:09:36PM +0200, Marcin Ślusarz wrote: > > 2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > > > So, the let's try this idea yet: modified Ingo's "x86: activate > > > HARDIRQS_SW_RESEN

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
On Wed, Aug 08, 2007 at 10:59:22AM +0200, Jean-Baptiste Vignaud wrote: ... > > If you would like to read something more about testing (then of > > course my suggestions could occur invalid - I'm a very bad tester > > myself...) you can try this: > > http://www.stardust.webpages.pl/files/handbook/ >

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
On Wed, Aug 08, 2007 at 01:42:43PM +0200, Jarek Poplawski wrote: ... > So, it looks like x86_64 io_apic's IPI code was unused too long... To be fair it's x86_64 lapic's IPI code. Jarek P. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
Read below please: On Wed, Aug 08, 2007 at 01:09:36PM +0200, Marcin Ślusarz wrote: > 2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > > So, the let's try this idea yet: modified Ingo's "x86: activate > > HARDIRQS_SW_RESEND" patch. > > (Don't forget about make oldconfig before make.) > > For testin

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Marcin Ślusarz
2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > And here is one more patch to test the same idea (chip->retrigger()). > Let's try i386 way! (I hope I will not be arrested for this...) > (Should be tested without any previous patches.) > > Jarek P. > > PS: as above > > --- > > diff -Nurp 2.6.22.1-/

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Marcin Ślusarz
2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > So, the let's try this idea yet: modified Ingo's "x86: activate > HARDIRQS_SW_RESEND" patch. > (Don't forget about make oldconfig before make.) > For testing only. > > Cheers, > Jarek P. > > PS: alas there was not even time for "compile checking"...

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
On Wed, Aug 08, 2007 at 10:59:22AM +0200, Jean-Baptiste Vignaud wrote: > > Jean-Baptiste: I'm not sure how much of this testing you can afford? > > If you can spare some time for this and your box isn't for > > 'production' it could be very precious to diagnose such reproducible > > bug. > > Well

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jean-Baptiste Vignaud
> Jean-Baptiste: I'm not sure how much of this testing you can afford? > If you can spare some time for this and your box isn't for > 'production' it could be very precious to diagnose such reproducible > bug. Well i can continue testing patches for sure. > Then, I'd have a few suggestions (you c

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
On Wed, Aug 08, 2007 at 09:21:14AM +0200, Jarek Poplawski wrote: > On Tue, Aug 07, 2007 at 07:16:33PM +0200, Jean-Baptiste Vignaud wrote: ... > Marcin has done this with successfully using the most professional > way: git bisect (which btw. I did learn yet), but, IMHO, it could be ... Let me say th

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-08 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 07:16:33PM +0200, Jean-Baptiste Vignaud wrote: ... > So this afternoon i compiled 2.6.23-rc2 with same options as 2.6.23-rc1 > and edited grub.conf to add nosmp but after reboot the box did not > respond. Back home, i saw that the kernel failed because it was unable > to f

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jean-Baptiste Vignaud
> On Tue, Aug 07, 2007 at 11:21:07AM +0200, Jean-Baptiste Vignaud wrote: > > > > > > * interrupts (i use irqbalance, but problem was the same without) > > > > > > I wonder if you tried without SMP too? > > > > No i did not. Do you think that this can be a problem ? > > To test with no SMP, do i n

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 02:13:39PM +0200, Jarek Poplawski wrote: > On Tue, Aug 07, 2007 at 11:52:46AM +0200, Jarek Poplawski wrote: > > On Tue, Aug 07, 2007 at 11:37:01AM +0200, Marcin Ślusarz wrote: ... > > > No, i don't need a break. I'll have more time in next weeks. > > > > Great! So, I'll try

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 11:52:46AM +0200, Jarek Poplawski wrote: > On Tue, Aug 07, 2007 at 11:37:01AM +0200, Marcin Ślusarz wrote: > > 2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > > > On Tue, Aug 07, 2007 at 09:46:36AM +0200, Marcin Ślusarz wrote: > > > > Network card still locks up (tested on

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Mon, Aug 06, 2007 at 01:43:48PM -0400, Chuck Ebbert wrote: > On 08/06/2007 03:03 AM, Ingo Molnar wrote: > > > > But, since level types don't need this retriggers too much I think > > this "don't mask interrupts by default" idea should be rethinked: > > is there enough gain to risk such hard to

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 11:37:01AM +0200, Marcin Ślusarz wrote: > 2007/8/7, Jarek Poplawski <[EMAIL PROTECTED]>: > > On Tue, Aug 07, 2007 at 09:46:36AM +0200, Marcin Ślusarz wrote: > > > Network card still locks up (tested on 2.6.22.1). I had to upload more > > > data than usual (~350 MB vs ~1-100

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 11:21:07AM +0200, Jean-Baptiste Vignaud wrote: > > > > * interrupts (i use irqbalance, but problem was the same without) > > > > I wonder if you tried without SMP too? > > No i did not. Do you think that this can be a problem ? > To test with no SMP, do i need to recompile

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jean-Baptiste Vignaud
> > * interrupts (i use irqbalance, but problem was the same without) > > I wonder if you tried without SMP too? No i did not. Do you think that this can be a problem ? To test with no SMP, do i need to recompile kernel or is there a kernel parameter ? > BTW, Jean-Baptiste and Chuck - it

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 10:10:34AM +0200, Jean-Baptiste Vignaud wrote: > > > BTW: Jean-Babtiste, could you send or point to you current configs? Oops! I'm very sorry for misspelling! > > I mean at least proc/interrupts, but with dmesg and .config it would > > be even better. (I assume this last

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Tue, Aug 07, 2007 at 09:46:36AM +0200, Marcin Ślusarz wrote: > 2007/8/6, Ingo Molnar <[EMAIL PROTECTED]>: > > (..) > > please try Jarek's second patch too - there was a missing unmask. > > > > Ingo > > > > --> > > Subject: genirq: fix simple and fasteoi irq handlers > > From:

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jean-Baptiste Vignaud
> BTW: Jean-Babtiste, could you send or point to you current configs? > I mean at least proc/interrupts, but with dmesg and .config it would > be even better. (I assume this last report was about the revert patch > mentioned by Chuck, not the one below your message?) Sure. Last reports are with

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Marcin Ślusarz
2007/8/6, Ingo Molnar <[EMAIL PROTECTED]>: > (..) > please try Jarek's second patch too - there was a missing unmask. > > Ingo > > --> > Subject: genirq: fix simple and fasteoi irq handlers > From: Jarek Poplawski <[EMAIL PROTECTED]> > > After the "genirq: do not mask interrupts

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-07 Thread Jarek Poplawski
On Mon, Aug 06, 2007 at 05:19:03PM -0400, Chuck Ebbert wrote: > On 08/06/2007 04:42 PM, Jean-Baptiste Vignaud wrote: > > Mmm, bad news, after 4 hours of intensive network stressing, one of the 2 > > 3com card failed with the latest fedora kernel. > > > > Aug 6 22:31:09 loki kernel: NETDEV WATCHD

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Al Boldi
Jean-Baptiste Vignaud wrote: > Mmm, bad news, after 4 hours of intensive network stressing, one of the 2 > 3com card failed with the latest fedora kernel. > > Aug 6 22:31:09 loki kernel: NETDEV WATCHDOG: eth2: transmit timed out > Aug 6 22:31:09 loki kernel: eth2: transmit timed out, tx_status 00

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Chuck Ebbert
On 08/06/2007 04:42 PM, Jean-Baptiste Vignaud wrote: > Mmm, bad news, after 4 hours of intensive network stressing, one of the 2 > 3com card failed with the latest fedora kernel. > > Aug 6 22:31:09 loki kernel: NETDEV WATCHDOG: eth2: transmit timed out > Aug 6 22:31:09 loki kernel: eth2: transm

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Jean-Baptiste Vignaud
Mmm, bad news, after 4 hours of intensive network stressing, one of the 2 3com card failed with the latest fedora kernel. Aug 6 22:31:09 loki kernel: NETDEV WATCHDOG: eth2: transmit timed out Aug 6 22:31:09 loki kernel: eth2: transmit timed out, tx_status 00 status e601. Aug 6 22:31:09 loki ke

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Jean-Baptiste Vignaud
> * Chuck Ebbert <[EMAIL PROTECTED]> wrote: > > > Before, they would print: > > > > eth0: transmit timed out, tx_status 00 status e601. > > diagnostics: net 0ccc media 8880 dma 003a fifo > > eth0: Interrupt posted but not delivered -- IRQ blocked by another device? > > Flags; bus-mas

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Ingo Molnar
* Chuck Ebbert <[EMAIL PROTECTED]> wrote: > Before, they would print: > > eth0: transmit timed out, tx_status 00 status e601. > diagnostics: net 0ccc media 8880 dma 003a fifo > eth0: Interrupt posted but not delivered -- IRQ blocked by another device? > Flags; bus-master 1, dirty 29

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Chuck Ebbert
On 08/06/2007 03:03 AM, Ingo Molnar wrote: > > But, since level types don't need this retriggers too much I think > this "don't mask interrupts by default" idea should be rethinked: > is there enough gain to risk such hard to diagnose errors? > > I reverted those masking changes in Fedora and

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Ingo Molnar
* Marcin Ślusarz <[EMAIL PROTECTED]> wrote: > 2007/7/31, Jarek Poplawski <[EMAIL PROTECTED]>: > > Marcin, > > > > I see you're quite busy, but if after testing this next Ingo's patch > > you are alive yet, maybe you could try one more "idea"? No patch this > > time, but if you could try this afte

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Marcin Ślusarz
2007/7/31, Jarek Poplawski <[EMAIL PROTECTED]>: > Marcin, > > I see you're quite busy, but if after testing this next Ingo's patch > you are alive yet, maybe you could try one more "idea"? No patch this > time, but if you could try this after adding boot option "noirqdebug" > (I'd like to be sure i

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-06 Thread Marcin Ślusarz
2007/8/1, Ingo Molnar <[EMAIL PROTECTED]>: > ok, it wasnt supposed to be _that_ easy i guess :-) Can you please > (re-)confirm that the workaround below indeed fixes the hung card > problem? (after producing a single WARN_ON message into the syslog) yes, with this patch everything works fine end o

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-01 Thread Ingo Molnar
* Marcin Ślusarz <[EMAIL PROTECTED]> wrote: > > ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR); > > + /* force POST: */ > > + ei_inb_p(e8390_base + EN0_IMR); > > > > spin_unlock(&ei_local->page_lock); > > enable_irq_lockdep_irqrestore(dev->irq, &flags); > > > > Ba
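The "force POST" lines in the hunk quoted above read the interrupt mask register back right after writing it. A minimal sketch of the theory under test, written as a user-space toy model and not as driver code: a write may sit in a buffer between the CPU and the device until a read from the same device pushes it through. The names posted_reg, model_write and model_read are hypothetical, and the model assumes a single buffered write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one device register behind a write buffer. */
struct posted_reg {
    uint8_t device_value;  /* what the device currently sees */
    uint8_t queued_value;  /* write still sitting in the buffer */
    bool    write_queued;  /* true while the write has not landed */
};

/* A posted write returns at once; the device has not seen it yet. */
static void model_write(struct posted_reg *r, uint8_t v)
{
    r->queued_value = v;
    r->write_queued = true;
}

/* A read must return device state, so any queued write lands first. */
static uint8_t model_read(struct posted_reg *r)
{
    if (r->write_queued) {
        r->device_value = r->queued_value;
        r->write_queued = false;
    }
    return r->device_value;
}
```

In this model, code that re-enables the interrupt line straight after model_write() can run while the device still holds the old mask; calling model_read() first closes that window, which is the effect the test patch is probing for.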

Re: 2.6.20->2.6.21 - networking dies after random time

2007-08-01 Thread Marcin Ślusarz
2007/7/30, Ingo Molnar <[EMAIL PROTECTED]>: > (..) > does the patch below fix those timeouts? It tests the theory whether any > POST latency could expose this problem. > > Ingo > > Index: linux/drivers/net/lib8390.c > === > ---

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-31 Thread Jarek Poplawski
On Mon, Jul 30, 2007 at 09:29:38AM +0200, Marcin Ślusarz wrote: ... > ps: I retested all patches posted in this thread on top of 2.6.22.1 > and behavior from 2.6.21.3 didn't change. My next tests will be on > 2.6.22.x only. Marcin, I see you're quite busy, but if after testing this next Ingo's p

Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Jeff Garzik
Jarek Poplawski wrote: Hi, Very below is my patch proposal with a comment, which in my opinion is precious enough to save it for future help in reading and understanding the code. I hope Alan will not blame me I've not asked for his permission before sending, and he would ack this patch as it i

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Alan Cox
> So the whole locking is to be able to keep irqs enabled for a long time, > without risking entry of the same IRQ handler on this same CPU, correct? As implemented - on any CPU. We also need to know that the IRQ handler is not doing useful work on another processor which is why we take the lock

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Ingo Molnar
* Marcin Ślusarz <[EMAIL PROTECTED]> wrote: > > Subject: x86: activate HARDIRQS_SW_RESEND > > From: Ingo Molnar <[EMAIL PROTECTED]> > > > > activate the software-triggered IRQ-resend logic. > This patch didn't help (tested on 2.6.22.1) - ne2k_pci timed out. ok. This makes it more likely that th

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Ingo Molnar
* Alan Cox <[EMAIL PROTECTED]> wrote: > Ok the logic behind the 8390 is very simple: thanks for the explanation Alan! A few comments and a question: > Things to know > - IRQ delivery is asynchronous to the PCI bus > - Blocking the local CPU IRQ via spin locks was too slow > -

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Marcin Ślusarz
2007/7/26, Ingo Molnar <[EMAIL PROTECTED]>: > (..) > yeah - i meant to cover both arches but forgot about x86_64 - updated > patch attached below. > > Ingo > > -> > Subject: x86: activate HARDIRQS_SW_RESEND > From: Ingo Molnar <[EMAIL PROTECTED]> > > activate the software-tr

Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Alan Cox
On Thu, 26 Jul 2007 14:44:01 +0200 Jarek Poplawski <[EMAIL PROTECTED]> wrote: > Hi, > > Very below is my patch proposal with a comment, which in my opinion > is precious enough to save it for future help in reading and > understanding the code. > > I hope Alan will not blame me I've not asked fo

[PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
Hi, Very below is my patch proposal with a comment, which in my opinion is precious enough to save it for future help in reading and understanding the code. I hope Alan will not blame me I've not asked for his permission before sending, and he would ack this patch as it is or at least most of thi

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Ingo Molnar
* Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On Thu, Jul 26, 2007 at 10:31:20AM +0200, Ingo Molnar wrote: > ... > > yeah. The patch below enables sw-resend on x86, to test the theory > > whether the APIC-driven hardware-vector-resend code has some problem. > > I think Marcin is using x86_64 (

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
On Thu, Jul 26, 2007 at 10:10:31AM +0200, Thomas Gleixner wrote: > On Thu, 2007-07-26 at 10:13 +0200, Jarek Poplawski wrote: ... > > PS: Now, it seems to me Thomas could be the nearest. BTW, could somebody > > give me some tip, how these re-triggered interrupts are skipped on dev's > > reset before

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
On Thu, Jul 26, 2007 at 10:31:20AM +0200, Ingo Molnar wrote: ... > yeah. The patch below enables sw-resend on x86, to test the theory > whether the APIC-driven hardware-vector-resend code has some problem. I think Marcin is using x86_64 (Athlon 64) yet. Jarek P.

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Ingo Molnar
* Thomas Gleixner <[EMAIL PROTECTED]> wrote: > The other question is: > > Is the driver confused by the resent irq or is the chip-set unhappy > about the resend ? > > We could figure the latter out by activating the software based resend > method. yeah. The patch below enables sw-resend on x

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Ingo Molnar
* Marcin Ślusarz <[EMAIL PROTECTED]> wrote: > 2007/7/25, Thomas Gleixner <[EMAIL PROTECTED]>: > >(...) > > I've tested Jarek's patch, 2 Ingo's patches (2nd and 3rd) and Thomas' > patch (one patch at time of course) - all of them fixed the problem, > but the last one flooded my logs with "Skip

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Thomas Gleixner
On Thu, 2007-07-26 at 10:13 +0200, Jarek Poplawski wrote: > > I wanted to test them all on 2.6.22.1, but I didn't have enough time. > > I've verified only that 2.6.22.1 has the same problem. I can test it > > later, but I can report results back at beginning of next week. > > > So, everything is

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
On Thu, Jul 26, 2007 at 10:13:26AM +0200, Jarek Poplawski wrote: ... > So, everything is clear - any changes are good! > Except the signed-off ones... Oops! Marcin's patch was both signed-off and good. So, there is probably something more... Sorry Marcin, Jarek P.

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
On Thu, Jul 26, 2007 at 09:16:10AM +0200, Marcin Ślusarz wrote: > 2007/7/25, Thomas Gleixner <[EMAIL PROTECTED]>: > >(...) > > I've tested Jarek's patch, 2 Ingo's patches (2nd and 3rd) and Thomas' > patch (one patch at time of course) - all of them fixed the problem, > but the last one flooded my

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Marcin Ślusarz
2007/7/25, Thomas Gleixner <[EMAIL PROTECTED]>: (...) I've tested Jarek's patch, 2 Ingo's patches (2nd and 3rd) and Thomas' patch (one patch at time of course) - all of them fixed the problem, but the last one flooded my logs with "Skip resend for irq 17". All tests were done on 2.6.21.3. I wa

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-25 Thread Alan Cox
> > The code in question lib8390.c does > > > > disable_irq(); > > fiddle_with_the_network_card_hardware() > > enable_irq(); > ... > > > > No idea how this affects the network card, as the code there must be > > able to handle interrupts, which are not originated from the card due to
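The disable_irq() / enable_irq() pattern quoted above interacts with the change bisected elsewhere in this thread ("genirq: do not mask interrupts by default"). A rough user-space model of that interaction, as far as the thread describes it: disabling only bumps a counter, an interrupt that arrives meanwhile is flagged pending and masked, and enabling replays it. Everything below (irq_model and the model_* functions) is hypothetical illustration, not kernel code; in particular the replay is a direct handler call here, whereas the kernel resends through the interrupt controller or a software path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one IRQ line. */
struct irq_model {
    int  depth;    /* nesting count of disable calls */
    bool masked;   /* masked at the simulated controller */
    bool pending;  /* arrived while disabled, to be replayed */
    int  handled;  /* how many times the driver handler ran */
    int  resent;   /* how many replays enable triggered */
};

static void model_handler(struct irq_model *irq)
{
    irq->handled++;
}

/* Lazy disable: count the request, leave the line unmasked. */
static void model_disable_irq(struct irq_model *irq)
{
    irq->depth++;
}

/* The device raises its interrupt. */
static void model_raise_irq(struct irq_model *irq)
{
    if (irq->masked)
        return;               /* controller blocks it */
    if (irq->depth > 0) {     /* disabled: remember it and mask now */
        irq->pending = true;
        irq->masked = true;
        return;
    }
    model_handler(irq);
}

/* Enable: unmask on the last nested enable and replay anything pending. */
static void model_enable_irq(struct irq_model *irq)
{
    assert(irq->depth > 0);   /* unbalanced enable is a caller bug */
    if (--irq->depth > 0)
        return;
    irq->masked = false;
    if (irq->pending) {
        irq->pending = false;
        irq->resent++;
        model_handler(irq);
    }
}
```

The replayed call is the point of interest for this thread: the handler runs once more after enable even if the driver already serviced the device inside the disabled section, so the driver has to cope with an interrupt that has nothing left to report.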

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-25 Thread Jarek Poplawski
On Wed, Jul 25, 2007 at 02:19:31AM +0200, Thomas Gleixner wrote: ... > Looking into the IO_APIC code, the resend via send_IPI_self() happens > unconditionally. So the resend is done for level and edge interrupts. > This makes the problem more mysterious. > > The code in question lib8390.c does >

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-25 Thread Jarek Poplawski
On Wed, Jul 25, 2007 at 02:19:31AM +0200, Thomas Gleixner wrote: > On Tue, 2007-07-24 at 22:04 +0200, Ingo Molnar wrote: > > Marcin, could you try the patch below too? [without having any other > > patch applied.] It basically turns the critical section into an irqs-off > > critical section and t

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Thomas Gleixner
On Tue, 2007-07-24 at 22:04 +0200, Ingo Molnar wrote: > Marcin, could you try the patch below too? [without having any other > patch applied.] It basically turns the critical section into an irqs-off > critical section and thus checks whether your problem is related to that > particular area of

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Tue, 24 Jul 2007, Ingo Molnar wrote: > > > > please try the patch below instead. > > I'm hoping this is just a "let's see if the behavior changes" patch, > not something that you think should be applied if it fixes something? > > This patch loo

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Linus Torvalds
On Tue, 24 Jul 2007, Ingo Molnar wrote: > > please try the patch below instead. I'm hoping this is just a "let's see if the behavior changes" patch, not something that you think should be applied if it fixes something? This patch looks like it is trying to paper over (rather than fix) some p

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > thanks for tracking it down! Could you try the patch below (ontop an > otherwise unmodified kernel)? This tests the theory whether the > problem is related to the disable_irq_nosync() call in the ne2k > driver's xmit path. Does this solve the hangs to

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Ingo Molnar
* Marcin Ślusarz <[EMAIL PROTECTED]> wrote: > Ok, I've bisected this problem and found that this patch broke my NIC: > > 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit > commit 76d2160147f43f982dfe881404cfde9fd0a9da21 > Author: Ingo Molnar <[EMAIL PROTECTED]> > Date: Fri Feb 16 0

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Jarek Poplawski
On Mon, Jul 23, 2007 at 07:44:58AM +0200, Marcin Ślusarz wrote: > Ok, I've bisected this problem and found that this patch broke my NIC: > > 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit > commit 76d2160147f43f982dfe881404cfde9fd0a9da21 > Author: Ingo Molnar <[EMAIL PROTECTED]> > Da

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-23 Thread Jarek Poplawski
On Mon, Jul 23, 2007 at 07:44:58AM +0200, Marcin Ślusarz wrote: > Ok, I've bisected this problem and found that this patch broke my NIC: Congratulations! > > 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit > commit 76d2160147f43f982dfe881404cfde9fd0a9da21 > Author: Ingo Molnar <[EMA

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-22 Thread Marcin Ślusarz
Ok, I've bisected this problem and found that this patch broke my NIC: 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit commit 76d2160147f43f982dfe881404cfde9fd0a9da21 Author: Ingo Molnar <[EMAIL PROTECTED]> Date: Fri Feb 16 01:28:24 2007 -0800 [PATCH] genirq: do not mask interr

Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-22 Thread Magnus Holmgren
I'd like to add that the same thing happens to me, though I went directly from 2.6.18 to 2.6.21. Debian-built (-k7) kernel, ne2k_pci NIC, but no skge. -- Magnus Holmgren [EMAIL PROTECTED] (No Cc of list mail needed, thanks) "Exim is better at being younger, where

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-29 Thread Jarek Poplawski
On Fri, Jun 29, 2007 at 10:50:20AM +0200, Jean-Baptiste Vignaud wrote: > Update... > I did 2 tests : > > 1) booted with option acpi=off > It booted correctly, i managed to get some load on one of the card > and after a while (10 minutes i guess) the Timeout occurs. Side effect, > at the same mome

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-29 Thread Jean-Baptiste Vignaud
Update... I did 2 tests : 1) booted with option acpi=off It booted correctly, i managed to get some load on one of the cards and after a while (10 minutes i guess) the Timeout occurs. Side effect, at the same moment the sata controllers lost control of the disks somehow and the raid 5 array on th

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-27 Thread Jarek Poplawski
On Tue, Jun 26, 2007 at 04:24:07PM +0200, Jean-Baptiste Vignaud wrote: > Hello, i have a very similar problem with 2.6.21 also; > > 2 3com NICs and they are failing randomly. > > The kernel is a basic fedora 7 kernel (2.6.21-1.3228.fc7) > I found a bug report and added details here : > https://

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-26 Thread Jean-Baptiste Vignaud
Hello, i have a very similar problem with 2.6.21 also; 2 3com NICs and they are failing randomly. The kernel is a basic fedora 7 kernel (2.6.21-1.3228.fc7) I found a bug report and added details here : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=243960 I'm not subscribed on this list,

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-26 Thread Jarek Poplawski
On Tue, Jun 26, 2007 at 08:10:17AM +0200, Marcin Ślusarz wrote: ... > I reproduced it on minimal config: ... Hm... This method is usable if you can find such minimal config with which the bug cannot be reproduced. Then you can add more until the bug is back. Of course, this takes time... We know

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-22 Thread Jarek Poplawski
On Fri, Jun 22, 2007 at 10:56:44AM +0200, Marcin Ślusarz wrote: ... > When I disable on-board network card in BIOS (controlled by skge) > ne2k-pci card is still locking up. So I think it's strictly ne2k-pci > card bug. I made some tests and I know how to reproduce it fast (on my > machine) - just m

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-22 Thread Marcin Ślusarz
2007/6/19, Jarek Poplawski <[EMAIL PROTECTED]>: On Mon, Jun 18, 2007 at 08:10:00AM -0700, Stephen Hemminger wrote: > On Mon, 18 Jun 2007 13:08:49 +0200 > Jarek Poplawski <[EMAIL PROTECTED]> wrote: > > > On 16-06-2007 23:35, Marcin Ślusarz wrote: > > > hi > > > after upgrading kernel from 2.6.20 t

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-18 Thread Jarek Poplawski
On Mon, Jun 18, 2007 at 08:10:00AM -0700, Stephen Hemminger wrote: > On Mon, 18 Jun 2007 13:08:49 +0200 > Jarek Poplawski <[EMAIL PROTECTED]> wrote: > > > On 16-06-2007 23:35, Marcin Ślusarz wrote: > > > hi > > > after upgrading kernel from 2.6.20 to 2.6.21.3 i'm experiencing really > > > strange

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-18 Thread Jarek Poplawski
On Mon, Jun 18, 2007 at 08:10:00AM -0700, Stephen Hemminger wrote: > On Mon, 18 Jun 2007 13:08:49 +0200 > Jarek Poplawski <[EMAIL PROTECTED]> wrote: ... > > It looks like skge driver enables different device than probed. > > Maybe you've something old/wrong about eth0/eth1 in /etc configs? > > Mo

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-18 Thread Stephen Hemminger
On Mon, 18 Jun 2007 13:08:49 +0200 Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On 16-06-2007 23:35, Marcin Ślusarz wrote: > > hi > > after upgrading kernel from 2.6.20 to 2.6.21.3 i'm experiencing really > > strange problem - my _both_ network cards dies after random uptime - > > sometimes it's a

Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-18 Thread Jarek Poplawski
On 16-06-2007 23:35, Marcin Ślusarz wrote: > hi > after upgrading kernel from 2.6.20 to 2.6.21.3 i'm experiencing really > strange problem - my _both_ network cards dies after random uptime - > sometimes it's a few minutes, sometimes hours, sometimes it does not > happen for a couple of days... > t