Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-20 Thread David Miller
From: Krishna Kumar2 <[EMAIL PROTECTED]> Date: Thu, 20 Sep 2007 11:24:01 +0530 > Ran 4/16/64 thread iperf on latest bits with this patch and no issues after > 30 mins. I used to > consistently get the bug within 1-2 mins with just 4 threads prior to this > patch. > > Tested-by: Krishna Kumar <[EM

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-20 Thread David Miller
From: Krishna Kumar2 <[EMAIL PROTECTED]> Date: Thu, 20 Sep 2007 10:48:15 +0530 > About the "list deletion occurs", isn't the race I mentioned still present? > If done < budget, the driver does netif_rx_complete (at which time some > other cpu can add this NAPI to their list). But the first cpu mig

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread Krishna Kumar2
Ran 4/16/64 thread iperf on latest bits with this patch and no issues after 30 mins. I used to consistently get the bug within 1-2 mins with just 4 threads prior to this patch. Tested-by: Krishna Kumar <[EMAIL PROTECTED]> (if any value in that) thanks, - KK David Miller <[EMAIL PROTECTED]> wrot

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread Krishna Kumar2
Hi Dave, David Miller <[EMAIL PROTECTED]> wrote on 09/19/2007 09:35:57 PM: > The NAPI_STATE_SCHED flag bit should provide all of the necessary > synchronization. > > Only the setter of that bit should add the NAPI instance to the > polling list. > > The polling loop runs atomically on the cpu whe

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread David Miller
From: Krishna Kumar2 <[EMAIL PROTECTED]> Date: Thu, 20 Sep 2007 10:40:33 +0530 > I like the clean changes made by Dave to fix this, and will test it > today (if I can get my crashed system to come up). I would very much appreciate this testing, as I'm rather sure we've plugged up the most serious

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread Krishna Kumar2
Hi Jan-Bernd, Jan-Bernd Themann <[EMAIL PROTECTED]> wrote on 09/19/2007 06:53:48 PM: > If I understood it right the problem you describe (quota update in > __napi_schedule) can cause further problems when you choose the > following numbers: > > CPU1: A. process 99 pkts > CPU1: B. netif_rx_complete

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread David Miller
From: Krishna Kumar <[EMAIL PROTECTED]> Date: Wed, 19 Sep 2007 17:24:03 +0530 > Note: during steps F-H and C-E, priv/napi is read/modified by both cpu's > which is another bug relating to the same race. > > I guess the above patch is not required if this bug (in IPoIB) is fixed? The NAPI_S

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread Jan-Bernd Themann
Hi, On Wednesday 19 September 2007 13:54, Krishna Kumar wrote: > CPU#1: ipoib_poll(budget=100) > { > A. process 100 skbs > B. netif_rx_complete() > CPU#2> > F. ib_req_notify_cq() (no missed completions, do nothing) > G. return 100 > H. return to net_rx_a

[Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-19 Thread Krishna Kumar
Hi Dave, After applying Roland's NAPI patch, system panics when I run multiple thread iperf (no stack trace at this time, it shows that the panic is in net_tx_action). I think the problem is: In the "done < budget" case, ipoib_poll calls netif_rx_complete() netif_rx_complete() __netif_rx