Re: net_rx_action/NAPI oops [PATCH]

2007-11-30 Thread Kok, Auke
Robert Olsson wrote: > Hello! > > After further investigations. The bug was just in front of us... > > ifconfig down in combination with the test for || !netif_running() can > return a full quota and netif_rx_complete() done which causes the oops > in net_rx_action. Of course the load must

Re: net_rx_action/NAPI oops [PATCH]

2007-11-30 Thread Robert Olsson
Hello! After further investigations. The bug was just in front of us... ifconfig down in combination with the test for || !netif_running() can return a full quota and netif_rx_complete() done which causes the oops in net_rx_action. Of course the load must be high enough to fill the quot
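
A minimal userspace sketch of the failure mode described in this message, under stated assumptions. It models only the rule quoted elsewhere in this thread: a poll routine should not both consume its full quota and clear the scheduled state. Every name here (model_napi, take_packets, poll_buggy, poll_guarded, breaks_poll_contract) is hypothetical and is not a kernel API. poll_guarded illustrates the rule and is not necessarily the patch posted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_napi {
    bool scheduled; /* stands in for NAPI_STATE_SCHED */
    bool running;   /* stands in for netif_running() */
    int  pending;   /* packets waiting in the receive ring */
};

/* Consume up to `budget` packets and report how many were handled. */
static int take_packets(struct model_napi *n, int budget)
{
    assert(budget > 0);
    int done = n->pending < budget ? n->pending : budget;
    n->pending -= done;
    return done;
}

/* Problem shape: completion is also triggered by "interface is down",
 * so a full quota can be returned with the scheduled state cleared. */
static int poll_buggy(struct model_napi *n, int budget)
{
    int done = take_packets(n, budget);
    if (done < budget || !n->running)
        n->scheduled = false; /* models netif_rx_complete() */
    return done;
}

/* Guarded shape (an assumption for illustration): complete only when
 * less than the full quota was used. */
static int poll_guarded(struct model_napi *n, int budget)
{
    int done = take_packets(n, budget);
    if (done < budget)
        n->scheduled = false;
    return done;
}

/* True when a poll call returned a full quota and also completed. */
static bool breaks_poll_contract(const struct model_napi *n, int done, int budget)
{
    return done == budget && !n->scheduled;
}
```

With the interface marked down and more packets pending than the quota, poll_buggy returns the full quota with the scheduled flag cleared, while poll_guarded leaves the flag set for the caller to handle.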

Re: net_rx_action/NAPI oops [PATCH]

2007-11-28 Thread Robert Olsson
No it doesn't. Besides, napi_disable and napi_synchronize are identical. I was trying to disarm interrupts this way too. The patch I sent yesterday is the only cure so far, but I don't know if it's 100% bulletproof either. I was stress-testing the patch but ran into new problems...(schedul

Re: net_rx_action/NAPI oops [PATCH]

2007-11-28 Thread Stephen Hemminger
Would this fix it? --- a/drivers/net/e1000/e1000_main.c 2007-11-15 21:13:12.0 -0800 +++ b/drivers/net/e1000/e1000_main.c 2007-11-28 08:37:03.0 -0800 @@ -630,10 +630,10 @@ e1000_down(struct e1000_adapter *adapter * reschedule our watchdog timer */ set_bit(__E1

Re: net_rx_action/NAPI oops [PATCH]

2007-11-28 Thread Robert Olsson
Stephen Hemminger writes: > It is considered a driver bug in 2.6.24 to call netif_rx_complete (clear > NAPI_STATE_SCHED) > and do a full quota. That bug already had to be fixed in other drivers, > looks like e1000 has the same problem. From what I see the problem is not related to ->poll. But i

Re: net_rx_action/NAPI oops [PATCH]

2007-11-28 Thread Robert Olsson
Kok, Auke writes: > > Robert, please give that patch a try (it fixes a crash that I had here as > well) > and let us know if it works for you. No it doesn't cure the problem I've reported Cheers. --ro BTW. You can try to verify the probl

Re: net_rx_action/NAPI oops [PATCH]

2007-11-27 Thread Kok, Auke
Stephen Hemminger wrote: > On Tue, 27 Nov 2007 14:34:44 -0800 > "Kok, Auke" <[EMAIL PROTECTED]> wrote: > >> Stephen Hemminger wrote: >>> On Tue, 27 Nov 2007 19:52:24 +0100 >>> Robert Olsson <[EMAIL PROTECTED]> wrote: >>> Hello! I've discovered a bug while testing the new multiQ NAPI

Re: net_rx_action/NAPI oops [PATCH]

2007-11-27 Thread Stephen Hemminger
On Tue, 27 Nov 2007 14:34:44 -0800 "Kok, Auke" <[EMAIL PROTECTED]> wrote: > Stephen Hemminger wrote: > > On Tue, 27 Nov 2007 19:52:24 +0100 > > Robert Olsson <[EMAIL PROTECTED]> wrote: > > > >> Hello! > >> > >> I've discovered a bug while testing the new multiQ NAPI code. In hi-load > >> situati

Re: net_rx_action/NAPI oops [PATCH]

2007-11-27 Thread Kok, Auke
Stephen Hemminger wrote: > On Tue, 27 Nov 2007 19:52:24 +0100 > Robert Olsson <[EMAIL PROTECTED]> wrote: > >> Hello! >> >> I've discovered a bug while testing the new multiQ NAPI code. In hi-load >> situations when we take down an interface we get a kernel panic. The >> oops is below. >> >> From

Re: net_rx_action/NAPI oops [PATCH]

2007-11-27 Thread Stephen Hemminger
On Tue, 27 Nov 2007 19:52:24 +0100 Robert Olsson <[EMAIL PROTECTED]> wrote: > > Hello! > > I've discovered a bug while testing the new multiQ NAPI code. In hi-load > situations when we take down an interface we get a kernel panic. The > oops is below. > > From what I see this happens when driv

net_rx_action/NAPI oops [PATCH]

2007-11-27 Thread Robert Olsson
Hello! I've discovered a bug while testing the new multiQ NAPI code. In hi-load situations when we take down an interface we get a kernel panic. The oops is below. From what I see this happens when driver does napi_disable() and clears NAPI_STATE_SCHED. In net_rx_action there is a check for wo