Re: watchdog timeout panic in e1000 driver

2007-02-20 Thread Kenzo Iwami
Hi, Thank you for your comment. > this looks a lot better than the previous patch!! However, we already have a > state marker for _down_ that we should probably reuse. Can you try the > attached > patch and see if it works for you? It's basically your patch without the > added > remove flag

Re: watchdog timeout panic in e1000 driver

2007-02-20 Thread Auke Kok
Kenzo Iwami wrote: Hi, I created a patch that uses watchdog_task but fixes the race condition that occurred in the old e1000 driver. I've obtained information about the panic caused by the old e1000 driver using e1000_watchdog_task. According to the crash dump, the panic was caused by a timer_l

Re: watchdog timeout panic in e1000 driver

2007-02-20 Thread Kenzo Iwami
Hi, I created a patch that uses watchdog_task but fixes the race condition that occurred in the old e1000 driver. I've obtained information about the panic caused by the old e1000 driver using e1000_watchdog_task. According to the crash dump, the panic was caused by a timer_list whose contents we

Re: watchdog timeout panic in e1000 driver

2007-01-18 Thread Kenzo Iwami
Hi, > My patch may seem like a huge change, but in essence the change is > pretty simple. > > In my patch, the interrupt handler code will check whether the interrupted > code is holding the swfw semaphore. If it is held, the watchdog function > is deferred until swfw semaphore is released. > The
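The deferral described in this message can be sketched as a small user-space model. This is only an illustration of the idea, not the patch itself: the struct `adapter_model` and the functions `swfw_acquire`, `swfw_release`, `interrupt_path`, and `watchdog_work` are hypothetical names, and running the deferred work directly from the release path is an assumption (the excerpt does not say how the patch resumes the watchdog).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the adapter state; not the driver's real struct. */
struct adapter_model {
    bool swfw_held;         /* true while process context owns the semaphore */
    bool watchdog_pending;  /* set when the interrupt path had to defer */
    unsigned watchdog_runs; /* counts how often the watchdog body ran */
};

/* Stand-in for the watchdog body. */
void watchdog_work(struct adapter_model *a)
{
    a->watchdog_runs++;
}

/* Process context (e.g. an ioctl path) takes the semaphore. */
void swfw_acquire(struct adapter_model *a)
{
    assert(!a->swfw_held);
    a->swfw_held = true;
}

/* Interrupt path: never wait for the semaphore, defer instead. */
void interrupt_path(struct adapter_model *a)
{
    if (a->swfw_held) {
        /* The interrupted code holds the semaphore; record the request. */
        a->watchdog_pending = true;
        return;
    }
    watchdog_work(a);
}

/* Releasing the semaphore runs any watchdog request that was deferred. */
void swfw_release(struct adapter_model *a)
{
    assert(a->swfw_held);
    a->swfw_held = false;
    if (a->watchdog_pending) {
        a->watchdog_pending = false;
        watchdog_work(a);
    }
}
```

In this model, calling `swfw_acquire`, then `interrupt_path`, then `swfw_release` runs the watchdog body exactly once, and only after the release, so the interrupt path never spins on a semaphore that its own CPU is holding.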

Re: watchdog timeout panic in e1000 driver

2007-01-16 Thread Kenzo Iwami
Hi, Thank you for your comment. > thanks for staying patient while most of us were out or busy. Apart from > acknowledging > that you might have fixed a problem with your patch, we're very reluctant to > merge such > a huge change in our driver that touches many more cases than the one that

Re: watchdog timeout panic in e1000 driver

2007-01-15 Thread Auke Kok
Kenzo Iwami wrote: With this patch applied, I confirmed that the system doesn't panic. I think this patch can fix this problem. Does this patch have problems? Kenzo, thanks for staying patient while most of us were out or busy. Apart from acknowledging that you might have fixed a problem with

Re: watchdog timeout panic in e1000 driver

2007-01-15 Thread Kenzo Iwami
Hi, During the holiday season, I posted a patch that fixed this problem without using spinlocks or disabling interrupts. http://marc.theaimsgroup.com/?l=linux-netdev&m=116649413613845&w=2 With this patch applied, I confirmed that the system doesn't panic. I think this patch can fix this proble

Re: watchdog timeout panic in e1000 driver

2006-12-18 Thread Kenzo Iwami
Hi, Previously, I posted a patch that fixed this problem without using spinlocks or disabling interrupts. I have rebased this patch for 2.6.20-rc1. Does this patch have problems? I welcome any comments. -- Kenzo Iwami ([EMAIL PROTECTED]) Signed-off-by: Kenzo Iwami <[EMAIL PROTECTED]> diff

Re: watchdog timeout panic in e1000 driver

2006-12-11 Thread Kenzo Iwami
Hi, > There are several issues that are conflicting and mixing that make it less > than > intuitive to decide what the better fix is. > > Most of all, we discussed that adding a spinlock is not going to fix the > underlying > problem of contention, as the code that would need to be spinlocked

Re: watchdog timeout panic in e1000 driver

2006-12-04 Thread Auke Kok
Kenzo Iwami wrote: Hi, Doesn't this just mean that we need a spinlock or some other kind of semaphore around acquiring, using, and releasing this resource? We keep going around and around about this but I'm pretty sure spinlocks are meant to be able to solve exactly this issue. The problem is

Re: watchdog timeout panic in e1000 driver

2006-12-04 Thread Kenzo Iwami
Hi, >> Doesn't this just mean that we need a spinlock or some other kind of >> semaphore around acquiring, using, and releasing this resource? We keep >> going around and around about this but I'm pretty sure spinlocks are >> meant to be able to solve exactly this issue. >> >> The problem is goin

RE: watchdog timeout panic in e1000 driver

2006-11-16 Thread Brandeburg, Jesse
Kenzo Iwami wrote: > ethtool processing holding semaphore > INTERRUPT > e1000_watchdog waits for semaphore to be released > > The semaphore e1000_watchdog is waiting for can only be released when > ethtool resumes from interrupt after e1000_watchdog finishes > (basically a deadlock) >
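The sequence quoted in this message (ethtool holds the semaphore, an interrupt fires, e1000_watchdog waits) can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions: `struct hw_sem`, `spin_acquire`, and `simulate_interrupt_while_held` are hypothetical names, and the bounded poll count simply stands in for whatever timeout the real handler uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the semaphore; not the driver's real data. */
struct hw_sem {
    bool held;
};

/*
 * Bounded busy-wait, as code running in interrupt context would do.
 * Returns true if the semaphore was acquired; *polls_used reports how
 * many polls failed before the function returned.
 */
bool spin_acquire(struct hw_sem *s, unsigned max_polls, unsigned *polls_used)
{
    assert(s && polls_used);
    for (unsigned i = 0; i < max_polls; i++) {
        if (!s->held) {
            s->held = true;
            *polls_used = i;
            return true;
        }
        /*
         * The holder is the code this interrupt preempted on the same
         * CPU, so it cannot run and release the semaphore while we poll.
         */
    }
    *polls_used = max_polls;
    return false;
}

/* Process context takes the semaphore, then the interrupt fires. */
bool simulate_interrupt_while_held(unsigned max_polls, unsigned *polls_used)
{
    struct hw_sem sem = { .held = false };
    bool acquired;

    sem.held = true; /* ioctl path acquires the semaphore */
    acquired = spin_acquire(&sem, max_polls, polls_used); /* INTERRUPT */
    sem.held = false; /* ioctl path resumes only after the handler returns */
    return acquired;
}
```

However large `max_polls` is made, `simulate_interrupt_while_held` always exhausts it, because nothing can clear `held` while the handler is polling. Only the timeout ends the wait, which matches the "basically a deadlock" description in the message.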

Re: watchdog timeout panic in e1000 driver

2006-11-16 Thread Kenzo Iwami
Hi, Thank you for your comment. >>> I think this problem occurs because interrupt handler is executed in same >>> CPU as process that acquires semaphore. >>> How about disabling interrupt while the process is holding the semaphore? >>> I think this is possible, if the total lock time has been red

Re: watchdog timeout panic in e1000 driver

2006-11-15 Thread Auke Kok
Kenzo Iwami wrote: Hi, Even if the total lock time can be reduced, it's possible that the interrupt handler is executed while the interrupted code is still holding the semaphore. I think your method only decreases the frequency of this problem. Why does reducing the lock time solve this problem? t

Re: watchdog timeout panic in e1000 driver

2006-11-15 Thread Kenzo Iwami
Hi, Even if the total lock time can be reduced, it's possible that the interrupt handler is executed while the interrupted code is still holding the semaphore. I think your method only decreases the frequency of this problem. Why does reducing the lock time solve this problem?

Re: watchdog timeout panic in e1000 driver

2006-11-01 Thread Kenzo Iwami
Hi, >>> Even if the total lock time can be reduced, it's possible that the interrupt >>> handler is executed while the interrupted code is still holding the >>> semaphore. >>> I think your method only decreases the frequency of this problem. >>> Why does reducing the lock time solve this problem? >> t

Re: watchdog timeout panic in e1000 driver

2006-10-30 Thread Shaw Vrana
On Mon, Oct 30, 2006 at 09:30:24AM -0800, Auke Kok wrote: > >Even if the total lock time can be reduced, it's possible that the interrupt > >handler is executed while the interrupted code is still holding the > >semaphore. > >I think your method only decreases the frequency of this problem. > >Why does

Re: watchdog timeout panic in e1000 driver

2006-10-30 Thread Auke Kok
Kenzo Iwami wrote: Hi, Thank you for your comment. Anyway as I said in the same e-mail, we're working on reducing the lock timeout to a reasonable time. This will unfortunately take some time, as we need to change some major components in the driver to make sure this doesn't happen. How abou

Re: watchdog timeout panic in e1000 driver

2006-10-30 Thread Kenzo Iwami
Hi, Thank you for your comment. > Anyway as I said in the same e-mail, we're working on reducing the lock > timeout to a > reasonable time. This will unfortunately take some time, as we need to > change some major > components in the driver to make sure this doesn't happen

Re: watchdog timeout panic in e1000 driver

2006-10-26 Thread Auke Kok
Kenzo Iwami wrote: Hi, Thank you for your comment. Anyway as I said in the same e-mail, we're working on reducing the lock timeout to a reasonable time. This will unfortunately take some time, as we need to change some major components in the driver to make sure this doesn't happen. How abou

Re: watchdog timeout panic in e1000 driver

2006-10-26 Thread Kenzo Iwami
Hi, Thank you for your comment. >>> Anyway as I said in the same e-mail, we're working on reducing the lock >>> timeout to a >>> reasonable time. This will unfortunately take some time, as we need to >>> change some major >>> components in the driver to make sure this doesn't happen. >> >> Ho

Re: watchdog timeout panic in e1000 driver

2006-10-25 Thread Auke Kok
Kenzo Iwami wrote: Hi, This problem originally occurred in a very large cluster system using SNMP for server management. About two servers panicked each day. The program I sent is to reproduce this problem in a very short time. It does occur under normal load when there are a lot of servers. hmm

Re: watchdog timeout panic in e1000 driver

2006-10-25 Thread Kenzo Iwami
Hi, >> This problem originally occurred in a very large cluster system using SNMP >> for server management. About two servers panicked each day. The program I >> sent >> is to reproduce this problem in a very short time. It does occur under normal >> load when there are a lot of servers. > > hmm,

Re: watchdog timeout panic in e1000 driver

2006-10-24 Thread Auke Kok
Kenzo Iwami wrote: Hi, Thank you for your comment. This panic report falls in the category "how hard can I break my system as root". Explicitly abusing the system performing restricted calls depletes resources and harasses the sw lock (in this case). The reason that the driver attempts to wai

Re: watchdog timeout panic in e1000 driver

2006-10-24 Thread Kenzo Iwami
Hi, Thank you for your comment. > This panic report falls in the category "how hard can I break my system as > root". > Explicitly abusing the system performing restricted calls depletes resources > and > harasses the sw lock (in this case). The reason that the driver attempts to > wait that

Re: watchdog timeout panic in e1000 driver

2006-10-20 Thread Auke Kok
Kenzo Iwami wrote: Hi, Thank you for your comment. A watchdog timeout panic occurred in e1000 driver (7.2.9-NAPI). where's the panic message ? attached the panic message (e1000_panic). [...] This problem only occurs on a server using ethernet controller inside 631xESB/632xESB, and NMI wat

Re: watchdog timeout panic in e1000 driver

2006-10-19 Thread Auke Kok
Kenzo Iwami wrote: A watchdog timeout panic occurred in e1000 driver (7.2.9-NAPI). where's the panic message ? Please CC the maintainers of the driver at all times. Our e-mail addresses are widely visible everywhere. If e1000_watchdog is called when processing ioctl from ethtool, the syste

watchdog timeout panic in e1000 driver

2006-10-19 Thread Kenzo Iwami
Hi, A watchdog timeout panic occurred in the e1000 driver (7.2.9-NAPI). If e1000_watchdog is called while an ioctl from ethtool is being processed, the system could stop inside the e1000_watchdog interrupt handler for about 16 seconds, and the system panicked as a result of a watchdog timeout. This problem only occu