Re: 2.6.16.32 stuck in generic_file_aio_write()

2007-02-19 Thread Igmar Palsenberg
Hi, > I can not make sure it is hardware problem, but I have interest in this > case's reproducing. > If you tell me your platform's construction, I will try it and give you good > solution. > Does your RAID adapter's firmware version work on 1.42? > Areca firmware had fix some hardware bugs an

Re: 2.6.16.32 stuck in generic_file_aio_write()

2007-02-12 Thread Igmar Palsenberg
Hi, > I can not make sure it is hardware problem, but I have interest in this > case's reproducing. > If you tell me your platform's construction, I will try it and give you good > solution. The machines giving problems are almost identical when it comes to hardware specs : Intel SE7520BD2 m

Re: 2.6.16.32 stuck in generic_file_aio_write()

2007-02-05 Thread erich
EMAIL PROTECTED]> Sent: Monday, February 05, 2007 6:24 PM Subject: Re: 2.6.16.32 stuck in generic_file_aio_write() Does the other machine have the same problems? It does. It seems to depend on the interrupt frequency : Setting KERNEL_HZ=250 makes it only appear once a month or so, with K

Re: 2.6.16.32 stuck in generic_file_aio_write()

2007-02-05 Thread Igmar Palsenberg
> Does the other machine have the same problems? It does. It seems to depend on the interrupt frequency : Setting KERNEL_HZ=250 makes it only appear once a month or so, with KERNEL_HZ=1000, it will occur within a week. It does happen a lot less with the other machine, which isn't under disk acti

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-14 Thread Igmar Palsenberg
> > See below. The other machine is mostly identical, except for i8042 > > missing (probably due to running an older kernel, or small differences in > > the kernel config). > > > > Does the other machine have the same problems? No, but that machine has a lot less disk and network activity.

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-14 Thread Andrew Morton
On Thu, 14 Dec 2006 09:55:38 +0100 (CET) Igmar Palsenberg <[EMAIL PROTECTED]> wrote: > > > > Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix' the problem. > > > I haven't seen the issue in nearly a week now. This makes Andrew's theory > > > about missing interrupts very likely. > > >

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-14 Thread Igmar Palsenberg
> > Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix' the problem. > > I haven't seen the issue in nearly a week now. This makes Andrew's theory > > about missing interrupts very likely. > > > > Andrew / others : Is there a way to find out if it *is* missing > > interrupts ? > > > >
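One rough way to approach the question above is to compare two snapshots of /proc/interrupts and see which counters did not move. The sketch below is not from the thread: the helper names `parse_interrupts`, `stalled_irqs` and `watch` are hypothetical, and a flat counter is only a hint, since an idle device looks exactly the same as a lost interrupt.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: count summed over CPUs}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device name columns
            counts.append(int(field))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def stalled_irqs(before, after):
    """Return labels whose counter did not advance between two snapshots."""
    return sorted(k for k in before if k in after and after[k] == before[k])


def watch(path="/proc/interrupts", interval=5.0):
    """Take two snapshots `interval` seconds apart and list flat counters."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return stalled_irqs(first, second)
```

Running `watch()` while the disk is under load and checking whether the RAID controller's line appears in the result would be one way to use it; the interval is an arbitrary choice.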

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-14 Thread Andrew Morton
On Thu, 14 Dec 2006 09:15:39 +0100 (CET) Igmar Palsenberg <[EMAIL PROTECTED]> wrote: > > > > I'll put a .config and a dmesg of the machine booting at > > > http://www.jdi-ict.nl/plain/ for those who want to look at it. > > > > dmesg : http://www.jdi-ict.nl/plain/lnx01.dmesg > > Kernel config :

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-14 Thread Igmar Palsenberg
> > I'll put a .config and a dmesg of the machine booting at > > http://www.jdi-ict.nl/plain/ for those who want to look at it. > > dmesg : http://www.jdi-ict.nl/plain/lnx01.dmesg > Kernel config : http://www.jdi-ict.nl/plain/lnx01.config Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-07 Thread Igmar Palsenberg
> I've enabled most debugging now, I'll see if I can run both a disk and VM > stresstest. Running stress now : stress -c 2 -i 2 -m 8 -d 8 --vm-bytes 20M --vm-hang 5 --hdd-bytes 20M I'll see what this results in. > I'll put a .config and a dmesg of the machine booting at > http://www.jdi-ict

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-07 Thread Igmar Palsenberg
> I thought it was, but from my look through your 8-billion-task backtrace, > no task was stuck in D-state with the appropriate call trace. I was afraid of that... Where is the lock on the i_mutex supposed to be released ? I can't grasp the codepath from within an interrupt back to the fs layer.

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-06 Thread Igmar Palsenberg
> > Done some more digging : isn't http://lkml.org/lkml/2006/10/13/139 somehow > > related ? I do see pagefaults, and inode locks and mmap_locks. > > > > I thought it was, but from my look through your 8-billion-task backtrace, > no task was stuck in D-state with the appropriate call trace. >

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-06 Thread Andrew Morton
On Wed, 6 Dec 2006 16:17:10 +0100 (CET) Igmar Palsenberg <[EMAIL PROTECTED]> wrote: > > > > It's rather large, but for those who want to look at it : > > > http://www.jdi-ict.nl/plain/serial-28112006.txt > > > > The same problem, this time with 2.6.19. I've done a show tasks, a show > > locks,

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-06 Thread Igmar Palsenberg
> > It's rather large, but for those who want to look at it : > > http://www.jdi-ict.nl/plain/serial-28112006.txt > > The same problem, this time with 2.6.19. I've done a show tasks, a show > locks, a show regs, and after that, a sync + reboot :) > > Log is at http://www.jdi-ict.nl/plain/seria

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-04 Thread Igmar Palsenberg
> It's rather large, but for those who want to look at it : > http://www.jdi-ict.nl/plain/serial-28112006.txt The same problem, this time with 2.6.19. I've done a show tasks, a show locks, a show regs, and after that, a sync + reboot :) Log is at http://www.jdi-ict.nl/plain/serial-04122006.txt

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-12-01 Thread Igmar Palsenberg
Hi, > > I've got a machine which occasionally locks up. I can still sysrq it from > > a serial console, so it's not entirely dead. > > > > A sysrq-t shows me that it's got a large number of httpd processes stuck > > in D state : > > There are known deadlocks in generic_file_write() in kerne

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-11-30 Thread Andrew Morton
On Wed, 29 Nov 2006 13:41:37 +0100 (CET) Igmar Palsenberg <[EMAIL PROTECTED]> wrote: > I've got a machine which occasionally locks up. I can still sysrq it from > a serial console, so it's not entirely dead. > > A sysrq-t shows me that it's got a large number of httpd processes stuck > in D st

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-11-30 Thread Igmar Palsenberg
Hi, > If you are working on arcmsr 1.20.00.13 for official kernel version. > This is the last version. I'm already on that version. I'll see if I can upgrade to 2.6.19 today. > Could you check your RAID controller event and tell something to me? > You can check "MBIOS"=>"Physical Drive Informati

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-11-29 Thread erich
er. The firmware version 1.42 is in the release process but has not yet been put on the Areca ftp site. If you need it, please tell me again. Best Regards Erich Chen - Original Message - From: "Igmar Palsenberg" <[EMAIL PROTECTED]> To: Cc: <[EMAIL PROTECTED]> Sent: Wedn

Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-11-29 Thread Igmar Palsenberg
Hi, A followup. It crashed again, giving me : arcmsr0: scsi id=0 lun=0 ccb='0xf7c984e0' poll command abort successfully end_request: I/O error, dev sda, sector 3724719 and sd 0:0:0:0: rejecting I/O to offline device about 15k times. I'll see if I can upgrade the RAID driver. Igmar

2.6.16.32 stuck in generic_file_aio_write()

2006-11-29 Thread Igmar Palsenberg
Hi, I've got a machine which occasionally locks up. I can still sysrq it from a serial console, so it's not entirely dead. A sysrq-t shows me that it's got a large number of httpd processes stuck in D state : httpd D F7619440 2160 11635 2057 11636 (NOTLB) dbb7ae14 cc
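With a task dump as large as this one, counting the D-state entries per command can be scripted. The sketch below is not from the thread and assumes task header lines shaped like the httpd line quoted above (command, state letter, hex address, a number, then what is assumed to be the PID). The names `TASK_LINE` and `blocked_tasks` are hypothetical, and other kernel versions may print a different layout.

```python
import re
from collections import Counter

# Assumed layout of a task header line in the sysrq-t output, e.g.
#   httpd D F7619440 2160 11635 2057 11636 (NOTLB)
TASK_LINE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9A-Fa-f]{8,16}\s+\d+\s+(?P<pid>\d+)\b"
)


def blocked_tasks(dump):
    """Count tasks in uninterruptible sleep (state D) per command name."""
    counts = Counter()
    for line in dump.splitlines():
        match = TASK_LINE.match(line.strip())
        if match and match.group("state") == "D":
            counts[match.group("comm")] += 1
    return counts
```

Feeding it a captured serial-console log, for example `blocked_tasks(open("serial.txt").read())` with a hypothetical file name, gives a quick per-command tally; stack and call-trace lines do not match the pattern and are skipped.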