Hi,
> I cannot be sure it is a hardware problem, but I am interested in
> reproducing this case.
> If you tell me your platform's configuration, I will try it and give you a good
> solution.
> Is your RAID adapter's firmware at version 1.42?
> Areca firmware had fixed some hardware bugs an
Hi,
> I cannot be sure it is a hardware problem, but I am interested in
> reproducing this case.
> If you tell me your platform's configuration, I will try it and give you a good
> solution.
The machines giving problems are almost identical when it comes to
hardware specs :
Intel SE7520BD2 m
EMAIL PROTECTED]>
Sent: Monday, February 05, 2007 6:24 PM
Subject: Re: 2.6.16.32 stuck in generic_file_aio_write()
Does the other machine have the same problems?
It does. It seems to depend on the interrupt frequency : Setting
KERNEL_HZ=250
makes it only appear once a month or so, with K
> Does the other machine have the same problems?
It does. It seems to depend on the interrupt frequency : Setting KERNEL_HZ=250
makes it only appear once a month or so, with KERNEL_HZ=1000, it will
occur within a week. It does happen a lot less with the other machine,
which isn't under disk acti
> > See below. The other machine is mostly identical, except for i8042
> > missing (probably due to running an older kernel, or small differences in
> > the kernel config).
> >
>
> Does the other machine have the same problems?
No, but that machine has a lot less disk and network activity.
On Thu, 14 Dec 2006 09:55:38 +0100 (CET)
Igmar Palsenberg <[EMAIL PROTECTED]> wrote:
>
> > > Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix' the problem.
> > > I haven't seen the issue in nearly a week now. This makes Andrew's theory
> > > about missing interrupts very likely.
> > >
> > Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix' the problem.
> > I haven't seen the issue in nearly a week now. This makes Andrew's theory
> > about missing interrupts very likely.
> >
> > Andrew / others : Is there a way to find out if it *is* missing
> > interrupts ?
> >
>
>
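One rough way to approach the question above, not something suggested in the thread itself, is to sample /proc/interrupts twice while the array is under load and check whether the controller's line keeps advancing. The sketch below assumes the usual Linux /proc/interrupts layout (a header row of CPU names, then one row per IRQ). The function names are made up for illustration. A counter that stays flat only hints at a stalled interrupt source; it does not prove that an interrupt was lost.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total_count}, summed over all CPU columns."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the controller/device description
            total += int(token)
        totals[label.strip()] = total
    return totals


def interrupt_deltas(before, after):
    """Per-IRQ increase between two parsed samples."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


def sample(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    first = sample()
    time.sleep(10)  # sampling interval is an arbitrary choice
    for irq, delta in sorted(interrupt_deltas(first, sample()).items()):
        print(irq, delta)
```

If the line belonging to the RAID controller shows a delta of zero while requests are known to be outstanding, that would be consistent with the missing-interrupt theory, though other causes are possible.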
On Thu, 14 Dec 2006 09:15:39 +0100 (CET)
Igmar Palsenberg <[EMAIL PROTECTED]> wrote:
>
> > > I'll put a .config and a dmesg of the machine booting at
> > > http://www.jdi-ict.nl/plain/ for those who want to look at it.
> >
> > dmesg : http://www.jdi-ict.nl/plain/lnx01.dmesg
> > Kernel config :
> > I'll put a .config and a dmesg of the machine booting at
> > http://www.jdi-ict.nl/plain/ for those who want to look at it.
>
> dmesg : http://www.jdi-ict.nl/plain/lnx01.dmesg
> Kernel config : http://www.jdi-ict.nl/plain/lnx01.config
Hmm.. Switching CONFIG_HZ from 1000 to 250 seems to 'fix
> I've enabled most debugging now, I'll see if I can run both a disk and VM
> stress test.
Running stress now :
stress -c 2 -i 2 -m 8 -d 8 --vm-bytes 20M --vm-hang 5 --hdd-bytes 20M
I'll see what this results in.
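While a stress run like the one above is going, it can help to watch for tasks piling up in D state without waiting for a full sysrq-t dump. The following is a minimal sketch, assuming the standard Linux /proc/<pid>/stat layout ("pid (comm) state ..."). The function names are hypothetical, and it only looks at processes, not at individual threads.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm sits in parentheses and may itself contain spaces or ')',
    # so locate the last ')' rather than splitting on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

Running it in a loop from the serial console and logging the count would show whether the number of blocked httpd processes grows steadily before a lockup.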
> I'll put a .config and a dmesg of the machine booting at
> http://www.jdi-ict
> I thought it was, but from my look through your 8-billion-task backtrace,
> no task was stuck in D-state with the appropriate call trace.
I was afraid of that... Where is the lock on the i_mutex supposed
to be released ? I can't grasp the codepath from within an interrupt back
to the fs layer.
> > Done some more digging : isn't http://lkml.org/lkml/2006/10/13/139 somehow
> > related ? I do see pagefaults, and inode locks and mmap_locks.
> >
>
> I thought it was, but from my look through your 8-billion-task backtrace,
> no task was stuck in D-state with the appropriate call trace.
>
On Wed, 6 Dec 2006 16:17:10 +0100 (CET)
Igmar Palsenberg <[EMAIL PROTECTED]> wrote:
>
> > > It's rather large, but for those who want to look at it :
> > > http://www.jdi-ict.nl/plain/serial-28112006.txt
> >
> > The same problem, this time with 2.6.19. I've done a show tasks, a show
> > locks,
> > It's rather large, but for those who want to look at it :
> > http://www.jdi-ict.nl/plain/serial-28112006.txt
>
> The same problem, this time with 2.6.19. I've done a show tasks, a show
> locks, a show regs, and after that, a sync + reboot :)
>
> Log is at http://www.jdi-ict.nl/plain/seria
> It's rather large, but for those who want to look at it :
> http://www.jdi-ict.nl/plain/serial-28112006.txt
The same problem, this time with 2.6.19. I've done a show tasks, a show
locks, a show regs, and after that, a sync + reboot :)
Log is at http://www.jdi-ict.nl/plain/serial-04122006.txt
Hi,
> > I've got a machine which occasionally locks up. I can still sysrq it from
> > a serial console, so it's not entirely dead.
> >
> > A sysrq-t shows me that it's got a large number of httpd processes stuck
> > in D state :
>
> There are known deadlocks in generic_file_write() in kerne
On Wed, 29 Nov 2006 13:41:37 +0100 (CET)
Igmar Palsenberg <[EMAIL PROTECTED]> wrote:
> I've got a machine which occasionally locks up. I can still sysrq it from
> a serial console, so it's not entirely dead.
>
> A sysrq-t shows me that it's got a large number of httpd processes stuck
> in D st
Hi,
> If you are running arcmsr 1.20.00.13 for the official kernel version,
> that is the latest version.
I'm already on that version. I'll see if I can upgrade to 2.6.19 today.
> Could you check your RAID controller events and tell me what you find?
> You can check "MBIOS"=>"Physical Drive Informati
er.
Firmware version 1.42 is in the release process but has not yet been put on
the Areca FTP site.
If you need it, please let me know.
Best Regards
Erich Chen
- Original Message -
From: "Igmar Palsenberg" <[EMAIL PROTECTED]>
To:
Cc: <[EMAIL PROTECTED]>
Sent: Wedn
Hi,
A followup. It crashed again, giving me :
arcmsr0: scsi id=0 lun=0 ccb='0xf7c984e0' poll command abort successfully
end_request: I/O error, dev sda, sector 3724719
and
sd 0:0:0:0: rejecting I/O to offline device
about 15k times.
I'll see if I can upgrade the RAID driver.
Igmar
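To put numbers on a flood like the one described above, the kernel log can be tallied per message type. This is a small sketch: the patterns are copied from the messages quoted in this mail, the category names are invented for illustration, and other driver versions may word these messages differently.

```python
import re
import sys
from collections import Counter

# Patterns taken from the messages quoted above; extend as needed.
PATTERNS = {
    "offline_reject": re.compile(r"rejecting I/O to offline device"),
    "io_error": re.compile(r"end_request: I/O error, dev \w+, sector \d+"),
    "arcmsr_abort": re.compile(r"arcmsr\d+: .*abort"),
}


def summarize(log_text):
    """Count how many log lines match each known pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Reads the log from standard input, e.g. a saved serial console capture.
    for name, count in summarize(sys.stdin.read()).most_common():
        print(name, count)
```

Feeding it a saved serial log gives a quick view of whether the abort and the I/O error come first and the offline rejections follow, which is what the messages above suggest.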
Hi,
I've got a machine which occasionally locks up. I can still sysrq it from
a serial console, so it's not entirely dead.
A sysrq-t shows me that it's got a large number of httpd processes stuck
in D state :
httpd D F7619440 2160 11635 2057 11636 (NOTLB)
dbb7ae14 cc