On Tuesday 12 April 2005 01:46, Andrew Morton wrote:
> Claudio Martins <[EMAIL PROTECTED]> wrote:
> > I think I'm going to give a try to Neil's patch, but I'll have to apply
> > some patches from -mm.
>
> Just this one if you're using 2.6.12-rc2:
>
> --- 25/drivers/md/md.c~avoid-deadlock-in-sync
Nick Piggin wrote on Tuesday, April 12, 2005 4:09 AM
> Chen, Kenneth W wrote:
> > I like the patch a lot and already did bench it on our db setup. However,
> > I'm seeing a negative regression compare to a very very crappy patch (see
> > attached, you can laugh at me for doing things like that :-)
Nick Piggin wrote:
It is a bit subtle: get_request may only drop the lock and return NULL
(after retaking the lock) if we fail on a memory allocation. If we
just fail due to unavailable queue slots, then the lock is never
dropped. And the mem allocation can't fail, because it is a mempool
alloc wit
Nick Piggin wrote:
> Chen, Kenneth W wrote:
> > I like the patch a lot and already did bench it on our db setup. However,
> > I'm seeing a regression compared to a very, very crappy patch (see
> > attached; you can laugh at me for doing things like that :-).
OK - if we go that way, perhaps the followi
On Tue, Apr 12 2005, Nick Piggin wrote:
> Actually the patches I have sent you do fix real bugs, but they also
> make the block layer less likely to recurse into page reclaim, so it
> may be eg. hiding the problem that Neil's patch fixes.
Jens Axboe wrote on Tuesday, April 12, 2005 12:08 AM
> Can
On Tue, Apr 12 2005, Nick Piggin wrote:
> Actually the patches I have sent you do fix real bugs, but they also
> make the block layer less likely to recurse into page reclaim, so it
> may be eg. hiding the problem that Neil's patch fixes.
Can you push those to Andrew? I'm quite happy with the way
Claudio Martins <[EMAIL PROTECTED]> wrote:
>
> I think I'm going to give a try to Neil's patch, but I'll have to apply
> some
> patches from -mm.
Just this one if you're using 2.6.12-rc2:
--- 25/drivers/md/md.c~avoid-deadlock-in-sync_page_io-by-using-gfp_noio  Mon Apr 11 16:55:07 2005
+++ 25
On Monday 11 April 2005 23:59, Nick Piggin wrote:
>
> > OK, I'll try them in a few minutes and report back.
>
> I'm not overly hopeful. If they fix the problem, then it's likely
> that the real bug is hidden.
>
Well, the thing is, they do fix the problem. Or at least they hide it very
well ;
On Monday April 11, [EMAIL PROTECTED] wrote:
>
> Neil, have you had a look at the traces? Do they mean much to you?
>
Just looked.
bio_alloc_bioset seems implicated, as does sync_page_io.
sync_page_io used to use a 'struct bio' on the stack, but Jens Axboe
changed it to use bio_alloc (don't kno
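Neil's description, together with the patch title Andrew quoted earlier ("avoid-deadlock-in-sync_page_io-by-using-gfp_noio"), suggests a change of roughly this shape. This is a sketch only, not the actual md.c hunk; the surrounding function body and the bio setup are elided.

```
 /* sketch: allocate the bio with GFP_NOIO instead of GFP_KERNEL, so
  * the allocation cannot recurse into page reclaim and re-enter the
  * block/md layer while it is already writing out pages */
-	bio = bio_alloc(GFP_KERNEL, 1);
+	bio = bio_alloc(GFP_NOIO, 1);
```

GFP_NOIO forbids the allocator from starting new I/O to satisfy the request, which is the property needed on any path that is itself part of I/O completion or writeout.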
Claudio Martins wrote:
Right. I'm using two Seagate ATA133 disks (IDE controller is AMD-8111), each
with 4 partitions, so I get 4 md RAID1 devices. The first one, md0, is for
swap. The rest are
~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1        4.6G  1.9G  2.6
On Monday 11 April 2005 13:45, Nick Piggin wrote:
>
> No luck yet (on SMP i386). How many disks are you using in each
> raid1 array? You are using one array for swap, and one mounted as
> ext3 for the working area of the `stress` program, right?
>
Right. I'm using two Seagate ATA133 disks (ide
Nick Piggin wrote:
The common theme seems to be: try_to_free_pages, swap_writepage,
mempool_alloc, down/down_failed in .text.lock.md. Next I would suspect
md/raid1 - maybe some deadlock in an uncommon memory allocation
failure path?
I'll see if I can reproduce it here.
No luck yet (on SMP i386). Ho
On Sunday 10 April 2005 03:47, Andrew Morton wrote:
>
> Suggest you boot with `nmi_watchdog=0' to prevent the nmi watchdog from
> cutting in during long sysrq traces.
>
> Also, capture the `sysrq-m' output so we can see if the thing is out of
> memory.
Hi Andrew,
Thanks for the tip. I booted
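For reference, Andrew's two suggestions are a boot-time parameter and a console keystroke. A lilo-style illustration follows (the entry name and kernel path are made up; grub users append the same parameter to their kernel line):

```
# /etc/lilo.conf (illustrative entry)
image=/boot/vmlinuz-2.6.12-rc2
    label=test
    append="nmi_watchdog=0"

# Once booted, the sysrq-m memory report can also be triggered
# from a root shell instead of the console keyboard:
#   echo m > /proc/sysrq-trigger
```

Disabling the NMI watchdog matters here because a long sysrq-t trace holds the console long enough for the watchdog to decide a CPU is wedged and fire mid-dump.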
On Sunday 10 April 2005 03:53, Nick Piggin wrote:
>
> Looks like you may possibly have a memory allocation deadlock
> (although I can't explain the NMI oops).
>
> I would be interested to see if the following patch is of any
> help to you.
>
Hi Nick,
I'll build a kernel with your patch and r
On Sunday 10 April 2005 03:47, Andrew Morton wrote:
>
> Suggest you boot with `nmi_watchdog=0' to prevent the nmi watchdog from
> cutting in during long sysrq traces.
>
> Also, capture the `sysrq-m' output so we can see if the thing is out of
> memory.
OK, will do it ASAP and report back.
Tha
Claudio Martins <[EMAIL PROTECTED]> wrote:
>
> I repeated the test to try to get more output from alt-sysreq-T, but it
> oopsed again with even less output.
> By the way, I have also tested 2.6.11.6 and I get stuck processes in the
> same way. With 2.6.9 I get a hard lockup with no workin
On Tuesday 05 April 2005 03:12, Andrew Morton wrote:
> Claudio Martins <[EMAIL PROTECTED]> wrote:
> >While stress testing 2.6.12-rc2 on an HP DL145 I get processes stuck
> > in D state after some time.
> >This machine is a dual Opteron 248 with 2GB (ECC) on one node (the
> > other node has
Hi,
While stress testing 2.6.12-rc2 on an HP DL145 I get processes stuck in D
state after some time.
This machine is a dual Opteron 248 with 2GB (ECC) on one node (the other
node has no RAM modules plugged in, since this board works only with pairs).
I was using stress (http://weathe
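For anyone trying to reproduce: a `stress` invocation mixing VM pressure and disk load looks something like the following. The worker counts and sizes here are illustrative guesses; the exact mix Claudio used is not shown in the truncated message.

```
# illustrative only -- tune workers/sizes to the machine under test
stress --vm 4 --vm-bytes 512M --io 2 --hdd 2 --timeout 600
```

Driving swap hard (`--vm` workers touching more memory than is free) while generating dirty file pages (`--hdd`) is what pushes the box into the swap_writepage-under-reclaim path implicated in the traces above.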