Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Justin Piszcz
On Tue, 23 Jan 2007, Neil Brown wrote: On Monday January 22, [EMAIL PROTECTED] wrote: Justin Piszcz wrote: My .config is attached, please let me know if any other information is needed and please CC (lkml) as I am not on the list, thanks! Running Kernel 2.6.19.2 on a MD RAID5

Re: change strip_cache_size freeze the whole raid

2007-01-23 Thread Justin Piszcz
I can try and do this later this week possibly. Justin. On Tue, 23 Jan 2007, Neil Brown wrote: On Monday January 22, [EMAIL PROTECTED] wrote: Hi, Yesterday I tried to increase the value of strip_cache_size to see if I can get better performance or not. I increase the value from 2048

Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Michael Tokarev
Justin Piszcz wrote: [] Is this a bug that can or will be fixed or should I disable pre-emption on critical and/or server machines? Disabling pre-emption on critical and/or server machines seems to be a good idea in the first place. IMHO anyway.. ;) /mjt

Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Justin Piszcz
On Tue, 23 Jan 2007, Michael Tokarev wrote: Justin Piszcz wrote: [] Is this a bug that can or will be fixed or should I disable pre-emption on critical and/or server machines? Disabling pre-emption on critical and/or server machines seems to be a good idea in the first place. IMHO

Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Michael Tokarev
Justin Piszcz wrote: On Tue, 23 Jan 2007, Michael Tokarev wrote: Disabling pre-emption on critical and/or server machines seems to be a good idea in the first place. IMHO anyway.. ;) So bottom line is make sure not to use preemption on servers or else you will get weird

Re: Ooops on read-only raid5 while unmounting as xfs

2007-01-23 Thread Francois Barre
3. mark the array read-only (mdadm -o) You shouldn't be able to do this. It should only be possible to set an array to read-only when it is not in use. The fact that you cannot suggests something else is wrong. Should I ? I assumed that mdadm -o could be run with a running used array, and

Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Justin Piszcz
On Tue, 23 Jan 2007, Michael Tokarev wrote: Justin Piszcz wrote: On Tue, 23 Jan 2007, Michael Tokarev wrote: Disabling pre-emption on critical and/or server machines seems to be a good idea in the first place. IMHO anyway.. ;) So bottom line is make sure not to use preemption

Re: change strip_cache_size freeze the whole raid

2007-01-23 Thread kyle
I can try and do this later this week possibly. Justin. alt-sysrq-T or echo t > /proc/sysrq-trigger can be really helpful to diagnose this sort of problem (providing the system isn't so badly stuck that the kernel logs don't get stored). It is probably hitting a memory-allocation deadlock,
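The task-dump request quoted above amounts to writing the single character `t` to /proc/sysrq-trigger. A minimal C sketch of that write follows, assuming magic SysRq is enabled in the kernel and the caller is root. The helper name `sysrq_write` and its path parameter are illustrative only and are not part of any kernel or mdadm interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one SysRq command character to `path`.
 * On a real system `path` would be "/proc/sysrq-trigger"; 't' asks the
 * kernel to dump the state of all tasks to the kernel log.
 * Returns 0 on success, -1 on failure. */
int sysrq_write(const char *path, char cmd)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                 /* no permission, or no such file */

    int ok = (fputc(cmd, f) != EOF);
    if (fclose(f) != 0)            /* buffered write errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `sysrq_write("/proc/sysrq-trigger", 't')` and then reading the kernel log would give the same trace as the shell command, provided the machine is still able to store its logs.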

[PATCH 2.6.20-rc5 01/12] dmaengine: add base support for the async_tx api

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] * introduce struct dma_async_tx_descriptor as a common field for all dmaengine software descriptors * convert the device_memcpy_* methods into separate prep, set src/dest, and submit stages * support capabilities beyond memcpy (xor, memset, xor zero sum,

[PATCH 2.6.20-rc5 02/12] dmaengine: add the async_tx api

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] async_tx is an api to describe a series of bulk memory transfers/transforms. When possible these transactions are carried out by asynchronous dma engines. The api handles inter-transaction dependencies and hides dma channel management from the client. When

[PATCH 2.6.20-rc5 06/12] md: move raid5 compute block operations to raid5_run_ops

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] handle_stripe sets STRIPE_OP_COMPUTE_BLK to request servicing from raid5_run_ops. It also sets a flag for the block being computed to let other parts of handle_stripe submit dependent operations. raid5_run_ops guarantees that the compute operation completes

[PATCH 2.6.20-rc5 03/12] md: add raid5_run_ops and support routines

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] Prepare the raid5 implementation to use async_tx for running stripe operations: * biofill (copy data into request buffers to satisfy a read request) * compute block (generate a missing block in the cache from the other blocks) * prexor (subtract existing data
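The "compute block" operation named above rests on the usual RAID5 property that any one block of a stripe equals the XOR of all the others. The sketch below shows that idea in plain user-space C. It is not the patch's code: `compute_block_sketch` and `xor_into` are made-up names, and the synchronous CPU loop stands in for whatever async_tx would offload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR `len` bytes of src into dst (dst ^= src). */
static void xor_into(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Illustrative "compute block": regenerate blocks[missing] as the XOR of
 * every other block in the stripe (data blocks plus the parity block). */
void compute_block_sketch(unsigned char **blocks, size_t nblocks,
                          size_t missing, size_t len)
{
    assert(missing < nblocks);

    memset(blocks[missing], 0, len);          /* start from all zeroes */
    for (size_t i = 0; i < nblocks; i++)
        if (i != missing)
            xor_into(blocks[missing], blocks[i], len);
}
```

The same routine yields the parity block when `missing` is the parity index, which is why one XOR primitive can serve both reconstruction and parity generation.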

[PATCH 2.6.20-rc5 08/12] md: satisfy raid5 read requests via raid5_run_ops

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] Use raid5_run_ops to carry out the memory copies for a raid5 read request. Signed-off-by: Dan Williams [EMAIL PROTECTED] --- drivers/md/raid5.c | 40 +++- 1 files changed, 15 insertions(+), 25 deletions(-) diff --git

[PATCH 2.6.20-rc5 09/12] md: use async_tx and raid5_run_ops for raid5 expansion operations

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] The parity calculation for an expansion operation is the same as the calculation performed at the end of a write with the caveat that all blocks in the stripe are scheduled to be written. An expansion operation is identified as a stripe with the POSTXOR flag

[PATCH 2.6.20-rc5 12/12] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor, pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy operations. Changelog: * fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few slots to

[PATCH 2.6.20-rc5 11/12] md: remove raid5 compute_block and compute_parity5

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] replaced by raid5_run_ops Signed-off-by: Dan Williams [EMAIL PROTECTED] --- drivers/md/raid5.c | 124 1 files changed, 0 insertions(+), 124 deletions(-) diff --git a/drivers/md/raid5.c

[PATCH 2.6.20-rc5 04/12] md: use raid5_run_ops for stripe cache operations

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] Each stripe has three flag variables to reflect the state of operations (pending, ack, and complete). -pending: set to request servicing in raid5_run_ops -ack: set to reflect that raid5_run_ops has seen this request -complete: set when the operation is
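The pending/ack/complete scheme described above can be modelled as three bitmasks with one bit per operation type. The following is a rough user-space sketch, not code from the patch: the struct, the `OP_*` bit values, and all four function names are hypothetical, and the final "retire" step that clears the bits is an assumption, since the snippet is cut off before saying how a finished operation is cleaned up.

```c
#include <assert.h>

/* Hypothetical operation bits; the values are invented for illustration. */
enum { OP_BIOFILL = 1u << 0, OP_COMPUTE_BLK = 1u << 1, OP_CHECK = 1u << 2 };

struct stripe_ops {
    unsigned pending;   /* operation requested                    */
    unsigned ack;       /* request has been seen by the ops runner */
    unsigned complete;  /* operation has finished                  */
};

/* Requester side: ask for `op` to be serviced. */
void ops_request(struct stripe_ops *s, unsigned op) { s->pending |= op; }

/* Runner side: acknowledge requests not seen before; returns that set. */
unsigned ops_ack(struct stripe_ops *s)
{
    unsigned fresh = s->pending & ~s->ack;
    s->ack |= fresh;
    return fresh;
}

/* Completion side: only an acknowledged operation may complete. */
void ops_complete(struct stripe_ops *s, unsigned op)
{
    assert((s->ack & op) == op);
    s->complete |= op;
}

/* Assumed cleanup: clear all three bits for finished operations. */
unsigned ops_retire(struct stripe_ops *s)
{
    unsigned done = s->pending & s->ack & s->complete;
    s->pending  &= ~done;
    s->ack      &= ~done;
    s->complete &= ~done;
    return done;
}
```

Keeping a separate ack bit lets the requester tell "asked for but not yet picked up" apart from "in flight", so the same request is not submitted twice.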

[PATCH 2.6.20-rc5 05/12] md: move write operations to raid5_run_ops

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] handle_stripe sets STRIPE_OP_PREXOR, STRIPE_OP_BIODRAIN, STRIPE_OP_POSTXOR to request a write to the stripe cache. raid5_run_ops is triggered to run and executes the request outside the stripe lock. Signed-off-by: Dan Williams [EMAIL PROTECTED] ---

[PATCH 2.6.20-rc5 07/12] md: move raid5 parity checks to raid5_run_ops

2007-01-23 Thread Dan Williams
From: Dan Williams [EMAIL PROTECTED] handle_stripe sets STRIPE_OP_CHECK to request a check operation in raid5_run_ops. If raid5_run_ops is able to perform the check with a dma engine the parity will be preserved in memory removing the need to re-read it from disk, as is necessary in the

Re: Ooops on read-only raid5 while unmounting as xfs

2007-01-23 Thread Neil Brown
On Tuesday January 23, [EMAIL PROTECTED] wrote: My question is then : what prevents the upper layer to open the array read-write, submit a write and make the md code BUG_ON() ? The theory is that when you tell an md array to become read-only, it tells the block layer that it is read-only, and

Re: [patch] md: bitmap read_page error

2007-01-23 Thread Neil Brown
On Tuesday January 23, [EMAIL PROTECTED] wrote: I think your patch is not enough to solve the read_page error completely. I think in bitmap_init_from_disk we also need to check that 'count' never exceeds the size of the file before calling the read_page function. What do you think?

Re: [PATCH 002 of 4] md: Make 'repair' actually work for raid1.

2007-01-23 Thread Andrew Morton
On Tue, 23 Jan 2007 11:26:52 +1100 NeilBrown [EMAIL PROTECTED] wrote: + for (j = 0; j < vcnt ; j++) + memcpy(page_address(sbio->bi_io_vec[j].bv_page), +

Re: [PATCH 002 of 4] md: Make 'repair' actually work for raid1.

2007-01-23 Thread Neil Brown
On Tuesday January 23, [EMAIL PROTECTED] wrote: On Tue, 23 Jan 2007 11:26:52 +1100 NeilBrown [EMAIL PROTECTED] wrote: + for (j = 0; j < vcnt ; j++) + memcpy(page_address(sbio->bi_io_vec[j].bv_page), +