Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
On Tue, 15 Jan 2008, Andrew Morton wrote: > On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> > wrote: > > > On Mon, 14 Jan 2008, NeilBrown wrote: > > > > > > > > raid5's 'make_request' function calls generic_make_request

Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
raid5_activate_delayed is only called at unplug time, never in > raid5d. This seems to bring back the performance numbers. Calling it > in raid5d was sometimes too soon... > > Cc: "Dan Williams" <[EMAIL PROTECTED]> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-10 Thread dean gaudet
On Fri, 11 Jan 2008, Neil Brown wrote: > Thanks. > But I suspect you didn't test it with a bitmap :-) > I ran the mdadm test suite and it hit a problem - easy enough to fix. damn -- i "lost" my bitmap 'cause it was external and i didn't have things set up properly to pick it up after a reboot :)

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-10 Thread dean gaudet
On Thu, 10 Jan 2008, Neil Brown wrote: > On Wednesday January 9, [EMAIL PROTECTED] wrote: > > On Sun, 2007-12-30 at 10:58 -0700, dean gaudet wrote: > > > i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 > > > > > > http://git.kernel.org/?p

Re: Raid 1, can't get the second disk added back in.

2008-01-09 Thread dean gaudet
On Tue, 8 Jan 2008, Bill Davidsen wrote: > Neil Brown wrote: > > On Monday January 7, [EMAIL PROTECTED] wrote: > > > > > Problem is not raid, or at least not obviously raid related. The problem > > > is that the whole disk, /dev/hdb is unavailable. > > > > Maybe check /sys/block/hdb/hold

Re: [patch] improve stripe_cache_size documentation

2007-12-30 Thread dean gaudet
On Sun, 30 Dec 2007, dean gaudet wrote: > On Sun, 30 Dec 2007, Thiemo Nagel wrote: > > > >stripe_cache_size (currently raid5 only) > > > > As far as I have understood, it applies to raid6, too. > > good point... and raid4. > > here's an up

Re: [patch] improve stripe_cache_size documentation

2007-12-30 Thread dean gaudet
On Sun, 30 Dec 2007, Thiemo Nagel wrote: > >stripe_cache_size (currently raid5 only) > > As far as I have understood, it applies to raid6, too. good point... and raid4. here's an updated patch. -dean Signed-off-by: dean gaudet <[EMAIL PROTECTED]> Index: lin

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-30 Thread dean gaudet
On Sat, 29 Dec 2007, Dan Williams wrote: > On Dec 29, 2007 1:58 PM, dean gaudet <[EMAIL PROTECTED]> wrote: > > On Sat, 29 Dec 2007, Dan Williams wrote: > > > > > On Dec 29, 2007 9:48 AM, dean gaudet <[EMAIL PROTECTED]> wrote: > > > > hmm bummer

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread dean gaudet
On Sat, 29 Dec 2007, dean gaudet wrote: > On Sat, 29 Dec 2007, Justin Piszcz wrote: > > > Curious btw what kind of filesystem size/raid type (5, but defaults I > > assume, > > nothing special right? (right-symmetric vs. left-symmetric, etc?)/cache > > size/chunk

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread dean gaudet
On Sat, 29 Dec 2007, Justin Piszcz wrote: > Curious btw what kind of filesystem size/raid type (5, but defaults I assume, > nothing special right? (right-symmetric vs. left-symmetric, etc?)/cache > size/chunk size(s) are you using/testing with? mdadm --create --level=5 --chunk=64 -n7 -x1 /dev/md2

[patch] improve stripe_cache_size documentation

2007-12-29 Thread dean gaudet
Document the amount of memory used by the stripe cache and the fact that it's tied down and unavailable for other purposes (right?). thanks to Dan Williams for the formula. -dean Signed-off-by: dean gaudet <[EMAIL PROTECTED]> Index: linux/Documenta
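
A worked example of that formula may help; this is a sketch assuming an 8-disk array, 4 KiB pages, and stripe_cache_size=1024 (all numbers illustrative):

    # memory pinned by the stripe cache, per the formula in the patch:
    #   memory_consumed = system_page_size * nr_disks * stripe_cache_size
    echo $((4096 * 8 * 1024))    # 33554432 bytes = 32 MiB, tied down until the knob is lowered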

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread dean gaudet
On Sat, 29 Dec 2007, Dan Williams wrote: > On Dec 29, 2007 9:48 AM, dean gaudet <[EMAIL PROTECTED]> wrote: > > hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on > > the same 64k chunk array and had raised the stripe_cache_size to 1024... > &

Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-29 Thread dean gaudet
On Tue, 25 Dec 2007, Bill Davidsen wrote: > The issue I'm thinking about is hardware sector size, which on modern drives > may be larger than 512b and therefore entail a read-alter-rewrite (RAR) cycle > when writing a 512b block. i'm not sure any shipping SATA disks have larger than 512B sectors

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread dean gaudet
is the memory consumed equal to (chunk_size * raid_disks * stripe_cache_size) or (chunk_size * raid_disks * stripe_cache_active)? -dean On Thu, 27 Dec 2007, dean gaudet wrote: > hmm this seems more serious... i just ran into it with chunksize 64KiB and > while just untarring a bunch of linux kernels in p

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-27 Thread dean gaudet
stripe_cache_size of 256... in this case it's with a workload which is untarring 34 copies of the linux kernel at the same time. it's a variant of doug ledford's memtest, and i've attached it. -dean
#!/usr/bin/perl # Copyright (c) 2007 dean gaudet <[EMAIL PROTECTED]> # # Permis

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-27 Thread dean gaudet
hmm this seems more serious... i just ran into it with chunksize 64KiB and while just untarring a bunch of linux kernels in parallel... increasing stripe_cache_size did the trick again. -dean On Thu, 27 Dec 2007, dean gaudet wrote: > hey neil -- remember that raid5 hang which me and only
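
The workaround described is the per-array sysfs knob; a minimal sketch, with the md device name purely illustrative:

    cat /sys/block/md2/md/stripe_cache_size            # current number of cache entries
    echo 1024 > /sys/block/md2/md/stripe_cache_size    # raise it (costs memory, see the formula above)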

Re: external bitmaps.. and more

2007-12-11 Thread dean gaudet
On Thu, 6 Dec 2007, Michael Tokarev wrote: > I come across a situation where external MD bitmaps > aren't usable on any "standard" linux distribution > unless special (non-trivial) actions are taken. > > First is a small buglet in mdadm, or two. > > It's not possible to specify --bitmap= in asse

Re: Raid array is not automatically detected.

2007-07-17 Thread dean gaudet
On Mon, 16 Jul 2007, David Greaves wrote: > Bryan Christ wrote: > > I do have the type set to 0xfd. Others have said that auto-assemble only > > works on RAID 0 and 1, but just as Justin mentioned, I too have another box > > with RAID5 that gets auto assembled by the kernel (also no initrd). I

Re: limits on raid

2007-06-17 Thread dean gaudet
On Sun, 17 Jun 2007, Wakko Warner wrote: > What benefit would I gain by using an external journel and how big would it > need to be? i don't know how big the journal needs to be... i'm limited by xfs' maximum journal size of 128MiB. i don't have much benchmark data -- but here are some rough not
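
For reference, an external XFS journal is specified at mkfs and mount time; a sketch assuming /dev/md1 holds the log and /dev/md2 the data (device names illustrative):

    mkfs.xfs -l logdev=/dev/md1,size=128m /dev/md2    # 128 MiB is xfs' journal-size ceiling
    mount -o logdev=/dev/md1 /dev/md2 /mnt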

Re: limits on raid

2007-06-17 Thread dean gaudet
On Sun, 17 Jun 2007, Wakko Warner wrote: > dean gaudet wrote: > > On Sat, 16 Jun 2007, Wakko Warner wrote: > > > > > When I've had an unclean shutdown on one of my systems (10x 50gb raid5) > > > it's > > > always slowed the system down when b

Re: limits on raid

2007-06-16 Thread dean gaudet
On Sat, 16 Jun 2007, Wakko Warner wrote: > When I've had an unclean shutdown on one of my systems (10x 50gb raid5) it's > always slowed the system down when booting up. Quite significantly I must > say. I wait until I can login and change the rebuild max speed to slow it > down while I'm using i
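
The rebuild throttle being adjusted is exposed system-wide and per-array; a sketch (values in KiB/s, purely illustrative):

    echo 5000 > /proc/sys/dev/raid/speed_limit_max    # system-wide ceiling
    echo 5000 > /sys/block/md0/md/sync_speed_max      # or limit just this array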

Re: limits on raid

2007-06-16 Thread dean gaudet
On Sat, 16 Jun 2007, David Greaves wrote: > Neil Brown wrote: > > On Friday June 15, [EMAIL PROTECTED] wrote: > > > > > As I understand the way > > > raid works, when you write a block to the array, it will have to read all > > > the other blocks

Re: XFS sunit/swidth for raid10

2007-03-25 Thread dean gaudet
On Fri, 23 Mar 2007, Peter Rabbitson wrote: > dean gaudet wrote: > > On Thu, 22 Mar 2007, Peter Rabbitson wrote: > > > > > dean gaudet wrote: > > > > On Thu, 22 Mar 2007, Peter Rabbitson wrote: > > > > > > > > > Hi, > > >

Re: XFS sunit/swidth for raid10

2007-03-22 Thread dean gaudet
On Thu, 22 Mar 2007, Peter Rabbitson wrote: > dean gaudet wrote: > > On Thu, 22 Mar 2007, Peter Rabbitson wrote: > > > > > Hi, > > > How does one determine the XFS sunit and swidth sizes for a software > > > raid10 > > > with 3 copies?

Re: XFS sunit/swidth for raid10

2007-03-22 Thread dean gaudet
On Thu, 22 Mar 2007, Peter Rabbitson wrote: > Hi, > How does one determine the XFS sunit and swidth sizes for a software raid10 > with 3 copies? mkfs.xfs uses the GET_ARRAY_INFO ioctl to get the data it needs from software raid and select an appropriate sunit/swidth... although i'm not sure i a
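
When the ioctl route doesn't give sane values, the geometry can be passed by hand; a sketch assuming a 6-drive raid10 with 3 copies and 64 KiB chunks (sunit and swidth are counted in 512-byte sectors):

    # sunit = chunk size; swidth = sunit * (drives / copies)
    mkfs.xfs -d sunit=128,swidth=256 /dev/md0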

Re: mdadm: raid1 with ext3 - filesystem size differs?

2007-03-20 Thread dean gaudet
it looks like you created the filesystem on the component device before creating the raid. -dean On Fri, 16 Mar 2007, Hanno Meyer-Thurow wrote: > Hi all! > Please CC me on answers since I am not subscribed to this list, thanks. > > When I try to build a raid1 system with mdadm 2.6.1 the filesy

Re: Replace drive in RAID5 without losing redundancy?

2007-03-05 Thread dean gaudet
On Tue, 6 Mar 2007, Neil Brown wrote: > On Monday March 5, [EMAIL PROTECTED] wrote: > > > > Is it possible to mark a disk as "to be replaced by an existing spare", > > then migrate to the spare disk and kick the old disk _after_ migration > > has been done? Or not even kick - but mark as new sp

Re: Linux Software RAID Bitmap Question

2007-02-28 Thread dean gaudet
On Mon, 26 Feb 2007, Neil Brown wrote: > On Sunday February 25, [EMAIL PROTECTED] wrote: > > I believe Neil stated that using bitmaps does incur a 10% performance > > penalty. If one's box never (or rarely) crashes, is a bitmap needed? > > I think I said it "can" incur such a penalty. The actu

Re: Reshaping raid0/10

2007-02-21 Thread dean gaudet
On Thu, 22 Feb 2007, Neil Brown wrote: > On Wednesday February 21, [EMAIL PROTECTED] wrote: > > Hello, > > > > > > > > are there any plans to support reshaping > > on raid0 and raid10? > > > > No concrete plans. It largely depends on time and motivation. > I expect that the various flavours

Re: md autodetect only detects one disk in raid1

2007-01-27 Thread dean gaudet
take a look at your mdadm.conf ... both on your root fs and in your initrd... look for a DEVICES line and make sure it says "DEVICES partitions"... anything else is likely to cause problems like below. also make sure each array is specified by UUID rather than device. and then rebuild your init
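
A sketch of the suggested configuration (UUID illustrative), plus the initrd rebuild on a Debian-style system:

    # /etc/mdadm/mdadm.conf
    DEVICE partitions
    ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371

    # then rebuild the initrd so it carries the same config:
    update-initramfs -u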

Re: bad performance on RAID 5

2007-01-18 Thread dean gaudet
On Wed, 17 Jan 2007, Sevrin Robstad wrote: > I'm suffering from bad performance on my RAID5. > > a "echo check >/sys/block/md0/md/sync_action" > > gives a speed at only about 5000K/sec , and HIGH load average : > > # uptime > 20:03:55 up 8 days, 19:55, 1 user, load average: 11.70, 4.04, 1.52

Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, Mr. James W. Laferriere wrote: > Hello Dean , > > On Mon, 15 Jan 2007, dean gaudet wrote: > ...snip... > > it should just be: > > > > echo check >/sys/block/mdX/md/sync_action > > > > if you don't have a /sys/block/m

Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, berk walker wrote: > dean gaudet wrote: > > echo check >/sys/block/mdX/md/sync_action > > > > it'll read the entire array (parity included) and correct read errors as > > they're discovered. > > Could I get a pointer as to how

Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, Robin Bowes wrote: > I'm running RAID6 instead of RAID5+1 - I've had a couple of instances > where a drive has failed in a RAID5+1 array and a second has failed > during the rebuild after the hot-spare had kicked in. if the failures were read errors without losing the entire

Re: raid5 software vs hardware: parity calculations?

2007-01-13 Thread dean gaudet
On Sat, 13 Jan 2007, Robin Bowes wrote: > Bill Davidsen wrote: > > > > There have been several recent threads on the list regarding software > > RAID-5 performance. The reference might be updated to reflect the poor > > write performance of RAID-5 until/unless significant tuning is done. > > Read

Re: raid5 software vs hardware: parity calculations?

2007-01-12 Thread dean gaudet
On Thu, 11 Jan 2007, James Ralston wrote: > I'm having a discussion with a coworker concerning the cost of md's > raid5 implementation versus hardware raid5 implementations. > > Specifically, he states: > > > The performance [of raid5 in hardware] is so much better with the > > write-back cachin

Re: Shrinking a RAID1--superblock problems

2006-12-12 Thread dean gaudet
On Tue, 12 Dec 2006, Jonathan Terhorst wrote: > I need to shrink a RAID1 array and am having trouble with the > persistent superblock; namely, mdadm --grow doesn't seem to relocate > it. If I downsize the array and then shrink the corresponding > partitions, the array fails since the superblock (w

Re: Observations of a failing disk

2006-11-27 Thread dean gaudet
On Tue, 28 Nov 2006, Richard Scobie wrote: > Anyway, my biggest concern is why > > echo repair > /sys/block/md5/md/sync_action > > appeared to have no effect at all, when I understand that it should re-write > unreadable sectors? i've had the same thing happen on a seagate 7200.8 pata 400GB...

Re: Raid 1 (non) performance

2006-11-19 Thread dean gaudet
On Wed, 15 Nov 2006, Magnus Naeslund(k) wrote: > # cat /proc/mdstat > Personalities : [raid1] > md2 : active raid1 sda3[0] sdb3[1] > 236725696 blocks [2/2] [UU] > > md1 : active raid1 sda2[0] sdb2[1] > 4192896 blocks [2/2] [UU] > > md0 : active raid1 sda1[0] sdb1[1] > 4192832 b

Re: safest way to swap in a new physical disk

2006-11-18 Thread dean gaudet
On Tue, 14 Nov 2006, Will Sheffler wrote: > Hi. > > What is the safest way to switch out a disk in a software raid array created > with mdadm? I'm not talking about replacing a failed disk, I want to take a > healthy disk in the array and swap it for another physical disk. Specifically, > I have

Re: raid5 hang on get_active_stripe

2006-11-15 Thread dean gaudet
and i haven't seen it either... neil do you think your latest patch was hiding the bug? 'cause there was an iteration of an earlier patch which didn't produce much spam in dmesg but the bug was still there, then there is the version below which spams dmesg a fair amount but i didn't see the bu

Re: RAID5 array showing as degraded after motherboard replacement

2006-11-07 Thread dean gaudet
On Wed, 8 Nov 2006, James Lee wrote: > > However I'm still seeing the error messages in my dmesg (the ones I > > posted earlier), and they suggest that there is some kind of hardware > > fault (based on a quick Google of the error codes). So I'm a little > > confused. the fact that the error is

Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

2006-11-06 Thread dean gaudet
On Mon, 6 Nov 2006, Neil Brown wrote: > This creates a deep disconnect between udev and md. > udev expects a device to appear first, then it created the > device-special-file in /dev. > md expect the device-special-file to exist first, and then created the > device on the first open. could you cr

Re: RAID5 array showing as degraded after motherboard replacement

2006-11-06 Thread dean gaudet
On Mon, 6 Nov 2006, James Lee wrote: > Thanks for the reply Dean. I looked through dmesg output from the > boot up, to check whether this was just an ordering issue during the > system start up (since both evms and mdadm attempt to activate the > array, which could cause things to go wrong...).

Re: mdadm 2.5.5 external bitmap assemble problem

2006-11-06 Thread dean gaudet
On Mon, 6 Nov 2006, Neil Brown wrote: > > hey i have another related question... external bitmaps seem to pose a bit > > of a chicken-and-egg problem. all of my filesystems are md devices. with > > an external bitmap i need at least one of the arrays to start, then have > > filesystems mounted

Re: Is my RAID broken?

2006-11-06 Thread dean gaudet
On Mon, 6 Nov 2006, Mikael Abrahamsson wrote: > On Mon, 6 Nov 2006, Neil Brown wrote: > > > So it looks like you machine recently crashed (power failure?) and it is > > restarting. > > Or upgrade some part of the OS and now it'll do resync every week or so (I > think this is debian default nowad

Re: RAID5 array showing as degraded after motherboard replacement

2006-11-05 Thread dean gaudet
On Sun, 5 Nov 2006, James Lee wrote: > Hi there, > > I'm running a 5-drive software RAID5 array across two controllers. > The motherboard in that PC recently died - I sent the board back for > RMA. When I refitted the motherboard, connected up all the drives, > and booted up I found that the arr

Re: Checking individual drive state

2006-11-05 Thread dean gaudet
On Sun, 5 Nov 2006, Bradshaw wrote: > I've recently built a smallish RAID5 box as a storage area for my home > network, using mdadm. However, one of the drives will not remain in the array > for longer that around two days before it is removed. Readding it to the array > does not throw any errors,

Re: RAID5/10 chunk size and ext2/3 stride parameter

2006-11-04 Thread dean gaudet
On Sat, 4 Nov 2006, martin f krafft wrote: > also sprach dean gaudet <[EMAIL PROTECTED]> [2006.11.03.2019 +0100]: > > > I cannot find authoritative information about the relation between > > > the RAID chunk size and the correct stride parameter to use when > >

mdadm 2.5.5 external bitmap assemble problem

2006-11-04 Thread dean gaudet
i think i've got my mdadm.conf set properly for an external bitmap -- but it doesn't seem to work. i can assemble from the command-line fine though: # grep md4 /etc/mdadm/mdadm.conf ARRAY /dev/md4 bitmap=/bitmap.md4 UUID=dbc3be0b:b5853930:a02e038c:13ba8cdc # mdadm -A /dev/md4 mdadm: Could not

Re: RAID5/10 chunk size and ext2/3 stride parameter

2006-11-03 Thread dean gaudet
On Tue, 24 Oct 2006, martin f krafft wrote: > Hi, > > I cannot find authoritative information about the relation between > the RAID chunk size and the correct stride parameter to use when > creating an ext2/3 filesystem. you know, it's interesting -- mkfs.xfs somehow gets the right sunit/swidth
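
For ext2/3 the equivalent computation is done by hand; a sketch assuming 64 KiB chunks and 4 KiB filesystem blocks (recent e2fsprogs; older releases spelled the option -R stride=):

    # stride = chunk size / block size = 64 KiB / 4 KiB = 16
    mke2fs -j -b 4096 -E stride=16 /dev/md0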

Re: md array numbering is messed up

2006-10-30 Thread dean gaudet
On Mon, 30 Oct 2006, Brad Campbell wrote: > Michael Tokarev wrote: > > My guess is that it's using mdrun shell script - the same as on Debian. > > It's a long story, the thing is quite ugly and messy and does messy things > > too, but they says it's compatibility stuff and continue shipping it. ..

Re: Raid 0 breakage (maybe)

2006-10-30 Thread dean gaudet
On Mon, 30 Oct 2006, Neil Brown wrote: > > [EMAIL PROTECTED]:~# mdadm --assemble /dev/md0 /dev/hde /dev/hdi > > mdadm: cannot open device /dev/hde: Device or resource busy > > This is telling you that /dev/hde - or one of it's partitions - is > "Busy". This means more than just 'open'. It means

Re: why partition arrays?

2006-10-24 Thread dean gaudet
On Tue, 24 Oct 2006, Bill Davidsen wrote: > My read on LVM is that (a) it's one more thing for the admin to learn, (b) > because it's seldom used the admin will be working from documentation if it > has a problem, and (c) there is no bug-free software, therefore the use of LVM > on top of RAID wil

Re: Setting write-intent bitmap during array resync/create?

2006-10-10 Thread dean gaudet
On Tue, 10 Oct 2006, Eli Stair wrote: > I gather this isn't currently possible, but I wonder if it's feasible to make > it so? This works fine once the array is marked 'clean', and I imagine it's > simpler to just disallow the bitmap creation until it's in that state. Would > it be possible to a

Re: USB and raid... Device names change

2006-09-18 Thread dean gaudet
On Tue, 19 Sep 2006, Eduardo Jacob wrote: > DEVICE /dev/raid111 /dev/raid121 > ARRAY /dev/md0 level=raid1 num-devices=2 > UUID=1369e13f:eb4fa45c:6d4b9c2a:8196aa1b try using "DEVICE partitions"... then "mdadm -As /dev/md0" will scan all available partitions for raid components with UUID=1369e13

Re: access *existing* array from knoppix

2006-09-12 Thread dean gaudet
On Tue, 12 Sep 2006, Dexter Filmore wrote: > Am Dienstag, 12. September 2006 16:08 schrieb Justin Piszcz: > > /dev/MAKEDEV /dev/md0 > > > > also make sure the SW raid modules etc are loaded if necessary. > > Won't work, MAKEDEV doesn't know how to create [/dev/]md0. echo 'DEVICE partitions' >/tm
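
A sketch of the full rescue-environment recipe (paths illustrative):

    echo 'DEVICE partitions' > /tmp/mdadm.conf
    mdadm --examine --scan >> /tmp/mdadm.conf    # appends ARRAY lines for whatever it finds
    mdadm --assemble --scan --auto=yes --config=/tmp/mdadm.conf    # --auto creates missing /dev nodes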

Re: Care and feeding of RAID?

2006-09-09 Thread dean gaudet
On Tue, 5 Sep 2006, Paul Waldo wrote: > What about bitmaps? Nobody has mentioned them. It is my understanding that > you just turn them on with "mdadm /dev/mdX -b internal". Any caveats for > this? bitmaps have been working great for me on a raid5 and raid1. it makes it that much more tolera
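
For reference, a bitmap can be added to or removed from a live array via grow mode (device name illustrative):

    mdadm --grow --bitmap=internal /dev/md0    # add a write-intent bitmap
    mdadm --grow --bitmap=none /dev/md0        # drop it again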

Re: UUID's

2006-09-09 Thread dean gaudet
On Sat, 9 Sep 2006, Richard Scobie wrote: > To remove all doubt about what is assembled where, I though going to: > > DEVICE partitions > MAILADDR root > ARRAY /dev/md3 UUID=xyz etc. > > would be more secure. > > Is this correct thinking on my part? yup. mdadm can generate it all for you... t
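
A sketch of letting mdadm generate those ARRAY lines from the running arrays (config path varies by distribution):

    mdadm --detail --scan >> /etc/mdadm.conf    # emits ARRAY /dev/mdX UUID=... for each active array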

Re: UUID's

2006-09-08 Thread dean gaudet
On Sat, 9 Sep 2006, Richard Scobie wrote: > If I have specified an array in mdadm.conf using UUID's: > > ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371 > > and I replace a failed drive in the array, will the new drive be given the > previous UUID, or do I need to upate the mdadm.conf

Re: proactive-raid-disk-replacement

2006-09-08 Thread dean gaudet
On Fri, 8 Sep 2006, Michael Tokarev wrote: > dean gaudet wrote: > > On Fri, 8 Sep 2006, Michael Tokarev wrote: > > > >> Recently Dean Gaudet, in thread titled 'Feature >> Request/Suggestion - "Drive Linking"', mentioned his

Re: proactive-raid-disk-replacement

2006-09-08 Thread dean gaudet
On Fri, 8 Sep 2006, Michael Tokarev wrote: > Recently Dean Gaudet, in thread titled 'Feature > Request/Suggestion - "Drive Linking"', mentioned his > document, http://arctic.org/~dean/proactive-raid5-disk-replacement.txt > > I've read it, a

Re: Feature Request/Suggestion - "Drive Linking"

2006-09-04 Thread dean gaudet
On Mon, 4 Sep 2006, Bill Davidsen wrote: > But I think most of the logic exists, the hardest part would be deciding what > to do. The existing code looks as if it could be hooked to do this far more > easily than writing new. In fact, several suggested recovery schemes involve > stopping the RAID5

Re: RAID-5 recovery

2006-09-03 Thread dean gaudet
On Sun, 3 Sep 2006, Clive Messer wrote: > This leads me to a question. I understand from reading the linux-raid > archives > that the current behaviour when rebuilding with a single badblock on another > disk is for that disk to also be kicked from the array. that's not quite the current behav

Re: Resize on dirty array?

2006-08-30 Thread dean gaudet
On Sun, 13 Aug 2006, dean gaudet wrote: > On Fri, 11 Aug 2006, David Rees wrote: > > > On 8/11/06, dean gaudet <[EMAIL PROTECTED]> wrote: > > > On Fri, 11 Aug 2006, David Rees wrote: > > > > > > > On 8/10/06, dean gaudet <[EMAIL PROTECTED]

Re: Feature Request/Suggestion - "Drive Linking"

2006-08-29 Thread dean gaudet
On Wed, 30 Aug 2006, Neil Bortnak wrote: > Hi Everybody, > > I had this major recovery last week after a hardware failure monkeyed > things up pretty badly. About half way though I had a couple of ideas > and I thought I'd suggest/ask them. > > 1) "Drive Linking": So let's say I have a 6 disk RA

Re: Is mdadm --create safe for existing arrays ?

2006-08-16 Thread dean gaudet
On Wed, 16 Aug 2006, Peter Greis wrote: > So, how do I change / and /boot to make the super > blocks persistent ? Is it safe to run "mdadm --create > /dev/md0 --raid-devices=2 --level=1 /dev/sda1 > /dev/sdb1" without loosing any data ? boot a rescue disk, shrink the filesystems by a few MB to accommodate the superblock

Re: Resize on dirty array?

2006-08-13 Thread dean gaudet
On Fri, 11 Aug 2006, David Rees wrote: > On 8/11/06, dean gaudet <[EMAIL PROTECTED]> wrote: > > On Fri, 11 Aug 2006, David Rees wrote: > > > > > On 8/10/06, dean gaudet <[EMAIL PROTECTED]> wrote: > > > > - set up smartd to run long self tests once

Re: Resize on dirty array?

2006-08-10 Thread dean gaudet
suggestions: - set up smartd to run long self tests once a month. (stagger it every few days so that your disks aren't doing self-tests at the same time) - run 2.6.15 or later so md supports repairing read errors from the other drives... - run 2.6.16 or later so you get the check and rep
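
A sketch of smartd.conf entries for staggered monthly long self-tests (schedule fields illustrative; the -s pattern is T/MM/DD/d/HH):

    /dev/sda -a -s L/../01/./02    # long self-test on the 1st of each month at 02:00
    /dev/sdb -a -s L/../04/./02    # stagger the next disk to the 4th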

Re: Converting Ext3 to Ext3 under RAID 1

2006-08-02 Thread dean gaudet
On Wed, 2 Aug 2006, Dan Graham wrote: > Hello; > I have an existing, active ext3 filesystem which I would like to convert to > a RAID 1 ext3 filesystem with minimal down time. After casting about the web > and experimenting some on a test system, I believe that I can accomplish this > in the fo
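
The usual shape of such a conversion is to build the mirror degraded on the new disk and absorb the old one afterwards; a sketch under that assumption, not the poster's exact plan (device names illustrative):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1    # degraded mirror on the new disk
    mkfs.ext3 /dev/md0                # then copy the data across and switch mounts
    mdadm /dev/md0 --add /dev/sda1    # old disk joins the mirror and resyncs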

Re: Still can't get md arrays that were started from an initrd to shutdown

2006-07-17 Thread dean gaudet
On Mon, 17 Jul 2006, Christian Pernegger wrote: > The problem seems to affect only arrays that are started via an > initrd, even if they do not have the root filesystem on them. > That's all arrays if they're either managed by EVMS or the > ramdisk-creator is initramfs-tools. For yaird-generated i

Re: proactive raid5 disk replacement success (using bitmap + raid1)

2006-06-22 Thread dean gaudet
> Thanks a lot for sharing this. > > I am not quite understand about these 2 commands. Why we want to add a > pre-failing disk back to md4? > > mdadm --zero-superblock /dev/sde1 > mdadm /dev/md4 -a /dev/sde1 > > Ming > > > On Sun, 2006-04-23 at 18:40 -0700, de

Re: raid5 hang on get_active_stripe

2006-06-13 Thread dean gaudet
On Tue, 13 Jun 2006, Bill Davidsen wrote: > Neil Brown wrote: > > > On Friday June 2, [EMAIL PROTECTED] wrote: > > > > > On Thu, 1 Jun 2006, Neil Brown wrote: > > > > > > > > > > I've got one more long-shot I would like to try first. If you could > > > > backout that change to ll_rw_block

Re: raid5 hang on get_active_stripe

2006-06-10 Thread dean gaudet
On Fri, 2 Jun 2006, Neil Brown wrote: > On Friday June 2, [EMAIL PROTECTED] wrote: > > On Thu, 1 Jun 2006, Neil Brown wrote: > > > > > I've got one more long-shot I would like to try first. If you could > > > backout that change to ll_rw_block, and apply this patch instead. > > > Then when it ha

Re: raid5 hang on get_active_stripe

2006-06-02 Thread dean gaudet
On Thu, 1 Jun 2006, Neil Brown wrote: > I've got one more long-shot I would like to try first. If you could > backout that change to ll_rw_block, and apply this patch instead. > Then when it hangs, just cat the stripe_cache_active file and see if > that unplugs things or not (cat it a few times).

Re: raid5 hang on get_active_stripe

2006-05-30 Thread dean gaudet
On Wed, 31 May 2006, Neil Brown wrote: > On Tuesday May 30, [EMAIL PROTECTED] wrote: > > > > actually i think the rate is higher... i'm not sure why, but klogd doesn't > > seem to keep up with it: > > > > [EMAIL PROTECTED]:~# grep -c kblockd_schedule_work /var/log/messages > > 31 > > [EMAIL PRO

Re: raid5 hang on get_active_stripe

2006-05-30 Thread dean gaudet
On Wed, 31 May 2006, Neil Brown wrote: > On Tuesday May 30, [EMAIL PROTECTED] wrote: > > On Tue, 30 May 2006, Neil Brown wrote: > > > > > Could you try this patch please? On top of the rest. > > > And if it doesn't fail in a couple of days, tell me how regularly the > > > message > > >kbloc

Re: raid5 hang on get_active_stripe

2006-05-30 Thread dean gaudet
On Tue, 30 May 2006, Neil Brown wrote: > Could you try this patch please? On top of the rest. > And if it doesn't fail in a couple of days, tell me how regularly the > message >kblockd_schedule_work failed > gets printed. i'm running this patch now ... and just after reboot, no freeze yet,

Re: raid5 hang on get_active_stripe

2006-05-29 Thread dean gaudet
On Sun, 28 May 2006, Neil Brown wrote: > The following patch adds some more tracing to raid5, and might fix a > subtle bug in ll_rw_blk, though it is an incredible long shot that > this could be affecting raid5 (if it is, I'll have to assume there is > another bug somewhere). It certainly doesn'

Re: [PATCH] mdadm 2.5 (Was: ANNOUNCE: mdadm 2.5 - A tool for managing Soft RAID under Linux)

2006-05-28 Thread dean gaudet
On Sun, 28 May 2006, Luca Berra wrote: > dietlibc rand() and random() are the same function. > but random will throw a warning saying it is deprecated. that's terribly obnoxious... it's never going to be deprecated, there are only approximately a bazillion programs using random(). -dean

Re: [PATCH] mdadm 2.5 (Was: ANNOUNCE: mdadm 2.5 - A tool for managing Soft RAID under Linux)

2006-05-28 Thread dean gaudet
On Sun, 28 May 2006, Luca Berra wrote: > - mdadm-2.5-rand.patch > Posix dictates rand() versus bsd random() function, and dietlibc > deprecated random(), so switch to srand()/rand() and make everybody > happy. fwiw... lots of rand()s tend to suck... and RAND_MAX may not be large enough for this

Re: raid5 hang on get_active_stripe

2006-05-27 Thread dean gaudet
On Sat, 27 May 2006, Neil Brown wrote: > Thanks. This narrows it down quite a bit... too much infact: I can > now say for sure that this cannot possible happen :-) > > 2/ The message.gz you sent earlier with the > echo t > /proc/sysrq-trigger > trace in it didn't contain inform

Re: raid5 hang on get_active_stripe

2006-05-26 Thread dean gaudet
On Sat, 27 May 2006, Neil Brown wrote: > On Friday May 26, [EMAIL PROTECTED] wrote: > > On Tue, 23 May 2006, Neil Brown wrote: > > > > i applied them against 2.6.16.18 and two days later i got my first hang... > > below is the stripe_cache foo. > > > > thanks > > -dean > > > > neemlark:~# cd /

Re: raid5 hang on get_active_stripe

2006-05-26 Thread dean gaudet
On Tue, 23 May 2006, Neil Brown wrote: > I've spent all morning looking at this and while I cannot see what is > happening I did find a couple of small bugs, so that is good... > > I've attached three patches. The first fix two small bugs (I think). > The last adds some extra information to >

Re: raid5 hang on get_active_stripe

2006-05-17 Thread dean gaudet
On Thu, 11 May 2006, dean gaudet wrote: > On Tue, 14 Mar 2006, Neil Brown wrote: > > > On Monday March 13, [EMAIL PROTECTED] wrote: > > > I just experienced some kind of lockup accessing my 8-drive raid5 > > > (2.6.16-rc4-mm2). The system has been up for 16 da

Re: raid5 hang on get_active_stripe

2006-05-11 Thread dean gaudet
On Tue, 14 Mar 2006, Neil Brown wrote: > On Monday March 13, [EMAIL PROTECTED] wrote: > > Hi all, > > > > I just experienced some kind of lockup accessing my 8-drive raid5 > > (2.6.16-rc4-mm2). The system has been up for 16 days running fine, but > > now processes that try to read the md device h

proactive raid5 disk replacement success (using bitmap + raid1)

2006-04-23 Thread dean gaudet
i had a disk in a raid5 which i wanted to clone onto the hot spare... without going offline and without long periods without redundancy. a few folks have discussed using bitmaps and temporary (superblockless) raid1 mappings to do this... i'm not sure anyone has tried / reported success though.
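
A condensed sketch of the procedure (device names illustrative; the full write-up at http://arctic.org/~dean/proactive-raid5-disk-replacement.txt has the details and caveats):

    mdadm --grow --bitmap=internal /dev/md4                # so a later re-add only resyncs dirty blocks
    mdadm /dev/md4 --fail /dev/sde1 --remove /dev/sde1     # sde1 = the disk to be cloned
    mdadm --build /dev/md5 --level=1 --raid-devices=2 /dev/sde1 missing    # superblockless raid1 wrapper
    mdadm /dev/md4 --re-add /dev/md5                       # array whole again, now via the wrapper
    mdadm /dev/md5 --add /dev/sdf1                         # sdf1 = the spare; the clone happens here
    # once md5 is in sync: fail and remove md5 from md4, stop md5,
    # then re-add the freshly cloned disk to md4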

forcing a read on a known bad block

2006-04-11 Thread dean gaudet
hey Neil... i've been wanting to test out the reconstruct-on-read-error code... and i've had two chances to do so, but haven't be able to force md to read the appropriate block to trigger the code. i had two disks with SMART Current_Pending_Sector > 0 (which indicates pending read error) and i
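
One way to aim a read at a particular spot, assuming the bad sector's offset is known (numbers and device illustrative; iflag=direct bypasses the page cache, and an md-device offset still has to be translated through the raid layout to land on the intended component):

    dd if=/dev/md0 of=/dev/null bs=512 skip=123456789 count=8 iflag=direct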

Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11

2006-04-11 Thread dean gaudet
On Mon, 10 Apr 2006, Marc L. de Bruin wrote: > dean gaudet wrote: > > On Mon, 10 Apr 2006, Marc L. de Bruin wrote: > > > > > However, all "preferred minor"s are correct, meaning that the output is in > > > sync with what I expected it to be from /e

Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11

2006-04-10 Thread dean gaudet
On Mon, 10 Apr 2006, Marc L. de Bruin wrote: > However, all "preferred minor"s are correct, meaning that the output is in > sync with what I expected it to be from /etc/mdadm/mdadm.conf. > > Any other ideas? Just adding /etc/mdadm/mdadm.conf to the initrd does not seem > to work, since mdrun seem

Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11

2006-04-10 Thread dean gaudet
On Mon, 10 Apr 2006, Marc L. de Bruin wrote: > dean gaudet wrote: > > > initramfs-tools generates an "mdrun /dev" which starts all the raids it can > > find... but does not include the mdadm.conf in the initrd so i'm not sure it > > will necessarily star

Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11

2006-04-09 Thread dean gaudet
On Sun, 9 Apr 2006, Marc L. de Bruin wrote: ... > Okay, just pressing Control-D continues the boot process and AFAIK the root > filesystemen actually isn't corrupt. Running e2fsck returns no errors and > booting 2.6.11 works just fine, but I have no clue why it picked the wrong > partitions to bui

Re: raid5 high cpu usage during reads - oprofile results

2006-04-01 Thread dean gaudet
On Sat, 1 Apr 2006, Alex Izvorski wrote: > On Sat, 2006-04-01 at 14:28 -0800, dean gaudet wrote: > > i'm guessing there's a good reason for STRIPE_SIZE being 4KiB -- 'cause > > otherwise it'd be cool to run with STRIPE_SIZE the same as your raid > &

Re: raid5 high cpu usage during reads - oprofile results

2006-04-01 Thread dean gaudet
On Sat, 1 Apr 2006, Alex Izvorski wrote: > Dean - I think I see what you mean, you're looking at this line in the > assembly? > > 65830 16.8830 : c1f: cmp %rcx,0x28(%rax) yup that's the one... that's probably a fair number of cache (or tlb) misses going on right there. > I looked

Re: raid5 high cpu usage during reads - oprofile results

2006-03-25 Thread dean gaudet
On Sat, 25 Mar 2006, Alex Izvorski wrote: > http://linuxraid.pastebin.com/621363 - oprofile annotated assembly it looks to me like a lot of time is spent in __find_stripe() ... i wonder if the hash is working properly. in raid5.c you could try changing #define stripe_hash(conf, sect) (&((conf

Re: raid5 that used parity for reads only when degraded

2006-03-24 Thread dean gaudet
On Thu, 23 Mar 2006, Alex Izvorski wrote: > Also the cpu load is measured with Andrew Morton's cyclesoak > tool which I believe to be quite accurate. there's something cyclesoak does which i'm not sure i agree with: cyclesoak process dirties an array of 100 bytes... so what you're really ge

Re: naming of md devices

2006-03-22 Thread dean gaudet
On Thu, 23 Mar 2006, Nix wrote: > Last I heard the Debian initramfs constructs RAID arrays by explicitly > specifying the devices that make them up. This is, um, a bad idea: > the first time a disk fails or your kernel renumbers them you're > in *trouble*. yaird seems to dtrt ... at least in unst

Re: how to clone a disk

2006-03-11 Thread dean gaudet
On Sat, 11 Mar 2006, Ming Zhang wrote: > On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote: > > if you fail the disk from the array, or boot without the failing disk, > > then the event counter in the other superblocks will be updated... and the > > removed/failed d

Re: how to clone a disk

2006-03-11 Thread dean gaudet
On Sat, 11 Mar 2006, Ming Zhang wrote: > On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote: > > > you're planning to do this while the array is online? that's not safe... > > unless it's a read-only array... > > what i plan to do is to pull out the d
