I'm just documenting a problem and its solution in case anyone else runs into the same thing - it took quite a bit of googling to find the answer, as there's not a lot of info out there on raid recovery.
One thing about raid1 mirroring: if something goes wrong and a partition (mirror) gets removed from the raid array, the system continues working flawlessly. To find out, you have to look through the output of dmesg (or notice the messages as they flash by during boot) for messages like this:

    md: hde6's event counter: 00000174
    md: hda6's event counter: 000001ae
    md: superblock update time inconsistency -- using the most recent one
    md: freshest: hda6
    md: kicking non-fresh hde6 from array!
    md: unbind<hde6,1>
    md: export_rdev(hde6)
    md: RAID level 1 does not need chunksize! Continuing anyway.
    md0: max total readahead window set to 124k
    md0: 1 data-disks, max readahead per data-disk: 124k
    raid1: device hda6 operational as mirror 0
    raid1: md0, not all disks are operational -- trying to recover array
    raid1: raid set md0 active with 1 out of 2 mirrors
    md: updating md0 RAID superblock on device
    md: hda6 [events: 000001af]<6>(write) hda6's sb offset: 3076352
    md: recovery thread got woken up ...
    md0: no spare disk to reconstruct array! -- continuing in degraded mode

However, there is no other notification that an error has occurred. I recently discovered, by accident, that I'd been running my system with all raid devices in degraded mode. I.e. I thought I was under the nice safe raid1 umbrella but I wasn't, and hadn't been for a long time.

There's a note somewhere that says never to access the partitions of a raid array directly via the hd name like /dev/hda1 - if you must, use the ataraid name: /dev/ataraid/d0p1 (disc 0, partition 1). That's probably what I'd done to cause the problem.

So, on an old Red Hat 7.2 system, with pre-mdadm raid commands, how do you fix it? You could probably fetch and install the mdadm package, which everyone likes better than the old raidtools. I was feeling paranoid about doing that, and thought I'd have a go at getting back into action with the old (raidtools-0.90) commands, which I'd used to build the array in the first place.
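Since the system gives no other warning, a small watchdog script can make the degraded state visible. Here's a minimal sketch that looks for an underscore in the [UU]-style status brackets of /proc/mdstat - the helper name and the sample file are my own inventions; on a real system you'd point it at /proc/mdstat itself (from cron, say):

```shell
#!/bin/sh
# Sketch: warn when any md array is running degraded, by looking for an
# underscore in the [UU]-style mirror-status field of /proc/mdstat.
# "check_degraded" is a made-up helper name; point it at /proc/mdstat
# on a real system instead of the captured sample below.

# Count mdstat lines whose status brackets show a missing member,
# e.g. [U_] or [_U] rather than [UU].
check_degraded() {
    grep -c '\[U*_[U_]*\]' "$1"
}

# Demonstrate on a captured sample rather than the live /proc/mdstat:
cat > /tmp/mdstat.sample1 <<'EOF'
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hde6[1] hda6[0]
      3076352 blocks [2/2] [UU]
md2 : active raid1 hda7[0]
      29567488 blocks [2/1] [U_]
unused devices: <none>
EOF

n=$(check_degraded /tmp/mdstat.sample1)
if [ "$n" -gt 0 ]; then
    echo "WARNING: $n md array(s) running degraded"
fi
```

Run daily, that would have caught my problem months earlier.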
The first problem was that I couldn't remember the command for rebuilding a raid device. A man -k raid listed the raid commands, but none could rebuild the array. Eventually I found that there's no man entry in RH 7.2 or 7.3 for the required command, raidhotadd. Usage:

    /sbin/raidhotadd /dev/md0 /dev/hde7

for example. It seemed to work just fine, and you can cat /proc/mdstat to watch the progress of the rebuild. That's all you'd need to do...

Unless you're a complete idiot like me, and you add the wrong partition to the raid array. Which is what I discovered when I went to reconstruct the *other* raid array:

    # /sbin/raidhotadd /dev/md2 /dev/hde7
    /dev/md2: can not hot-add disk: invalid argument

Not a catastrophe, because at least it was only the partition that had been kicked out of the other array. So sure, I'd wiped all that partition's data, but the data was still happily sitting on the partition that was still in the array. I added the correct partition, and so now had three partitions used in the mirror. (I didn't even know you could do that.) So now cat /proc/mdstat showed this:

    Personalities : [raid1]
    read_ahead 1024 sectors
    md0 : active raid1 hde6[2] hde7[1] hda6[0]
          3076352 blocks [2/2] [UU]
    md2 : active raid1 hda7[0]
          29567488 blocks [2/1] [U_]
    unused devices: <none>

So the problem then became: how do you remove a partition (hde7) from the raid device? A little searching turned up raidhotremove. Unfortunately, that won't work on a running device:

    # /sbin/raidhotremove /dev/md0 /dev/hde7
    /dev/md0: can not hot-remove disk: disk busy!

I noticed the raidhotgenerateerror command (usage: raidhotgenerateerror /dev/md0 /dev/hde7), which appeared to work but didn't actually let me hot-remove the partition afterwards. (Still: disk busy!) Then a google search turned up this exact problem, and the solution too: mark the partition faulty with the raidsetfaulty command. I soon discovered this isn't part of the raidtools-0.90-24 package, but it is in 1.00.3.
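To avoid hot-adding the wrong partition again, and to script the "watch /proc/mdstat until the rebuild finishes" step, the mdstat output can be picked apart with a couple of lines of awk/grep. A minimal sketch, assuming the 2.4-kernel mdstat layout shown above - the helper names (list_members, rebuilding) are mine, not part of raidtools:

```shell
#!/bin/sh
# Sketch: two small helpers for reading the 2.4-kernel /proc/mdstat
# format - one to double-check which partitions belong to which array
# before raidhotadd-ing, one to tell when a rebuild is still running.
# Helper names are made up; field positions assumed from the mdstat
# output shown in this post.

# Print each md device and the member partitions it currently holds.
# On "md0 : active raid1 hde6[2] hde7[1] hda6[0]" the members start
# at field 5.
list_members() {
    awk '/^md[0-9]/ { printf "%s:", $1
                      for (i = 5; i <= NF; i++) printf " %s", $i
                      print "" }' "$1"
}

# Succeed (exit 0) while a recovery is still in progress.
rebuilding() {
    grep -q 'recovery *=' "$1"
}

# Demonstrate on a captured sample rather than the live /proc/mdstat:
cat > /tmp/mdstat.sample2 <<'EOF'
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hde6[2] hde7[1] hda6[0]
      3076352 blocks [2/2] [UU]
md2 : active raid1 hda7[0]
      29567488 blocks [2/1] [U_]
unused devices: <none>
EOF

list_members /tmp/mdstat.sample2
# On a live system, waiting for a rebuild to finish would look like:
#   while rebuilding /proc/mdstat; do sleep 10; done
```

A quick list_members before each raidhotadd would have shown me that hde7 was already sitting in md0.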
synaptic quickly showed there was no RH7.2 package available via apt-get, and rpmfind quickly confirmed this. Google led to the name of the file, and with that a search on raidtools-1.00.3.tar.gz actually led to a place holding not just the .tar.gz but also a source rpm (http://linux.maruhn.com/sec/raidtools.html). So I grabbed that, did the rpm --rebuild followed by the rpm -F, and got the necessary commands. Then it was a simple matter of:

1. raidsetfaulty /dev/md0 /dev/hde7
2. raidhotremove /dev/md0 /dev/hde7
3. watch cat /proc/mdstat (until the reconstruction completed on /dev/md0)

    # cat /proc/mdstat
    Personalities : [raid1]
    read_ahead 1024 sectors
    md0 : active raid1 hde7[2] hda6[0]
          3076352 blocks [2/1] [U_]
          [====>................]  recovery = 20.4% (629300/3076352) finish=1.2min speed=33121K/sec
    md2 : active raid1 hda7[0]
          29567488 blocks [2/1] [U_]
    unused devices: <none>

4. raidhotadd /dev/md2 /dev/hde7 (and wait until the raid reconstruction finishes)

There you are. I hope this is easier to find if some other poor soul has the same problem. Installing mdadm and following similar command steps would probably also work, I imagine.

luke
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html