Re: raid5 stuck in degraded, inactive and dirty mode
On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote: But I suspect that --assemble --force would do the right thing. Without more details, it is hard to say for sure. I suspect so as well, but throwing caution to the wind irks me wrt this raid array. :) Sorry. Not to be a pain, but considering the previous email with all the examine dumps, etc., would the above be the way to go? I just don't want to have missed something and bugger the array up totally. -- To the extent that we overreact, we proffer the terrorists the greatest tribute. - High Court Judge Michael Kirby - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: md rotates RAID5 spare at boot
I'm sorry- is this an inappropriate list to ask for help? There seemed to be a fair amount of that when I searched the archives, but I don't want to bug developers with my problems! Please let me know if I should find another place to ask for help (and please let me know where that might be!). Thanks! Jed Davidow wrote: I have a RAID5 (5+1spare) setup that works perfectly well until I reboot. I have 6 drives (two different models) partitioned to give me 2 arrays, md0 and md1, that I use for /home and /var respectively. When I reboot, the system assembles each array, but swaps out what was the spare with one of the member drives. It then immediately detects a degraded array and rebuilds. After that, all is fine and testing has shown things to be working like they should. Until I reboot. Example: Built two arrays: /dev/md0 - /dev/sd[abcef]1 and /dev/md1 - /dev/sd[abcef]2 Added /dev/sdg1 and /dev/sdg2 as spares, and this works. One scenario when I reboot: md0 is assembled from sd[abceg]1; it's degraded and reports a spares missing event. md1 assembles correctly, spare is not missing Any ideas? I have asked about this on various boards (some said UDEV rules would help, some thought the issue had to do with the /dev/sdX names changing, etc). I don't think those are applicable since dmesg reports the arrays assemble as soon as the disks are detected. Thanks in advance! 
INFO: (currently the boot drive (non raid) is sdd, otherwise all sd devices are part of the raid)

fdisk:
$ sudo fdisk -l

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1521    12217401   fd  Linux raid autodetect
/dev/sda2            1522       30401   231978600   fd  Linux raid autodetect

Disk /dev/sdb: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1521    12217401   fd  Linux raid autodetect
/dev/sdb2            1522       30401   231978600   fd  Linux raid autodetect

Disk /dev/sdc: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1521    12217401   fd  Linux raid autodetect
/dev/sdc2            1522       30401   231978600   fd  Linux raid autodetect

Disk /dev/md0: 50.0 GB, 50041978880 bytes
2 heads, 4 sectors/track, 12217280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 950.1 GB, 950183919616 bytes
2 heads, 4 sectors/track, 231978496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/sdd: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x535bfd7a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1       14219   114214086   83  Linux
/dev/sdd2           14220       14593     3004155    5  Extended
/dev/sdd5           14220       14593     3004123+  82  Linux swap / Solaris

Disk /dev/sde: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1521    12217401   fd  Linux raid autodetect
/dev/sde2            1522       30401   231978600   fd  Linux raid autodetect

Disk /dev/sdf: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1        1521    12217401   fd  Linux raid autodetect
/dev/sdf2            1522       30401   231978600   fd  Linux raid autodetect

Disk /dev/sdg: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1        1521    12217401   fd  Linux raid autodetect
/dev/sdg2            1522       30401   231978600   fd  Linux raid autodetect

$ sudo mdadm --detail /dev/md0
Re: 2.6.24-rc6 reproducible raid5 hang
On Thu, 10 Jan 2008, Neil Brown wrote: On Wednesday January 9, [EMAIL PROTECTED] wrote: On Sun, 2007-12-30 at 10:58 -0700, dean gaudet wrote: i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 which was Neil's change in 2.6.22 for deferring generic_make_request until there's enough stack space for it. Commit d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 reduced stack utilization by preventing recursive calls to generic_make_request. However the following conditions can cause raid5 to hang until 'stripe_cache_size' is increased: Thanks for pursuing this guys. That explanation certainly sounds very credible. The generic_make_request_immed is a good way to confirm that we have found the bug, but I don't like it as a long term solution, as it just reintroduced the problem that we were trying to solve with the problematic commit. As you say, we could arrange that all request submission happens in raid5d and I think this is the right way to proceed. However we can still take some of the work into the thread that is submitting the IO by calling raid5d() at the end of make_request, like this. Can you test it please? Does it seem reasonable? Thanks, NeilBrown Signed-off-by: Neil Brown [EMAIL PROTECTED] it has passed 11h of the untar/diff/rm linux.tar.gz workload... that's pretty good evidence it works for me. thanks! 
Tested-by: dean gaudet [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c    |    2 +-
 ./drivers/md/raid5.c |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2008-01-07 13:32:10.000000000 +1100
+++ ./drivers/md/md.c	2008-01-10 11:08:02.000000000 +1100
@@ -5774,7 +5774,7 @@ void md_check_recovery(mddev_t *mddev)
 	if (mddev->ro)
 		return;
 
-	if (signal_pending(current)) {
+	if (current == mddev->thread->tsk && signal_pending(current)) {
 		if (mddev->pers->sync_request) {
 			printk(KERN_INFO "md: %s in immediate safe mode\n",
 			       mdname(mddev));

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-07 13:32:10.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-10 11:06:54.000000000 +1100
@@ -3432,6 +3432,7 @@ static int chunk_aligned_read(struct req
 	}
 }
 
+static void raid5d (mddev_t *mddev);
 
 static int make_request(struct request_queue *q, struct bio * bi)
 {
@@ -3547,7 +3548,7 @@ static int make_request(struct request_q
 			goto retry;
 		}
 		finish_wait(&conf->wait_for_overlap, &w);
-		handle_stripe(sh, NULL);
+		set_bit(STRIPE_HANDLE, &sh->state);
 		release_stripe(sh);
 	} else {
 		/* cannot get stripe for read-ahead, just give-up */
@@ -3569,6 +3570,7 @@ static int make_request(struct request_q
 		      test_bit(BIO_UPTODATE, &bi->bi_flags)
 			? 0 : -EIO);
 	}
+	raid5d(mddev);
 	return 0;
 }
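As a footnote for anyone hitting this hang on an unpatched kernel: the report above notes that raid5 stays wedged "until 'stripe_cache_size' is increased". That knob lives in sysfs; a minimal sketch of the stopgap (md0 stands in for whichever array is affected, and the value 4096 is just an example):

```shell
# Enlarge the raid5 stripe cache for md0. Each entry is one stripe's
# worth of buffers per member device, so this costs memory roughly
# proportional to entries * page size * number of disks.
echo 4096 > /sys/block/md0/md/stripe_cache_size

# Confirm the new size took effect.
cat /sys/block/md0/md/stripe_cache_size
```

This only papers over the deadlock window; the patch in this thread is the real fix.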
Re: 2.6.24-rc6 reproducible raid5 hang
On Jan 10, 2008 12:13 AM, dean gaudet [EMAIL PROTECTED] wrote: w.r.t. dan's cfq comments -- i really don't know the details, but does this mean cfq will misattribute the IO to the wrong user/process? or is it just a concern that CPU time will be spent on someone's IO? the latter is fine to me... the former seems sucky because with today's multicore systems CPU time seems cheap compared to IO. I do not see this affecting the time slicing feature of cfq, because as Neil says the work has to get done at some point. If I give up some of my slice working on someone else's I/O chances are the favor will be returned in kind since the code does not discriminate. The io-priority capability of cfq currently does not work as advertised with current MD since the priority is tied to the current thread and the thread that actually submits the i/o on a stripe is non-deterministic. So I do not see this change making the situation any worse. In fact, it may make it a bit better since there is a higher chance for the thread submitting i/o to MD to do its own i/o to the backing disks. Reviewed-by: Dan Williams [EMAIL PROTECTED]
Re: md rotates RAID5 spare at boot
Jed Davidow wrote: I have a RAID5 (5+1spare) setup that works perfectly well until I reboot. I have 6 drives (two different models) partitioned to give me 2 arrays, md0 and md1, that I use for /home and /var respectively. When I reboot, the system assembles each array, but swaps out what was the spare with one of the member drives. It then immediately detects a degraded array and rebuilds. After that, all is fine and testing has shown things to be working like they should. Until I reboot. Example: Built two arrays: /dev/md0 - /dev/sd[abcef]1 and /dev/md1 - /dev/sd[abcef]2 Added /dev/sdg1 and /dev/sdg2 as spares, and this works. One scenario when I reboot: md0 is assembled from sd[abceg]1; it's degraded and reports a spares missing event. md1 assembles correctly, spare is not missing I'm looking at the dmesg which follows and seeing md1 reconstructing. This seems to be at variance with assembles correctly here. That's the only thing which has struck me as worth mentioning so far. Any ideas? I have asked about this on various boards (some said UDEV rules would help, some thought the issue had to do with the /dev/sdX names changing, etc). I don't think those are applicable since dmesg reports the arrays assemble as soon as the disks are detected. Thanks in advance! 
[INFO block with the full fdisk -l output quoted verbatim; snipped here, see the original message above]

$ sudo mdadm --detail /dev/md0 (md1 shows similar info)
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Apr 7 23:32:58
Re: md rotates RAID5 spare at boot
Hi Bill, Maybe I'm using the wrong words... In this instance, on the previous boot, md1 was assembled from sd[efbac]2 and sdg2 was the spare. When I rebooted it assembled from sd[efbgc]2 and had no spare (appears that sdg was swapped in for sda). Since sdg2 had been the spare, the array is degraded and it rebuilds. I suppose this would be the case if, during the shutdown, sda2 was compromised (although I see nothing about sda2 as being faulty- I can manually add it immediately). But this happens just about every time I reboot, sometimes to only one of the two arrays, sometimes with the corresponding partitions on both arrays and sometimes with different partitions on each array. If something was physically wrong with one of the drives, I would expect it to swap in the spare for that drive each time. But it seems to swap in the spare randomly. Note- last night I shutdown completely, restarted after 30 sec and for the first time in a while did not have an issue. This time the drives were recognized and assigned device nodes in the 'correct' order (MB controller first, PCI controller next). Would device node assignments have any effect on how the array was being assembled? It looks to me like md inspects and attempts to assemble after each drive controller is scanned (from dmesg, there appears to be a failed bind on the first three devices after they are scanned, and then again when the second controller is scanned). Would the scan order cause a spare to be swapped in? Bill Davidsen wrote: Jed Davidow wrote: I have a RAID5 (5+1spare) setup that works perfectly well until I reboot. I have 6 drives (two different models) partitioned to give me 2 arrays, md0 and md1, that I use for /home and /var respectively. When I reboot, the system assembles each array, but swaps out what was the spare with one of the member drives. It then immediately detects a degraded array and rebuilds. After that, all is fine and testing has shown things to be working like they should. 
Until I reboot. Example: Built two arrays: /dev/md0 - /dev/sd[abcef]1 and /dev/md1 - /dev/sd[abcef]2 Added /dev/sdg1 and /dev/sdg2 as spares, and this works. One scenario when I reboot: md0 is assembled from sd[abceg]1; it's degraded and reports a spares missing event. md1 assembles correctly, spare is not missing I'm looking at the dmesg which follows and seeing md1 reconstructing. This seems to be at variance with assembles correctly here. That's the only thing which has struck me as worth mentioning so far. Any ideas? I have asked about this on various boards (some said UDEV rules would help, some thought the issue had to do with the /dev/sdX names changing, etc). I don't think those are applicable since dmesg reports the arrays assemble as soon as the disks are detected. Thanks in advance!

[quoted fdisk output snipped]
this goes to my megaraid probs too was: Re: md rotates RAID5 spare at boot
Jed Davidow wrote: I'm sorry- is this an inappropriate list to ask for help? There seemed to be a fair amount of that when I searched the archives, but I don't want to bug developers with my problems! Please let me know if I should find another place to ask for help (and please let me know where that might be!). I could also use help with my mega-raid 150 question. Don't know if I asked it wrong or it was the color shirt I was wearing. I am unfortunately running with such a dearth of knowledge on the topic that I don't really know the right questions to ask when diagnosing a performance problem. All I know is that there's very little documentation on this card, even less documentation on the commandline tool to access/control the card, and, if it makes the most sense, I am perfectly willing to deep six the card on eBay and pick up a couple of reasonable speed serial ATA controller cards in its stead. The only reason I want to try and learn more about the hardware raid is because the problems I'm experiencing with my virtual machines on this platform mimic problems a customer of mine is experiencing, and if I can fix them just by changing how the raid controller uses the discs, then that is a huge win. Personally, I think it's something a little deeper because VMware server seems to go out to lunch whenever there is a backup in the disk I/O queue. I'm seriously thinking about picking up esx as soon as the budget allows. I just need some good solid advice on what path I should take. ---eric -- Speech-recognition in use. It makes mistakes, I correct some.
Re: raid5 stuck in degraded, inactive and dirty mode
On Thursday January 10, [EMAIL PROTECTED] wrote: On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote: But I suspect that --assemble --force would do the right thing. Without more details, it is hard to say for sure. I suspect so as well but throwing caution to the wind irks me wrt this raid array. :) Sorry. Not to be a pain but considering the previous email with all the examine dumps, etc would the above be the way to go? I just don't want to have missed something and bugger the array up totally. Yes, definitely. The superblocks look perfectly normal for a single drive failure followed by a crash. So --assemble --force is the way to go. Technically you could have some data corruption if a write was under way at the time of the crash. In that case the parity block of that stripe could be wrong, so the recovered data for the missing device could be wrong. This is why you are required to use --force - to confirm that you are aware that there could be a problem. It would be worth running fsck just to be sure that nothing critical has been corrupted. Also if you have a recent backup, I wouldn't recycle it until I was fairly sure that all your data was really safe. But in my experience the chance of actual data corruption in this situation is fairly low. NeilBrown
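For the archives, the recovery Neil describes might look like the sketch below. The device names are hypothetical stand-ins; substitute the actual array and the members from your own --examine dumps:

```shell
# Make sure the inactive array is stopped before reassembling it.
mdadm --stop /dev/md0

# Force assembly from the surviving members. --force tells mdadm to
# ignore the mismatched event counts / dirty flag, accepting the small
# risk of parity-derived corruption described in the message above.
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

# Verify the array came up, then do a read-only fsck (-n makes no
# changes) before mounting or trusting the data.
cat /proc/mdstat
fsck -n /dev/md0
```

Only after a clean fsck and a sanity check of the data would it be safe to re-add a replacement disk and let the rebuild run.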
Re: md rotates RAID5 spare at boot
On Thursday January 10, [EMAIL PROTECTED] wrote: It looks to me like md inspects and attempts to assemble after each drive controller is scanned (from dmesg, there appears to be a failed bind on the first three devices after they are scanned, and then again when the second controller is scanned). Would the scan order cause a spare to be swapped in? This suggests that mdadm --incremental is being used to assemble the arrays. Every time udev finds a new device, it gets added to whichever array it should be in. If it is called as mdadm --incremental --run, then it will get started as soon as possible, even if it is degraded. Without the --run, it will wait until all devices are available. Even with mdadm --incremental --run, you shouldn't get a resync if the last device is added before the array is written to. What distro are you running? What does grep -R mdadm /etc/udev show? NeilBrown
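The incremental mode Neil refers to is normally invoked per-device from a udev rule, but it can be exercised by hand; a sketch (device name hypothetical) of the two behaviours he contrasts:

```shell
# Feed a component device to mdadm, as udev would on hotplug.
# Without --run, the containing array is only started once every
# member expected by the superblocks / mdadm.conf has been added.
mdadm --incremental /dev/sdb1

# With --run, the array is started as soon as it is usable,
# even if that means starting it degraded.
mdadm --incremental --run /dev/sdb1
```

Running the first form for each member in turn is effectively what a correctly ordered udev setup does at boot.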
Re: raid5 stuck in degraded, inactive and dirty mode
On Fri, Jan 11, 2008 at 07:21:42AM +1100, Neil Brown wrote: On Thursday January 10, [EMAIL PROTECTED] wrote: On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote: But I suspect that --assemble --force would do the right thing. Without more details, it is hard to say for sure. I suspect so as well but throwing caution to the wind irks me wrt this raid array. :) Sorry. Not to be a pain but considering the previous email with all the examine dumps, etc would the above be the way to go? I just don't want to have missed something and bugger the array up totally. Yes, definitely. Cool. The superblocks look perfectly normal for a single drive failure followed by a crash. So --assemble --force is the way to go. Technically you could have some data corruption if a write was under way at the time of the crash. In that case the parity block of that I'd expect so as I think the crash situation is one of rather severe abruptness. stripe could be wrong, so the recovered data for the missing device could be wrong. This is why you are required to use --force - to confirm that you are aware that there could be a problem. Right. It would be worth running fsck just to be sure that nothing critical has been corrupted. Also if you have a recent backup, I wouldn't recycle it until I was fairly sure that all your data was really safe. I'll be doing a fsck and checking what data I can over the weekend to see what was fragged. I suspect it'll just be something rsynced due to the time of the crash. But in my experience the chance of actual data corruption in this situation is fairly low. Yaay. :) Thanks. I'll now go and put humpty together again. For some reason Johnny Cash's 'Ring of Fire' is playing in my head. -- To the extent that we overreact, we proffer the terrorists the greatest tribute. 
- High Court Judge Michael Kirby
The effects of multiple layers of block drivers
Hello, I am starting to dig into the Block subsystem to try and uncover the reason for some data I lost recently. My situation is that I have multiple block drivers on top of each other and am wondering how a raid 5 rebuild would affect the block devices above it. The layers are raid 5 - lvm - cryptoloop. It seems that after the raid 5 device was rebuilt by adding in a new disk, the cryptoloop doesn't have a valid ext3 partition on it. As a raid device re-builds, is there any rearranging of sectors or corresponding blocks that would affect another block device on top of it? Sincerely, Dennison Williams
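For readers reconstructing the stack described here, a rough sketch of how the three layers fit together (all device names, sizes, and the cipher are made-up placeholders, and the 2.6-era cryptoloop syntax is assumed). The point is that a raid5 rebuild only regenerates a failed member's contents from parity; the array's sector layout, and hence everything LVM and the loop device see, should be unchanged:

```shell
# Bottom layer: a 3-disk RAID5 array.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1

# Middle layer: LVM on top of the array.
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 10G -n cryptvol vg0

# Top layer: cryptoloop over the logical volume, then ext3 inside it.
losetup -e aes /dev/loop0 /dev/vg0/cryptvol
mkfs.ext3 /dev/loop0
```

If the ext3 superblock vanished after a rebuild, the first things to rule out are a wrong passphrase/cipher on loop setup and the replacement disk having been added with a different superblock or offset, rather than md rearranging data.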
Re: md rotates RAID5 spare at boot
distro: Ubuntu 7.10 Two files show up...

85-mdadm.rules:
# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
	RUN+="watershed /sbin/mdadm --assemble --scan --no-degraded"

65-mdadm.vol_id.rules:
# This file causes Linux RAID (mdadm) block devices to be checked for
# further filesystems if the array is active.
# See udev(8) for syntax
SUBSYSTEM!="block", GOTO="mdadm_end"
KERNEL!="md[0-9]*", GOTO="mdadm_end"
ACTION!="add|change", GOTO="mdadm_end"
# Check array status
ATTR{md/array_state}=="|clear|inactive", GOTO="mdadm_end"
# Obtain array information
IMPORT{program}="/sbin/mdadm --detail --export $tempnode"
ENV{MD_NAME}=="?*", SYMLINK+="disk/by-id/md-name-$env{MD_NAME}"
ENV{MD_UUID}=="?*", SYMLINK+="disk/by-id/md-uuid-$env{MD_UUID}"
# by-uuid and by-label symlinks
IMPORT{program}="vol_id --export $tempnode"
OPTIONS="link_priority=-100"
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", \
	SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", \
	SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

I see. So udev is invoking the assemble command as soon as it detects the devices. So is it possible that the spare is not the last drive to be detected and mdadm assembles too soon?

Neil Brown wrote: On Thursday January 10, [EMAIL PROTECTED] wrote: It looks to me like md inspects and attempts to assemble after each drive controller is scanned (from dmesg, there appears to be a failed bind on the first three devices after they are scanned, and then again when the second controller is scanned). Would the scan order cause a spare to be swapped in? This suggests that mdadm --incremental is being used to assemble the arrays. Every time udev finds a new device, it gets added to whichever array it should be in. 
If it is called as mdadm --incremental --run, then it will get started as soon as possible, even if it is degraded. Without the --run, it will wait until all devices are available. Even with mdadm --incremental --run, you shouldn't get a resync if the last device is added before the array is written to. What distro are you running? What does grep -R mdadm /etc/udev show? NeilBrown
Re: md rotates RAID5 spare at boot
One quick question about those rules. The 65-mdadm rule looks like it checks ACTIVE arrays for filesystems, and the 85 rule assembles arrays. Shouldn't they run in the other order?

[previous message, including both udev rules files, quoted in full; snipped]
Re: md rotates RAID5 spare at boot
On Thursday January 10, [EMAIL PROTECTED] wrote: distro: Ubuntu 7.10 Two files show up... 85-mdadm.rules: # This file causes block devices with Linux RAID (mdadm) signatures to # automatically cause mdadm to be run. # See udev(8) for syntax SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \ RUN+="watershed /sbin/mdadm --assemble --scan --no-degraded" I see. So udev is invoking the assemble command as soon as it detects the devices. So is it possible that the spare is not the last drive to be detected and mdadm assembles too soon? The '--no-degraded' option should stop it from assembling until all expected devices have been found. It could assemble before the spare is found, but should not assemble before all the data devices have been found. The dmesg trace you included in your first mail doesn't actually show anything wrong - it never starts an incomplete array. Can you try again and get a trace where there definitely is a rebuild happening. And please don't drop linux-raid from the 'cc' list. NeilBrown
Re: md rotates RAID5 spare at boot
On Thursday January 10, [EMAIL PROTECTED] wrote:

One quick question about those rules. The 65-mdadm rule looks like it checks ACTIVE arrays for filesystems, and the 85 rule assembles arrays. Shouldn't they run in the other order?

They are fine. The '65' rule applies to arrays. I.e. it fires on an array device once it has been started. The '85' rule applies to component devices. They are quite independent.

NeilBrown

distro: Ubuntu 7.10

Two files show up...

85-mdadm.rules:

# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
        RUN+="watershed /sbin/mdadm --assemble --scan --no-degraded"

65-mdadm.vol_id.rules:

# This file causes Linux RAID (mdadm) block devices to be checked for
# further filesystems if the array is active.
# See udev(8) for syntax
SUBSYSTEM!="block", GOTO="mdadm_end"
KERNEL!="md[0-9]*", GOTO="mdadm_end"
ACTION!="add|change", GOTO="mdadm_end"

# Check array status
ATTR{md/array_state}=="|clear|inactive", GOTO="mdadm_end"

# Obtain array information
IMPORT{program}="/sbin/mdadm --detail --export $tempnode"
ENV{MD_NAME}=="?*", SYMLINK+="disk/by-id/md-name-$env{MD_NAME}"
ENV{MD_UUID}=="?*", SYMLINK+="disk/by-id/md-uuid-$env{MD_UUID}"

# by-uuid and by-label symlinks
IMPORT{program}="vol_id --export $tempnode"
OPTIONS="link_priority=-100"
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", \
        SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", \
        SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

I see. So udev is invoking the assemble command as soon as it detects the devices. So is it possible that the spare is not the last drive to be detected and mdadm assembles too soon?
Neil Brown wrote:
On Thursday January 10, [EMAIL PROTECTED] wrote:

It looks to me like md inspects and attempts to assemble after each drive controller is scanned (from dmesg, there appears to be a failed bind on the first three devices after they are scanned, and then again when the second controller is scanned). Would the scan order cause a spare to be swapped in?

This suggests that mdadm --incremental is being used to assemble the arrays. Every time udev finds a new device, it gets added to whichever array it should be in. If it is called as mdadm --incremental --run, then it will get started as soon as possible, even if it is degraded. Without the --run, it will wait until all devices are available. Even with mdadm --incremental --run, you shouldn't get a resync if the last device is added before the array is written to.

What distro are you running? What does grep -R mdadm /etc/udev show?

NeilBrown
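Neil's point that the '65' rule only ever fires for assembled arrays, while the '85' rule fires for component partitions, can be sanity-checked with the same glob pattern udev uses in its KERNEL match. A small sketch (device names illustrative):

```shell
# The 65 rule keys on KERNEL=="md[0-9]*", i.e. assembled array devices.
# Component partitions like sdb1 never match it, so its ordering relative
# to the 85 rule (which keys on ID_FS_TYPE=="linux_raid*") is irrelevant.
matches_65_rule() {
    case "$1" in
        md[0-9]*) echo yes ;;
        *)        echo no  ;;
    esac
}
matches_65_rule md0    # array device   -> yes
matches_65_rule sdb1   # component disk -> no
```

So the two rules partition the device space between them, which is why Neil says they are quite independent.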
Re: md rotates RAID5 spare at boot
(Sorry- yes it looks like I posted an incorrect dmesg extract)

$ egrep 'sd|md|raid|scsi' /var/log/dmesg.0
[   36.112449] md: linear personality registered for level -1
[   36.117197] md: multipath personality registered for level -4
[   36.121795] md: raid0 personality registered for level 0
[   36.126950] md: raid1 personality registered for level 1
[   36.131424] raid5: automatically using best checksumming function: pIII_sse
[   36.150020] raid5: using function: pIII_sse (4564.000 MB/sec)
[   36.218015] raid6: int32x1    780 MB/s
[   36.285943] raid6: int32x2    902 MB/s
[   36.353961] raid6: int32x4    667 MB/s
[   36.421869] raid6: int32x8    528 MB/s
[   36.489811] raid6: mmxx1     1813 MB/s
[   36.557775] raid6: mmxx2     2123 MB/s
[   36.625763] raid6: sse1x1    1101 MB/s
[   36.693717] raid6: sse1x2    1898 MB/s
[   36.761688] raid6: sse2x1    2227 MB/s
[   36.829647] raid6: sse2x2    3178 MB/s
[   36.829695] raid6: using algorithm sse2x2 (3178 MB/s)
[   36.829744] md: raid6 personality registered for level 6
[   36.829793] md: raid5 personality registered for level 5
[   36.829842] md: raid4 personality registered for level 4
[   36.853475] md: raid10 personality registered for level 10
[   37.781513] scsi0 : sata_sil
[   37.781628] scsi1 : sata_sil
[   37.781724] scsi2 : sata_sil
[   37.781820] scsi3 : sata_sil
[   37.781922] ata1: SATA max UDMA/100 cmd 0xf88c0080 ctl 0xf88c008a bmdma 0xf88c irq 20
[   37.781997] ata2: SATA max UDMA/100 cmd 0xf88c00c0 ctl 0xf88c00ca bmdma 0xf88c0008 irq 20
[   37.782069] ata3: SATA max UDMA/100 cmd 0xf88c0280 ctl 0xf88c028a bmdma 0xf88c0200 irq 20
[   37.782142] ata4: SATA max UDMA/100 cmd 0xf88c02c0 ctl 0xf88c02ca bmdma 0xf88c0208 irq 20
[   39.577812] scsi 0:0:0:0: Direct-Access     ATA      WDC WD2500JD-00H 08.0 PQ: 0 ANSI: 5
[   39.578027] scsi 1:0:0:0: Direct-Access     ATA      Maxtor 7L250S0   BACE PQ: 0 ANSI: 5
[   39.578234] scsi 3:0:0:0: Direct-Access     ATA      Maxtor 7L250S0   BACE PQ: 0 ANSI: 5
[   39.632483] scsi4 : ata_piix
[   39.632591] scsi5 : ata_piix
[   39.632812] ata5: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
[   39.634522] ata6: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
[   39.634924] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[   39.634995] sd 0:0:0:0: [sda] Write Protect is off
[   39.635048] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   39.635076] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.635218] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[   39.635292] sd 0:0:0:0: [sda] Write Protect is off
[   39.635350] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   39.635380] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.635462]  sda: sda1 sda2
[   39.650092] sd 0:0:0:0: [sda] Attached SCSI disk
[   39.650226] sd 1:0:0:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
[   39.650296] sd 1:0:0:0: [sdb] Write Protect is off
[   39.650348] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   39.650379] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   39.650505] sd 1:0:0:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
[   39.650573] sd 1:0:0:0: [sdb] Write Protect is off
[   39.650625] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   39.650657] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   39.650727]  sdb: sdb1 sdb2
[   39.667599] sd 1:0:0:0: [sdb] Attached SCSI disk
[   39.667719] sd 3:0:0:0: [sdc] 490234752 512-byte hardware sectors (251000 MB)
[   39.667788] sd 3:0:0:0: [sdc] Write Protect is off
[   39.667840] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   39.667871] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.667997] sd 3:0:0:0: [sdc] 490234752 512-byte hardware sectors (251000 MB)
[   39.668064] sd 3:0:0:0: [sdc] Write Protect is off
[   39.668116] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   39.668146] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.668213]  sdc: sdc1 sdc2
[   39.692703] sd 3:0:0:0: [sdc] Attached SCSI disk
[   39.699348] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   39.699570] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   39.699786] sd 3:0:0:0: Attached scsi generic sg2 type 0
[   39.834560] md: md0 stopped.
[   39.870361] md: bind<sdc1>
[   39.870527] md: md1 stopped.
[   39.910999] md: md0 stopped.
[   39.911064] md: unbind<sdc1>
[   39.911120] md: export_rdev(sdc1)
[   39.929760] md: bind<sda1>
[   39.929953] md: bind<sdc1>
[   39.930139] md: bind<sdb1>
[   39.930231] md: md1 stopped.
[   39.932468] md: bind<sdc2>
[   39.932674] md: bind<sda2>
[   39.932860] md: bind<sdb2>
[   40.411001] scsi 4:0:1:0: CD-ROM    LITE-ON  DVDRW SOHW-1213S TS09 PQ: 0 ANSI: 5
[   40.411152] scsi 4:0:1:0: Attached
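One note on the egrep invocation: the alternation pattern must be quoted, otherwise the shell parses the `|` characters as a pipeline instead of passing them to egrep. A self-contained demonstration against a tiny sample of the trace (sample file path arbitrary):

```shell
# Unquoted, `egrep sd|md|raid|scsi file` becomes a shell pipeline of four
# commands. Quoting the pattern makes it a single egrep with alternation.
cat > /tmp/dmesg.sample <<'EOF'
[   39.870361] md: bind<sdc1>
[   36.829744] md: raid6 personality registered for level 6
[   40.500000] usb 1-1: new full speed device
EOF
egrep 'sd|md|raid|scsi' /tmp/dmesg.sample   # matches the first two lines only
```

The third sample line contains none of the four substrings, so it is filtered out, exactly the behaviour wanted when extracting md/raid activity from a full dmesg.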
Re: md rotates RAID5 spare at boot
On Thursday January 10, [EMAIL PROTECTED] wrote:

(Sorry- yes it looks like I posted an incorrect dmesg extract)

This still doesn't seem to match your description. I see:

[   41.247389] md: bind<sdf1>
[   41.247584] md: bind<sdb1>
[   41.247787] md: bind<sda1>
[   41.247971] md: bind<sdc1>
[   41.248151] md: bind<sdg1>
[   41.248325] md: bind<sde1>
[   41.256718] raid5: device sde1 operational as raid disk 0
[   41.256771] raid5: device sdc1 operational as raid disk 4
[   41.256821] raid5: device sda1 operational as raid disk 3
[   41.256870] raid5: device sdb1 operational as raid disk 2
[   41.256919] raid5: device sdf1 operational as raid disk 1
[   41.257426] raid5: allocated 5245kB for md0
[   41.257476] raid5: raid level 5 set md0 active with 5 out of 5 devices, algorithm 2

which looks like 'md0' started with 5 of 5 drives, plus sdg1 is there as a spare. And

[   41.312250] md: bind<sdf2>
[   41.312476] md: bind<sdb2>
[   41.312711] md: bind<sdg2>
[   41.312922] md: bind<sdc2>
[   41.313138] md: bind<sda2>
[   41.313343] md: bind<sde2>
[   41.313452] md: md1: raid array is not clean -- starting background reconstruction
[   41.322189] raid5: device sde2 operational as raid disk 0
[   41.322243] raid5: device sdc2 operational as raid disk 4
[   41.322292] raid5: device sdg2 operational as raid disk 3
[   41.322342] raid5: device sdb2 operational as raid disk 2
[   41.322391] raid5: device sdf2 operational as raid disk 1
[   41.322823] raid5: allocated 5245kB for md1
[   41.322872] raid5: raid level 5 set md1 active with 5 out of 5 devices, algorithm 2

md1 also assembled with 5/5 drives and sda2 as a spare. This one was not shut down cleanly so it started a resync. But there is no evidence of anything starting degraded.

NeilBrown
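Neil's reading of the trace can be checked mechanically rather than by eye: a complete 5-member raid5 start produces exactly five "operational as raid disk" lines per array. A sketch, with the md0 lines copied from the quoted dmesg above into a scratch file:

```shell
# Count the members the kernel reported as operational when md0 started.
# Five lines means the array came up complete; fewer would mean degraded.
cat > /tmp/md0.trace <<'EOF'
[   41.256718] raid5: device sde1 operational as raid disk 0
[   41.256771] raid5: device sdc1 operational as raid disk 4
[   41.256821] raid5: device sda1 operational as raid disk 3
[   41.256870] raid5: device sdb1 operational as raid disk 2
[   41.256919] raid5: device sdf1 operational as raid disk 1
EOF
grep -c 'operational as raid disk' /tmp/md0.trace
```

On a live system the same count can be taken from `dmesg` directly; a result below the array's member count is the signature of a degraded start.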
Re: The effects of multiple layers of block drivers
On Thursday January 10, [EMAIL PROTECTED] wrote:

Hello, I am starting to dig into the block subsystem to try and uncover the reason for some data I lost recently. My situation is that I have multiple block drivers on top of each other and am wondering how the effects of a raid 5 rebuild would affect the block devices above it.

It should just work - no surprises. raid5 is just a block device like any other. When doing a rebuild it might be a bit slower, but that is all.

The layers are raid 5 - lvm - cryptoloop. It seems that after the raid 5 device was rebuilt by adding in a new disk, the cryptoloop doesn't have a valid ext3 partition on it.

There was a difference of opinion between raid5 and dm-crypt which could cause some corruption. What kernel version are you using, and are you using dm-crypt or loop (e.g. losetup) with encryption?

As a raid device rebuilds, is there any rearranging of sectors or corresponding blocks that would affect another block device on top of it?

No.

NeilBrown
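Neil's "no rearranging" answer follows from how a raid5 rebuild works: the replacement member is reconstructed bit-for-bit by XOR from the surviving members, so the layers above (lvm, cryptoloop) see exactly the bytes they wrote. A toy sketch of that reconstruction, using two integer "blocks" with arbitrary values:

```shell
# RAID-5 rebuild in miniature: parity is the XOR of the data members, and a
# lost member is recovered as the XOR of everything that survives. The
# recovered value is identical to the original, so nothing above the md
# layer can tell a rebuild happened.
d0=37; d1=142                 # two data "blocks" (arbitrary values)
parity=$(( d0 ^ d1 ))         # parity written on the healthy array
rebuilt=$(( d0 ^ parity ))    # reconstruction after d1's disk is replaced
echo "$rebuilt"               # prints 142, i.e. exactly d1
```

So if the ext3 filesystem inside the cryptoloop is gone after a rebuild, the cause has to be elsewhere, which is why Neil immediately asks about the kernel version and the dm-crypt/raid5 interaction.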
Re: 2.6.24-rc6 reproducible raid5 hang
On Thursday January 10, [EMAIL PROTECTED] wrote:

On Jan 10, 2008 12:13 AM, dean gaudet [EMAIL PROTECTED] wrote:

w.r.t. dan's cfq comments -- i really don't know the details, but does this mean cfq will misattribute the IO to the wrong user/process? or is it just a concern that CPU time will be spent on someone's IO? the latter is fine to me... the former seems sucky because with today's multicore systems CPU time seems cheap compared to IO.

I do not see this affecting the time slicing feature of cfq, because as Neil says the work has to get done at some point. If I give up some of my slice working on someone else's I/O, chances are the favor will be returned in kind since the code does not discriminate. The io-priority capability of cfq currently does not work as advertised with current MD since the priority is tied to the current thread and the thread that actually submits the i/o on a stripe is non-deterministic. So I do not see this change making the situation any worse. In fact, it may make it a bit better since there is a higher chance for the thread submitting i/o to MD to do its own i/o to the backing disks.

Reviewed-by: Dan Williams [EMAIL PROTECTED]

Thanks. But I suspect you didn't test it with a bitmap :-) I ran the mdadm test suite and it hit a problem - easy enough to fix. I'll look out for any other possible related problem (due to raid5d running in different processes) and then submit it.

Thanks,
NeilBrown
Re: 2.6.24-rc6 reproducible raid5 hang
On Fri, 11 Jan 2008, Neil Brown wrote:

Thanks. But I suspect you didn't test it with a bitmap :-) I ran the mdadm test suite and it hit a problem - easy enough to fix.

damn -- i lost my bitmap 'cause it was external and i didn't have things set up properly to pick it up after a reboot :) if you send an updated patch i'll give it another spin...

-dean