> buh...@lothlorien.nfbcal.org (Brian Buhrow) wrote:
>
> > Hello. I've just encountered a strange problem with
> > raidframe under NetBSD-5.1 that I can't immediately explain.
> >
> > This machine has been running a raid set since 2007. The raid
> > set was originally constructed under NetBSD-3. For the past year,
> > it's been running 5.0_stable with sources from
> > July 2009 or so without a problem. Last night, I installed
> > NetBSD-5.1 with sources from May 23 2012 or so. Now, the raid0 set
> > fails the first component with an i/o error with no corresponding
> > disk errors underneath. Trying to reconstruct to the failed
> > component also fails with an error of 22, invalid argument. Looking
> > at the dmesg output compared with the output of raidctl -s reveals
> > the problem. The size of the raid in the dmesg output is bogus, and,
> > if the raid driver tries to write as many blocks as is reported by
> > the configuration output, it will surely fail as it does. However,
> > raidctl -g /dev/wd0a looks ok and the underlying disk label
> > on /dev/wd0a looks ok as well. Where does the raid driver get the
> > numbers it reports on bootup? Also, there is a second raid set on
> > this machine, the second half of the same two drives, which was
> > constructed at the same time. It works fine with the new code.
> >
> > Below is the output of the boot sequence before the upgrade,
> > and then the boot sequence after the upgrade. Below that are the
> > output of raidctl -s raid0 and raidctl -g /dev/wd0a raid0.
> > It looks to me like something is not zeroed out in the
> > component label that should be, but some change in the raid code is
> > no longer ignoring the noise in the component label.
>
> Correct.
>
> > Any ideas?
>
> There was some code added a while back to handle components whose sizes
> were larger than 32-bit. But 5.1_stable should have the code to handle
> those 'bogus' values in the component label and do the appropriate
> thing (see rf_fix_old_label_size in rf_netbsdkintf.c version
> 1.250.4.11, for example).
>
> What is your code rev for src/sys/dev/raidframe/rf_netbsdkintf.c ?

looks like netbsd-5 is missing this change:

revision 1.284
date: 2011/03/18 23:53:26;  author: mrg;  state: Exp;  lines: +27 -11
apply the fix_label hack to partitionSizeHi as well.  it's needed there.
to do so, move the call to fix the label inside of rf_reasonable_label()
itself, so we can fix the partition sizes before calling
rf_component_label_partitionsize() itself.  fixes the failure mode where
i had garbage not in numBlocksHi but in partitionSizeHi, and the check
against rf_component_label_partitionsize() would fail and my raid would
not auto-configure.

.mrg.
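
For anyone following along, here is a rough sketch of the failure mode the log message describes. It is not the kernel code. The struct label_sketch is a hypothetical, cut-down stand-in for the component label that keeps only the four fields named in this thread, and combine(), fix_hi(), fix_label_sketch() and disk_sectors are made-up names. The rule "clear the high word when the combined size is larger than the disk" is my assumption about what the fix_label hack amounts to; check rf_netbsdkintf.c for the real test.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the RAIDframe component label. */
struct label_sketch {
    uint32_t numBlocks;        /* low 32 bits of the block count */
    uint32_t numBlocksHi;      /* high 32 bits; noise in labels from old kernels */
    uint32_t partitionSize;    /* low 32 bits of the partition size */
    uint32_t partitionSizeHi;  /* high 32 bits; may hold noise as well */
};

/* Join a low/high pair into one 64-bit sector count. */
uint64_t combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Assumed rule: a nonzero high word that makes the size larger than
 * the disk itself must be leftover garbage, so zero it.
 * Returns 1 if the high word was cleared, 0 otherwise.
 */
int fix_hi(uint32_t lo, uint32_t *hi, uint64_t disk_sectors)
{
    if (*hi != 0 && combine(lo, *hi) > disk_sectors) {
        *hi = 0;
        return 1;
    }
    return 0;
}

/* Apply the same check to both sizes; returns how many fields were fixed. */
int fix_label_sketch(struct label_sketch *l, uint64_t disk_sectors)
{
    int fixed = 0;

    fixed += fix_hi(l->numBlocks, &l->numBlocksHi, disk_sectors);
    fixed += fix_hi(l->partitionSize, &l->partitionSizeHi, disk_sectors);
    return fixed;
}
```

With garbage only in partitionSizeHi, a check that looks at numBlocksHi alone leaves the bogus partition size in place, which matches the failure described in the log message. Running both fields through the same check before any size comparison avoids that.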