Re: [zfs-discuss] Re: Re: Managed to corrupt my pool
On Wed, Dec 06, 2006 at 12:35:58PM -0800, Jim Hranicky wrote:
> > If those are the original path ids, and you didn't
> > move the disks on the bus, why is the is_spare flag set?
>
> Well, I'm not sure, but these drives were set as spares in another pool
> I deleted -- should I have done something to the drives (fdisk?) before
> rearranging it?
>
> The rest of the options are spitting out a bunch of stuff I'll be
> glad to post links to, but if the problem is that the drives are
> erroneously marked as spares I'll re-init them and start over.

There are known issues with the way spares are tracked and recorded on
disk that can result in a variety of strange behavior in exceptional
circumstances. We are working on resolving these issues.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
[zfs-discuss] Re: Re: Managed to corrupt my pool
Hold fire on the re-init until one of the devs chips in, maybe I'm
barking up the wrong tree ;)

--a
[zfs-discuss] Re: Re: Managed to corrupt my pool
> If those are the original path ids, and you didn't
> move the disks on the bus, why is the is_spare flag set?

Well, I'm not sure, but these drives were set as spares in another pool
I deleted -- should I have done something to the drives (fdisk?) before
rearranging it?

The rest of the options are spitting out a bunch of stuff I'll be glad
to post links to, but if the problem is that the drives are erroneously
marked as spares I'll re-init them and start over.

Jim
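One way to make sure nothing from the deleted pool (including its old
spare configuration) survives on those drives before re-initializing is
to zero the areas where ZFS keeps its four vdev labels -- the first and
last 512 KB of each slice. A rough sketch only, assuming the c3t3d0 and
c3t4d0 device names from the config quoted below; verify the targets
before pointing dd at anything:

    # Sketch, not an official procedure: clear the ZFS label areas on the
    # two former spares.  Run in ksh/bash; double-check the device names.
    for d in c3t3d0s0 c3t4d0s0; do
        # the two front labels occupy the first 512 KB of the slice
        dd if=/dev/zero of=/dev/rdsk/$d bs=512k count=1
        # the two back-up labels occupy the last 512 KB; find the slice
        # size in 512-byte sectors and zero the tail
        sectors=$(prtvtoc /dev/rdsk/$d | awk '$1 == "0" { print $5 }')
        dd if=/dev/zero of=/dev/rdsk/$d bs=512 seek=$((sectors - 1024)) count=1024
    done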
[zfs-discuss] Re: Re: Managed to corrupt my pool
Hi Jim,

That looks interesting though. I'm not a zfs expert by any means, but
look at some of the properties of the children elements of the mirror:

    version=3
    name='zmir'
    state=0
    txg=770
    pool_guid=5904723747772934703
    vdev_tree
        type='root'
        id=0
        guid=5904723747772934703
        children[0]
            type='mirror'
            id=0
            guid=15067187713781123481
            metaslab_array=15
            metaslab_shift=28
            ashift=9
            asize=36690722816
            children[0]
                type='disk'
                id=0
                guid=8544021753105415508
                [b]path='/dev/dsk/c3t3d0s0'[/b]
                devid='id1,[EMAIL PROTECTED]/a'
                whole_disk=1
                [b]is_spare=1[/b]
                DTL=19
            children[1]
                type='disk'
                id=1
                guid=3579059219373561470
                [b]path='/dev/dsk/c3t4d0s0'[/b]
                devid='id1,[EMAIL PROTECTED]/a'
                whole_disk=1
                [b]is_spare=1[/b]
                DTL=20

If those are the original path ids, and you didn't move the disks on
the bus, why is the is_spare flag set?

There are a lot of options to zdb; some can produce a lot of output.

Try

    zdb zmir

Check the drive label contents with

    zdb -l /dev/dsk/c3t0d0s0
    zdb -l /dev/dsk/c3t1d0s0
    zdb -l /dev/dsk/c3t3d0s0
    zdb -l /dev/dsk/c3t4d0s0

Uberblock info with

    zdb -uuu zmir

And dataset info with

    zdb -dd zmir

There are more options, and they give even more info if you repeat the
option letter more times (especially the -d flag...).

These might be worth posting to help one of the developers spot
something.

Cheers,
Alan
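If it helps with posting, those dumps can be captured to files in one
pass; a small sketch assuming the device names above (the output file
names are just suggestions):

    # collect the config, per-disk labels, uberblocks and dataset info
    zdb zmir > zdb-config-zmir.txt
    for d in c3t0d0 c3t1d0 c3t3d0 c3t4d0; do
        zdb -l /dev/dsk/${d}s0 > zdb-label-$d.txt   # vdev labels per disk
    done
    zdb -uuu zmir > zdb-uberblock-zmir.txt          # uberblock details
    zdb -dd  zmir > zdb-datasets-zmir.txt           # dataset/object info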
[zfs-discuss] Re: Re: Managed to corrupt my pool
Here's the output of zdb:

    zmir
        version=3
        name='zmir'
        state=0
        txg=770
        pool_guid=5904723747772934703
        vdev_tree
            type='root'
            id=0
            guid=5904723747772934703
            children[0]
                type='mirror'
                id=0
                guid=15067187713781123481
                metaslab_array=15
                metaslab_shift=28
                ashift=9
                asize=36690722816
                children[0]
                    type='disk'
                    id=0
                    guid=8544021753105415508
                    path='/dev/dsk/c3t3d0s0'
                    devid='id1,[EMAIL PROTECTED]/a'
                    whole_disk=1
                    is_spare=1
                    DTL=19
                children[1]
                    type='disk'
                    id=1
                    guid=3579059219373561470
                    path='/dev/dsk/c3t4d0s0'
                    devid='id1,[EMAIL PROTECTED]/a'
                    whole_disk=1
                    is_spare=1
                    DTL=20

It doesn't seem to give much information, and I don't know any of the
"secret options" :->

Can anyone at all give me a good reason why this happened, or give me
any options to zdb so I can find out? I can try plugging the spun-down
disk back in and seeing if it can recover, although that's not going to
be an option if this happens for real...

Jim
[zfs-discuss] Re: Re: Managed to corrupt my pool
> I think the pool is busted. Even the message printed in your
> previous email is bad:
>
>     DATASET  OBJECT  RANGE
>     15       0       lvl=4294967295 blkid=0
>
> as level is way out of range.

I think this could be from dmu_objset_open_impl(). It sets object to 0
and level to -1 (= 4294967295). [Hmmm, this also seems to indicate a
truncation from 64 to 32 bits somewhere.] Would zdb show any more
detail?

(Actually, it looks like the ZIL also sets object to 0 and level to -1
when accessing its blocks, but since the ZIL was disabled, I'd guess
this isn't the issue here.)
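Just to spell out the truncation point with plain arithmetic (this is
not taken from the ZFS source): a level of -1 pushed through an
unsigned 32-bit field comes out as 2^32 - 1, which matches the message.

    # -1 masked to 32 bits and printed unsigned gives 4294967295,
    # the same value as lvl= in the error above (ksh93/bash arithmetic)
    printf '%u\n' $(( -1 & 0xffffffff ))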