Re: [zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-06 Thread Eric Schrock
On Wed, Dec 06, 2006 at 12:35:58PM -0800, Jim Hranicky wrote:
> > If those are the original path IDs and you didn't
> > move the disks on the bus, why is the is_spare flag set?
> 
> Well, I'm not sure, but these drives were set as spares in another pool
> I deleted -- should I have done something to the drives (fdisk?) before
> reusing them?
> 
> The rest of the options are spitting out a bunch of stuff I'll be
> glad to post links to, but if the problem is that the drives are
> erroneously marked as spares I'll re-init them and start over.

There are known issues with the way spares are tracked and recorded on
disk that can result in a variety of strange behavior in exceptional
circumstances.  We are working on resolving these issues.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-06 Thread Alan Romeril
Hold fire on the re-init until one of the devs chips in; maybe I'm barking up
the wrong tree ;)

--a
 
 


[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-06 Thread Jim Hranicky
> If those are the original path IDs and you didn't
> move the disks on the bus, why is the is_spare flag set?

Well, I'm not sure, but these drives were set as spares in another pool
I deleted -- should I have done something to the drives (fdisk?) before
reusing them?

The rest of the options are spitting out a bunch of stuff I'll be
glad to post links to, but if the problem is that the drives are
erroneously marked as spares I'll re-init them and start over.
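
For what it's worth, a rough sketch of one way I could do that re-init is to
zero the ZFS label areas (ZFS keeps two labels in the first 512 KB of the
device and two in the last 512 KB, so both ends need wiping). This is just a
guess on my part, nothing anyone here has suggested, and it's obviously
destructive:

# destructive -- double-check the device name before running
DISK=/dev/rdsk/c3t3d0s0               # example; repeat for each old spare
# clear the two front labels (first 512 KB of the slice)
dd if=/dev/zero of=$DISK bs=512k count=1
# clear the two trailing labels (last 512 KB); SECTORS is the slice size
# in 512-byte sectors (e.g. from prtvtoc) -- the value below is made up
SECTORS=71687370
dd if=/dev/zero of=$DISK bs=512 seek=$(( SECTORS - 1024 )) count=1024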

Jim
 
 


[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-06 Thread Alan Romeril
Hi Jim,
That looks interesting, though. I'm not a ZFS expert by any means, but look at
some of the properties of the children elements of the mirror:

version=3
name='zmir'
state=0
txg=770
pool_guid=5904723747772934703
vdev_tree
    type='root'
    id=0
    guid=5904723747772934703
    children[0]
        type='mirror'
        id=0
        guid=15067187713781123481
        metaslab_array=15
        metaslab_shift=28
        ashift=9
        asize=36690722816
        children[0]
            type='disk'
            id=0
            guid=8544021753105415508
            [b]path='/dev/dsk/c3t3d0s0'[/b]
            devid='id1,[EMAIL PROTECTED]/a'
            whole_disk=1
            [b]is_spare=1[/b]
            DTL=19
        children[1]
            type='disk'
            id=1
            guid=3579059219373561470
            [b]path='/dev/dsk/c3t4d0s0'[/b]
            devid='id1,[EMAIL PROTECTED]/a'
            whole_disk=1
            [b]is_spare=1[/b]
            DTL=20

If those are the original path IDs and you didn't move the disks on the bus,
why is the is_spare flag set?

zdb has a lot of options, and some of them can produce a lot of output.
Try

zdb zmir

Check the drive label contents with 

zdb -l /dev/dsk/c3t0d0s0
zdb -l /dev/dsk/c3t1d0s0
zdb -l /dev/dsk/c3t3d0s0
zdb -l /dev/dsk/c3t4d0s0

Uberblock info with 

zdb -uuu zmir

And dataset info with

zdb -dd zmir

There are more options, and they give even more info if you repeat the option
letter (especially the -d flag...).

These might be worth posting to help one of the developers spot something.
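
If it helps, here's a minimal sketch for capturing all of that in one go
(assuming a sh-compatible shell and the disk names above -- adjust as needed):

(
  zdb zmir
  for d in c3t0d0s0 c3t1d0s0 c3t3d0s0 c3t4d0s0; do
    echo "=== label: $d ==="
    zdb -l /dev/dsk/$d
  done
  zdb -uuu zmir
  zdb -dd zmir
) > zdb-output.txt 2>&1

That way the whole lot ends up in a single file you can post a link to.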
Cheers,
Alan
 
 


[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-06 Thread Jim Hranicky
Here's the output of zdb:

zmir
    version=3
    name='zmir'
    state=0
    txg=770
    pool_guid=5904723747772934703
    vdev_tree
        type='root'
        id=0
        guid=5904723747772934703
        children[0]
            type='mirror'
            id=0
            guid=15067187713781123481
            metaslab_array=15
            metaslab_shift=28
            ashift=9
            asize=36690722816
            children[0]
                type='disk'
                id=0
                guid=8544021753105415508
                path='/dev/dsk/c3t3d0s0'
                devid='id1,[EMAIL PROTECTED]/a'
                whole_disk=1
                is_spare=1
                DTL=19
            children[1]
                type='disk'
                id=1
                guid=3579059219373561470
                path='/dev/dsk/c3t4d0s0'
                devid='id1,[EMAIL PROTECTED]/a'
                whole_disk=1
                is_spare=1
                DTL=20

It doesn't seem to give much information, and I don't know any of
the "secret options" :->

Can anyone at all give me a good reason why this happened,
or give me any zdb options so I can find out?

I can try plugging the spun-down disk back in and seeing if it can
recover, although that's not going to be an option if this happens
for real...

Jim
 
 


[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-05 Thread Anton B. Rang
> I think the pool is busted. Even the message printed in your
> previous email is bad:
> 
>    DATASET  OBJECT  RANGE
>    15       0       lvl=4294967295 blkid=0
> 
> as level is way out of range.

I think this could be from dmu_objset_open_impl().

It sets object to 0 and level to -1 (= 4294967295).  [Hmmm, this also seems to 
indicate a truncation from 64 to 32 bits somewhere.]
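
A quick way to convince yourself that 4294967295 is just a 64-bit -1 truncated
to an unsigned 32-bit value (the shell arithmetic here is only an illustration,
assuming a ksh/bash-style shell):

$ printf '%u\n' $(( -1 & 0xffffffff ))
4294967295

i.e. 2^32 - 1, which is what -1 looks like once the top 32 bits are dropped.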

Would zdb show any more detail?

(Actually, it looks like the ZIL also sets object to 0 and level to -1 when 
accessing its blocks, but since the ZIL was disabled, I'd guess this isn't the 
issue here.)
 
 