Re: [zfs-discuss] pool metadata has duplicate children

2013-01-10 Thread Peter Jeremy
On 2013-Jan-08 21:30:57 -0800, John Giannandrea j...@meer.net wrote:
Notice that in the absence of the faulted da2, the OS has assigned da3 to da2,
and so on.  I suspect this was part of the original problem in creating a label
with two da2s.

The primary vdev identifier is the guid.  The path is of secondary
importance (ZFS should automatically recover from juggled disks
without an issue - and has for me).

Try running zdb -l on each of your pool disks and verify that
each has 4 identical labels, and that the 5 guids (one on each
disk) are unique and match the vdev_tree you got from zdb.
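
For example, something like this rough sketch (device names here follow the
original pool layout, da1 through da5, and may need adjusting to however the
disks are currently numbered) will print the label count and the guids for
each member:

for d in da1 da2 da3 da4 da5; do
    echo "=== $d ==="
    zdb -l /dev/$d | grep -c version:     # expect 4, one per label copy
    zdb -l /dev/$d | egrep 'guid|path'    # guids should match the zdb vdev_tree
done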

My suspicion is that you've somehow lost the disk with the guid
3419704811362497180.

twa0: 3ware 9000 series Storage Controller
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, 
Firmware FE9X 2.08.00.006
da0 at twa0 bus 0 scbus0 target 0 lun 0
da1 at twa0 bus 0 scbus0 target 1 lun 0
da2 at twa0 bus 0 scbus0 target 2 lun 0
da3 at twa0 bus 0 scbus0 target 3 lun 0
da4 at twa0 bus 0 scbus0 target 4 lun 0

Are these all JBOD devices?
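
If the 3ware CLI is installed, something like the following (assuming the
controller shows up as /c0) will list the units and show whether each port is
exported as JBOD or wrapped in a single-disk unit:

# tw_cli /c0 show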

-- 
Peter Jeremy




[zfs-discuss] pool metadata has duplicate children

2013-01-08 Thread John Giannandrea

I seem to have managed to end up with a pool that is confused about its child 
disks.  The pool is faulted with corrupt metadata:

  pool: d
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from
a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        d                        FAULTED      0     0     1
          raidz1-0               FAULTED      0     0     6
            da1                  ONLINE       0     0     0
            3419704811362497180  OFFLINE      0     0     0  was /dev/da2
            da3                  ONLINE       0     0     0
            da4                  ONLINE       0     0     0
            da5                  ONLINE       0     0     0

But if I look at the labels on all the online disks I see this:

# zdb -ul /dev/da1 | egrep '(children|path)'
children[0]:
path: '/dev/da1'
children[1]:
path: '/dev/da2'
children[2]:
path: '/dev/da2'
children[3]:
path: '/dev/da3'
children[4]:
path: '/dev/da4'
...

But the offline disk (da2) shows the older correct label:

children[0]:
path: '/dev/da1'
children[1]:
path: '/dev/da2'
children[2]:
path: '/dev/da3'
children[3]:
path: '/dev/da4'
children[4]:
path: '/dev/da5'

zpool import -F doesn't help, because none of the unfaulted disks seem to have 
the right label.  And unless I can import the pool, I can't replace the bad 
drive.

Also, zpool seems to really not want to import a raidz1 pool with one faulted 
drive, even though that should be readable.  I have read about the undocumented 
-V option but don't know whether it would help.
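
For reference, the recovery-oriented variants of zpool import (all documented
options; whether any of them helps here is another question, and the read-only
attempt is just an extra idea) look like this:

# zpool import -f -o readonly=on d   # read-only import attempt, writes nothing to the pool
# zpool import -f -F -n d            # dry run: reports whether rolling back recent txgs would allow import
# zpool import -f -F d               # actual recovery import, discards the last few transactions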

I got into this state when I noticed the pool was DEGRADED and was trying to 
replace the bad disk.  I am debugging it under FreeBSD 9.1.

Suggestions of things to try are welcome; I'm more interested in learning what 
went wrong than in restoring the pool.  I don't think I should have been able to 
go from one offline drive to an unrecoverable pool this easily.

-jg



Re: [zfs-discuss] pool metadata has duplicate children

2013-01-08 Thread Gregg Wonderly
Have you tried importing the pool with that drive completely unplugged?  Which 
HBA are you using?  How many of these disks are on the same or separate HBAs?

Gregg Wonderly


On Jan 8, 2013, at 12:05 PM, John Giannandrea j...@meer.net wrote:

 
 I seem to have managed to end up with a pool that is confused about its 
 child disks.  The pool is faulted with corrupt metadata:
 
  pool: d
 state: FAULTED
 status: The pool metadata is corrupted and the pool cannot be opened.
 action: Destroy and re-create the pool from
   a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
 config:
 
   NAME                     STATE     READ WRITE CKSUM
   d                        FAULTED      0     0     1
     raidz1-0               FAULTED      0     0     6
       da1                  ONLINE       0     0     0
       3419704811362497180  OFFLINE      0     0     0  was /dev/da2
       da3                  ONLINE       0     0     0
       da4                  ONLINE       0     0     0
       da5                  ONLINE       0     0     0
 
 But if I look at the labels on all the online disks I see this:
 
 # zdb -ul /dev/da1 | egrep '(children|path)'
children[0]:
path: '/dev/da1'
children[1]:
path: '/dev/da2'
children[2]:
path: '/dev/da2'
children[3]:
path: '/dev/da3'
children[4]:
path: '/dev/da4'
...
 
 But the offline disk (da2) shows the older correct label:
 
children[0]:
path: '/dev/da1'
children[1]:
path: '/dev/da2'
children[2]:
path: '/dev/da3'
children[3]:
path: '/dev/da4'
children[4]:
path: '/dev/da5'
 
 zpool import -F doesn't help, because none of the unfaulted disks seem to have 
 the right label.  And unless I can import the pool, I can't replace the bad 
 drive.
 
 Also, zpool seems to really not want to import a raidz1 pool with one faulted 
 drive, even though that should be readable.  I have read about the 
 undocumented -V option but don't know whether it would help.
 
 I got into this state when I noticed the pool was DEGRADED and was trying to 
 replace the bad disk.  I am debugging it under FreeBSD 9.1.
 
 Suggestions of things to try are welcome; I'm more interested in learning what 
 went wrong than in restoring the pool.  I don't think I should have been able to 
 go from one offline drive to an unrecoverable pool this easily.
 
 -jg
 


Re: [zfs-discuss] pool metadata has duplicate children

2013-01-08 Thread John Giannandrea

Gregg Wonderly gregg...@gmail.com wrote:
 Have you tried importing the pool with that drive completely unplugged?  

Thanks for your reply.   I just tried that.  zpool import now says:

   pool: d
     id: 13178956075737687211
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-72
 config:

        d                        FAULTED  corrupted data
          raidz1-0               FAULTED  corrupted data
            da1                  ONLINE
            3419704811362497180  OFFLINE
            da2                  ONLINE
            da3                  ONLINE
            da4                  ONLINE

Notice that in the absence of the faulted da2, the OS has assigned da3 to da2,
and so on.  I suspect this was part of the original problem in creating a label
with two da2s.
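
For what it's worth, the guids recorded in the label of whatever is currently
called da2 can be compared against the vdev_tree below with something like:

# zdb -l /dev/da2 | grep -w guid: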

zdb still reports that the label has two da2 children:

vdev_tree:
    type: 'raidz'
    id: 0
    guid: 11828532517066189487
    nparity: 1
    metaslab_array: 23
    metaslab_shift: 36
    ashift: 9
    asize: 920660480
    is_log: 0
    children[0]:
        type: 'disk'
        id: 0
        guid: 13697627234083630557
        path: '/dev/da1'
        whole_disk: 0
        DTL: 78
    children[1]:
        type: 'disk'
        id: 1
        guid: 3419704811362497180
        path: '/dev/da2'
        whole_disk: 0
        DTL: 71
        offline: 1
    children[2]:
        type: 'disk'
        id: 2
        guid: 6790266178760006782
        path: '/dev/da2'
        whole_disk: 0
        DTL: 77
    children[3]:
        type: 'disk'
        id: 3
        guid: 2883571222332651955
        path: '/dev/da3'
        whole_disk: 0
        DTL: 76
    children[4]:
        type: 'disk'
        id: 4
        guid: 16640597255468768296
        path: '/dev/da4'
        whole_disk: 0
        DTL: 75



 Which HBA are you using?  How many of these disks are on the same or separate 
 HBAs?

All the disks are on the same HBA:

twa0: 3ware 9000 series Storage Controller
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, 
Firmware FE9X 2.08.00.006
da0 at twa0 bus 0 scbus0 target 0 lun 0
da1 at twa0 bus 0 scbus0 target 1 lun 0
da2 at twa0 bus 0 scbus0 target 2 lun 0
da3 at twa0 bus 0 scbus0 target 3 lun 0
da4 at twa0 bus 0 scbus0 target 4 lun 0

-jg
