Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Victor Latushkin
Looking at the txg numbers, it is clear that the labels on the two devices
that are unavailable now may be stale:

Krzys wrote:
 When I do zdb on emcpower3a, which seems to be OK from the zpool perspective,
 I get the following output:
 bash-3.00# zdb -lv /dev/dsk/emcpower3a
 
 LABEL 0
 
   version=3
   name='mypool'
   state=0
   txg=4367380
   pool_guid=4148251638983938048
   top_guid=9690155374174551757
   guid=9690155374174551757
   vdev_tree
       type='disk'
       id=2
       guid=9690155374174551757
       path='/dev/dsk/emcpower3a'
       whole_disk=0
       metaslab_array=1813
       metaslab_shift=30
       ashift=9
       asize=134208815104

Here we have txg=4367380, but on the other two devices (or at least on one
of them) we have txg=4367379:


 But when I do zdb on emcpower0a, which seems to be not OK, I get the
 following output:
 bash-3.00# zdb -lv /dev/dsk/emcpower0a
 
 LABEL 0
 
   version=3
   name='mypool'
   state=0
   txg=4367379
   pool_guid=4148251638983938048
   top_guid=14125143252243381576
   guid=14125143252243381576
   vdev_tree
       type='disk'
       id=0
       guid=14125143252243381576
       path='/dev/dsk/emcpower0a'
       whole_disk=0
       metaslab_array=13
       metaslab_shift=29
       ashift=9
       asize=107365269504
       DTL=727
 
 The same is true for emcpower2a in my pool.

What does 'zdb -uuu mypool' say?
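
For reference, a rough sketch of how I would pull the uberblock details (the
-e form for a pool that is not in the cache file is an assumption on my part;
plain 'zdb -uuu mypool' only works if zdb can open the pool as configured):

# zdb -uuu mypool
# zdb -e -uuu mypool

The txg of the active uberblock can then be compared with the txg values in
the labels above.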


 Is there a way to fix the failed LABELs 2 and 3? I know you need 4 of
 them, but is there a way to reconstruct them somehow?

It looks like the problem is not that labels 2 and 3 are missing, but
that labels 0 and 1 are stale.


 Or is my pool lost completely, so that I need to recreate it?
 It would be odd if a reboot of the server could cause such a disaster.

There is a Dirty Time Log (DTL) object allocated for the device with the
unreadable labels, which means the device in question was unavailable for
some time, so something weird might have been going on with your storage a
while back (prior to the reboot)...
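
To see whether the other devices carry a DTL entry as well, a quick label
comparison along these lines might help (just a sketch, using the device
names from your zpool config):

# for d in emcpower0a emcpower2a emcpower3a; do echo "== $d =="; zdb -l /dev/dsk/$d | egrep 'txg=|DTL='; done

This prints the config txg and any DTL entry from each device's labels side
by side.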

 But I was unable to find anything on how people repair or recreate those
 LABELs. How would I recover my zpool? Any help or suggestion is greatly
 appreciated.

Have you seen this thread -
http://www.opensolaris.org/jive/thread.jspa?messageID=220125 ?

I think some of that experience may be applicable to this case as well.

Btw, what kind of Solaris are you running?

wbr,
victor


Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
It's OK that you're missing labels 2 and 3 -- there are four copies
precisely so that you can afford to lose a few.  Labels 2 and 3
are at the end of the disk.  The fact that only they are missing
makes me wonder if someone resized the LUNs.  Growing them would
be OK, but shrinking them would indeed cause the pool to fail to
open (since part of it was amputated).
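
One way to sanity-check the shrinking theory (a rough sketch, assuming the
emcpower pseudo devices show up under /dev/rdsk as usual): compare the asize
recorded in label 0 with the current size of the slice.

# zdb -l /dev/dsk/emcpower0a | grep asize
# prtvtoc /dev/rdsk/emcpower0a

With ashift=9 the sectors are 512 bytes, so the slice's sector count times
512 should be at least asize; if it is smaller, labels 2 and 3 (which live
at the end of the device) have fallen off the end.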

There ought to be more helpful diagnostics in the FMA error log.
After a failed attempt to import, type this:

# fmdump -ev

and let me know what it says.
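
If the error log turns out to be long, it may help to narrow it down to the
time window around the failed import (a sketch; see fmdump(1M) for the exact
time formats it accepts):

# fmdump -e
# fmdump -eV -t 27Apr08

The first form gives one-line summaries; the second dumps the full nvlists
for events since the given date.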

Jeff

On Tue, Apr 29, 2008 at 03:31:53PM -0400, Krzys wrote:
 
 
 
 I have a problem on one of my systems with zfs. I used to have a zpool created
 with 3 LUNs on the SAN. I did not have to put any raid or anything on it since
 it was already using raid on the SAN. Anyway, the server rebooted and I cannot
 see my pools.
 When I try to import it, it fails. I am using an EMC CLARiiON as the SAN, with
 PowerPath.
 # zpool list
 no pools available
 # zpool import -f
   pool: mypool
   id: 4148251638983938048
 state: FAULTED
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
   devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
 config:
  mypool        UNAVAIL  insufficient replicas
    emcpower0a  UNAVAIL  cannot open
    emcpower2a  UNAVAIL  cannot open
    emcpower3a  ONLINE
 
 I think I am able to see all the LUNs and I should be able to access them on
 my Sun box.
 [powermt display dev=all output snipped; see the full output in the
 original message below]

Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
 Looking at the txg numbers, it is clear that the labels on the two devices
 that are unavailable now may be stale:

Actually, they look OK.  The txg values in the label indicate the
last txg in which the pool configuration changed for devices in that
top-level vdev (e.g. mirror or raid-z group), not the last txg synced.

Jeff


Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Victor Latushkin
Jeff Bonwick wrote:
 Looking at the txg numbers, it is clear that the labels on the two devices
 that are unavailable now may be stale:
 
 Actually, they look OK.  The txg values in the label indicate the
 last txg in which the pool configuration changed for devices in that
 top-level vdev (e.g. mirror or raid-z group), not the last txg synced.

Agree, I've jumped to conclusions here.

But still, there is a difference between the two labels presented. Since this 
pool had been running for a while, I suppose there have been no 
admin-initiated configuration changes, so the config change may be due to the 
allocation of the DTL object, correct?

Still, it would be interesting to know the txg of the selected uberblock, to 
see how long ago that change happened.

Also, it would be interesting to know why the server rebooted.

Victor


Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Krzys
Because this system was in production I had to recover fairly quickly, so I was 
unable to play with it much more; we had to destroy it, recreate a new pool, and 
then restore the data from tapes.

It's a mystery why it rebooted in the middle of the night; we could not figure 
that out, nor why the pool had this problem... so unfortunately I will not be 
able to follow up on what you, Victor and Jeff, were suggesting.


Before we destroyed that pool I did capture the fmdump output on that system to 
see what failed, etc. As you can see it happened at around 3:54 am on Sunday 
morning; there was no one on the system from an admin perspective to break 
anything. The only thing I can think of would be the backups running, which 
could generate more traffic, but then I had that system set up this way for over 
a year, and no changes were made to it from the storage perspective.

Yes, I did see this URL: 
http://www.opensolaris.org/jive/thread.jspa?messageID=220125
but unfortunately I was unable to apply it to my situation, as I had no idea 
what values to use... :(

Anyway, here is the fmdump output:

bash-3.00# fmdump -eV

TIME                           CLASS
Apr 27 2008 03:54:05.605369200 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x18594234ea1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)

        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4814311d 0x24153370

Apr 27 2008 03:54:05.605369725 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x18594234ea1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)

        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4814311d 0x2415357d

Apr 27 2008 03:54:05.605369225 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x18594234ea1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)

        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        __ttl = 0x1
        __tod = 0x4814311d 0x24153389

Apr 27 2008 03:56:28.180698100 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)

        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x481431ac 0xac53bf4

Apr 27 2008 03:56:28.180698375 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)

        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x481431ac 0xac53d07

Apr 27 2008 03:56:28.180698500 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)

[zfs-discuss] lost zpool when server restarted.

2008-04-29 Thread Krzys



I have a problem on one of my systems with zfs. I used to have a zpool created 
with 3 LUNs on the SAN. I did not have to put any raid or anything on it since 
it was already using raid on the SAN. Anyway, the server rebooted and I cannot 
see my pools.
When I try to import it, it fails. I am using an EMC CLARiiON as the SAN, with 
PowerPath.
# zpool list
no pools available
# zpool import -f
  pool: mypool
  id: 4148251638983938048
state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
  devices and try again.
  see: http://www.sun.com/msg/ZFS-8000-3C
config:
  mypool        UNAVAIL  insufficient replicas
    emcpower0a  UNAVAIL  cannot open
    emcpower2a  UNAVAIL  cannot open
    emcpower3a  ONLINE

I think I am able to see all the LUNs and I should be able to access them on my 
Sun box.
# powermt display dev=all
Pseudo name=emcpower0a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A001264FB20990FDC11 [LUN 13]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==
 Host --- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016041E035A4d0s0 SP A4 active 
alive 0 0
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016941E035A4d0s0 SP B5 active 
alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016141E035A4d0s0 SP A5 
active alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016841E035A4d0s0 SP B4 
active alive 0 0


Pseudo name=emcpower1a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A004C1388343C10DC11 [LUN 14]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==
 Host --- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016041E035A4d1s0 SP A4 active 
alive 0 0
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016941E035A4d1s0 SP B5 active 
alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016141E035A4d1s0 SP A5 
active alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016841E035A4d1s0 SP B4 
active alive 0 0


Pseudo name=emcpower3a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A00A82C68514E86DC11 [LUN 7]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==
 Host --- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016041E035A4d3s0 SP A4 active 
alive 0 0
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016941E035A4d3s0 SP B5 active 
alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016141E035A4d3s0 SP A5 
active alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016841E035A4d3s0 SP B4 
active alive 0 0


Pseudo name=emcpower2a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=600601604B141B00C2F6DB2AC349DC11 [LUN 24]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==
 Host --- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016041E035A4d2s0 SP A4 active 
alive 0 0
3074 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c2t5006016941E035A4d2s0 SP B5 active 
alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 c3t5006016141E035A4d2s0 SP A5 
active alive 0 0
3072 [EMAIL