Hi Folks,

I have a small problem. I've disappeared about 5.9TB of data. 

My host system was (well, still is) connected to this storage via iSCSI and 
MPxIO, doing round-robin across a pair of GigE ports. I'd like to make a quick 
excuse before we begin here.

I was originally doing raidz2 (there are 15 disks involved); however, during 
heavy load (e.g. a scrub with ~50% disk usage) errors showed up all over the 
place and eventually faulted the pool. I assume I was either running into 
bandwidth problems or generally freaking out the array with the sheer volume 
of IO.

So instead I'm exporting the whole deal as a single 5.9TB LUN (built as RAID5 
on the iSCSI appliance, a Promise M500i). That was all well and good until I 
had a kernel panic earlier today and the system came back rather unhappily.

My pool now looks like this:

  pool: store
 state: FAULTED
status: One or more devices could not be used because the label is missing 
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        store                    FAULTED      0     0     0  corrupted data
          c0t22310001557D05D5d0  FAULTED      0     0     0  corrupted data

I still see the LUN:

AVAILABLE DISK SELECTIONS:
       0. c0t22310001557D05D5d0 <Promise-VTrak M500i-021B-5.90TB>
          /scsi_vhci/d...@g22310001557d05d5
....

I can run zdb against the device and get some info (well, actually against s0 
on the disk, which is odd because I think I built the pool without specifying 
a slice; maybe relevant, maybe not):

gecl...@ostore:~# zdb -l /dev/rdsk/c0t22310001557D05D5d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=14
    name='store'
    state=0
    txg=178224
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    top_guid=14931103169794670927
    guid=14931103169794670927
    vdev_tree
        type='disk'
        id=0
        guid=14931103169794670927
        path='/dev/dsk/c0t22310001557D05D5d0s0'
        devid='id1,s...@x22310001557d05d5/a'
        phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
        whole_disk=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=6486985015296
        is_log=0
        DTL=44
--------------------------------------------
LABEL 1
--------------------------------------------
    version=14
    name='store'
    state=0
    txg=178224
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    top_guid=14931103169794670927
    guid=14931103169794670927
    vdev_tree
        type='disk'
        id=0
        guid=14931103169794670927
        path='/dev/dsk/c0t22310001557D05D5d0s0'
        devid='id1,s...@x22310001557d05d5/a'
        phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
        whole_disk=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=6486985015296
        is_log=0
        DTL=44
--------------------------------------------
LABEL 2
--------------------------------------------
    version=14
    name='store'
    state=0
    txg=178224
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    top_guid=14931103169794670927
    guid=14931103169794670927
    vdev_tree
        type='disk'
        id=0
        guid=14931103169794670927
        path='/dev/dsk/c0t22310001557D05D5d0s0'
        devid='id1,s...@x22310001557d05d5/a'
        phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
        whole_disk=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=6486985015296
        is_log=0
        DTL=44
--------------------------------------------
LABEL 3
--------------------------------------------
    version=14
    name='store'
    state=0
    txg=178224
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    top_guid=14931103169794670927
    guid=14931103169794670927
    vdev_tree
        type='disk'
        id=0
        guid=14931103169794670927
        path='/dev/dsk/c0t22310001557D05D5d0s0'
        devid='id1,s...@x22310001557d05d5/a'
        phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
        whole_disk=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=6486985015296
        is_log=0
        DTL=44
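
One thing that caught my eye: the labels report whole_disk=1, so I assume ZFS 
put an EFI label on the LUN itself and does its work through s0, which would 
explain why zdb only answers on the slice. The next things I was planning to 
look at are whether that partition table survived the panic and what ZFS 
currently thinks is importable. Roughly this (the slice path is my assumption, 
and I haven't confirmed these are the right checks):

# does the EFI partition table on the LUN still look sane?
prtvtoc /dev/rdsk/c0t22310001557D05D5d0s0

# did FMA log anything interesting around the panic?
fmdump -e | tail -20

# what does ZFS itself think is importable?
zpool import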

I can force-export and import the pool, but I can't seem to get it active 
again (roughly what I've been doing is sketched below). I've been reading 
around trying to get this figured out. Is this a scenario in which I should 
expect to have well and truly lost all of my data?
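
Flags reproduced from memory, so treat this as a sketch rather than a 
transcript:

zpool export -f store
zpool import -f store        # imports, but the pool comes back FAULTED as above
zpool status -v store

# not tried yet: pointing import at the device directory explicitly,
# in case the path/devid confusion around s0 is part of the problem
zpool import -d /dev/dsk -f store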

The data isn't irreplaceable; I can rebuild and restore from backups, but it 
will take an awfully long time. I'm aware this is a highly suboptimal setup, 
so feel free to beat me up a bit about it anyway.

In an ideal scenario I'd somehow get this out of FAULTED and run a scrub to 
see what portion of the data has actually been corrupted; a rough sketch of 
what I have in mind follows.
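
I genuinely don't know whether a clear will even take effect on a FAULTED 
top-level vdev, so this is purely what I was picturing, not something I've 
run:

zpool clear store          # try to clear the error state on the pool
zpool scrub store          # then walk the pool to see what's actually damaged
zpool status -v store      # and list anything with unrecoverable errors
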
Incidentally, has anyone else done much with building zpools on iSCSI targets? 
Has anyone noticed an upper limit on the number of targets they can drive per 
GigE connection as part of a loaded ZFS pool? It looked like I was stable with 
up to 7 devices over 2x GigE, but things fell apart quickly after that.

If anybody has some ideas on next steps (aside from restoring from backups), 
I'd love to hear them.

Thanks,

Graeme