Hi Folks, I have a small problem. I've disappeared about 5.9TB of data.
My host system was (well, still is) connected to this storage via iSCSI and MPxIO, doing round-robin across a pair of GigE ports.

I'd like to make a quick excuse before we begin here. I was originally doing raidz2 (there are 15 disks involved); however, during heavy load (e.g. a scrub with ~50% disk utilization) errors showed up all over the place and eventually faulted the pool. I assume I was either running into bandwidth problems or generally freaking out the array with the volume of IO. So instead I'm exporting the whole deal as a single 5.9TB LUN (done as RAID5 on the iSCSI appliance, a Promise M500i).

Well, that was all well and good until I had a kernel panic earlier today and the system came back rather unhappily. My pool now looks like this:

  pool: store
 state: FAULTED
status: One or more devices could not be used because the label is
        missing or invalid.  There are insufficient replicas for the
        pool to continue functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        store                    FAULTED      0     0     0  corrupted data
          c0t22310001557D05D5d0  FAULTED      0     0     0  corrupted data

I still see the LUN:

AVAILABLE DISK SELECTIONS:
       0. c0t22310001557D05D5d0 <Promise-VTrak M500i-021B-5.90TB>
          /scsi_vhci/d...@g22310001557d05d5
....

I can run zdb against the device and get some info (well, actually against s0 on the disk, which is odd because I think I built the pool without specifying a slice; maybe relevant, maybe not, see the P.S. at the end):

gecl...@ostore:~# zdb -l /dev/rdsk/c0t22310001557D05D5d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=14
    name='store'
    state=0
    txg=178224
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    top_guid=14931103169794670927
    guid=14931103169794670927
    vdev_tree
        type='disk'
        id=0
        guid=14931103169794670927
        path='/dev/dsk/c0t22310001557D05D5d0s0'
        devid='id1,s...@x22310001557d05d5/a'
        phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
        whole_disk=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=6486985015296
        is_log=0
        DTL=44

(LABEL 1, LABEL 2 and LABEL 3 are identical to LABEL 0; trimmed here for brevity.)

I can force export and import the pool, but I can't seem to get it active again.
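For concreteness, the sequence I've been trying looks roughly like this (the rewind import at the end is something I've only read about, and I'm not sure it exists on a build as old as mine, so treat it as a guess):

# force the faulted pool out, then try to bring it back in
zpool export -f store
zpool import -f store       # imports, but the pool stays FAULTED

# newer builds supposedly have a rewind import (-F) that discards the
# last few transactions to get back to a consistent state; I don't
# know whether my build (pool version 14) has it
zpool import -F store

# and if the pool ever does come back, I'd clear the errors and scrub
# to see what's actually damaged
zpool clear store
zpool scrub store
zpool status -v store       # lists any files with permanent errors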
I've been reading around trying to get this figured out. Is this a scenario in which I should expect to have well and truly lost all of my data? The data is not irreplaceable; I can rebuild/restore from backups, but it will take an awfully long time. I'm aware that this is a highly suboptimal setup, but feel free to beat me up a bit over it anyway. In an ideal scenario I'd like to somehow get this out of FAULTED and run a scrub to see what portion of the data has actually been corrupted.

Incidentally, has anyone else done much with building zpools on iSCSI targets? Has anyone noticed an upper limit on the number of targets they can hit per GigE connection as part of a loaded ZFS pool? It looks like I was stable with up to 7 devices over 2x GigE, but things fell apart quickly after that.

If anybody has some ideas on next steps (aside from restoring from backups), I'd love to hear them.

Thanks,
Graeme
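P.S. On the s0 oddity above: my understanding is that when you hand zpool an entire disk, it writes an EFI label to the disk itself, puts the data in slice 0, and records whole_disk=1 in the vdev label, which is exactly what zdb is showing. So the s0 is probably expected rather than a symptom. If that's right, the partition map should confirm it:

# a zpool-created EFI label should show one big slice 0 plus the
# small reserved slice 8
prtvtoc /dev/rdsk/c0t22310001557D05D5d0s0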