Just to report back to the list...  Sorry for the lengthy post
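For reference, the pool here was built from one LUN on each of two iSCSI 
targets.  A rough sketch of the setup (the target addresses and device names 
below are hypothetical, and the commands print rather than execute by default):

```shell
# Hypothetical reconstruction of the test setup on Solaris 10u4.
# Dry run by default: the commands below are printed, not executed.
# Set ZPOOL=zpool and ISCSIADM=iscsiadm to run them for real.
ZPOOL=${ZPOOL:-"echo zpool"}
ISCSIADM=${ISCSIADM:-"echo iscsiadm"}

# Point the initiator at both iSCSI targets (addresses are placeholders)
$ISCSIADM add discovery-address 192.168.1.10
$ISCSIADM add discovery-address 192.168.1.11
$ISCSIADM modify discovery --sendtargets enable

# Mirror one LUN from each target so either side can fail independently
$ZPOOL create test mirror c2t1d0 c2t2d0
```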

So I've tested the iSCSI-based ZFS mirror on Sol 10u4, and it more or 
less works as expected.  If I unplug one side of the mirror (unplug 
or power down one of the iSCSI targets), I/O to the zpool stops for a 
while, perhaps a minute, and then things free up again.  zpool commands 
get unworkably slow, and error messages fly by on the console like fire 
ants running from a flood.  Worst of all, after plugging the faulted 
mirror back in (without first removing it from the pool), it's very 
hard to bring the faulted device back online:

prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr  8 16:34:08 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t1d0  FAULTED      0 2.88K     0  corrupted data
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

>>>>>>>>> Comment: why are there now two instances of c2t1d0??  <<<<<<<<<<


prudhoe # zpool replace test c2t2d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool replace -f test c2t2d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool remove test c2t2d0
cannot remove c2t2d0: no such device in pool

prudhoe # zpool offline test c2t2d0
cannot offline c2t2d0: no such device in pool

prudhoe # zpool online test c2t2d0
cannot online c2t2d0: no such device in pool

>>>>>>>>>>  OK, get more drastic <<<<<<<<<<<<<<

prudhoe # zpool clear test

prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr  8 16:34:08 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t1d0  FAULTED      0     0     0  corrupted data
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

>>>>>>>>>> Frustration setting in.  The error counts are now zero, but there 
>>>>>>>>>> are still two instances of c2t1d0 listed... <<<<<<<<<<

prudhoe # zpool export test

prudhoe # zpool import test

prudhoe # zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
test                   12.9G   9.54G   3.34G    74%  ONLINE     -

prudhoe # zpool status
  pool: test
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.11% done, 0h20m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0

errors: No known data errors


>>>>>  Finally resilvering with the right devices.  The thing I really don't 
>>>>> like here is that the pool had to be exported and then imported to make 
>>>>> this work.  For an NFS server, that's not really acceptable.  Now I know 
>>>>> this is ol' Solaris 10u4, but still, I'm surprised I needed to 
>>>>> export/import the pool to get it working correctly again.  Anyone know 
>>>>> what I did wrong?  Is there a canonical way to online the previously 
>>>>> faulted device?

Anyway, it looks like I can get some sort of HA out of this iSCSI mirror 
for now.  The other pluses are that the pool can self-heal, and reads will 
be spread across both units.

Cheers,

Jon

--- P.S.  Playing with this more before sending this message: if you detach 
the faulted mirror before putting it back online, it all works well.  Just 
hope nothing bounces on your network when you have a failure:

---->>>> unplug one iscsi mirror, then: 

prudhoe # zpool status -v
  pool: test
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: scrub completed with 0 errors on Wed Apr  9 14:18:45 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t2d0  UNAVAIL      4    91     0  cannot open
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

prudhoe # zpool detach test c2t2d0

prudhoe # zpool status -v
  pool: test
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Apr  9 14:18:45 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t1d0    ONLINE       0     0     0

errors: No known data errors

----->>>> replug the downed mirror, and: 

prudhoe # zpool attach test c2t1d0 c2t2d0
prudhoe # zpool status -v
  pool: test
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.04% done, 2h17m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0

errors: No known data errors

Voilà!
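The whole detach-then-attach dance could be wrapped in a small script, 
something like this (hypothetical sketch; dry run by default, so the zpool 
commands are printed rather than executed):

```shell
# Hypothetical wrapper around the detach-first workaround.
# Dry run by default; set ZPOOL=zpool to actually modify the pool.
ZPOOL=${ZPOOL:-"echo zpool"}

pool=test
good=c2t1d0     # the side that stayed up
failed=c2t2d0   # the side that went away

# Drop the unreachable half of the mirror while the target is down...
$ZPOOL detach "$pool" "$failed"

# ...then, once the target is reachable again, re-attach it behind the
# surviving device and let it resilver
$ZPOOL attach "$pool" "$good" "$failed"
```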

Jon

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
