Hey folks,

I've just followed up on this, testing iSCSI with a raidz pool, and
it still appears to struggle when a device goes offline.

>>> I don't see how this could work except for mirrored pools.  Would that
>>> carry enough market to be worthwhile?
>>> -- richard
>>>
>>
>> I have to admit, I've not tested this with a raided pool, but since
>> all ZFS commands hung when my iSCSI device went offline, I assumed
>> that you would get the same effect of the pool hanging if a raid-z2
>> pool is waiting for a response from a device.  Mirrored pools do work
>> particularly well with this since it gives you the potential to have
>> remote mirrors of your data, but if you had a raid-z2 pool, you still
>> wouldn't want that hanging if a single device failed.
>>
>
> zpool commands hanging is CR6667208, and has been fixed in b100.
> http://bugs.opensolaris.org/view_bug.do?bug_id=6667208
>
>> I will go and test the raid scenario though on a current build, just to be
>> sure.
>>
>
> Please.
> -- richard


I've just created a pool using three snv_103 iSCSI targets, with a
fourth install of snv_103 collating those targets into a raidz pool
and sharing it out over CIFS.
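
For reference, the head node side of that was roughly as follows.
This is from memory rather than an exact transcript; the discovery
address is a placeholder, and the device names are the ones that show
up in the zpool status output further down:

# iscsiadm add discovery-address 192.168.0.10:3260
# iscsiadm modify discovery --sendtargets enable
# zpool create iscsipool raidz \
    c2t600144F04933FF6C00005056967AC800d0 \
    c2t600144F04934FAB300005056964D9500d0 \
    c2t600144F04934119E000050569675FF00d0
# zfs create iscsipool/share
# zfs set sharesmb=on iscsipool/share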

To test the server, I powered down one of the three iSCSI targets
while transferring files from a Windows workstation.  The target took
a few minutes to shut down, but once it had, the Windows copy halted
with the error:
"The specified network name is no longer available."

At this point, the zfs admin tools still work fine (which is a huge
improvement, well done!), but zpool status still reports that all
three devices are online.

A minute later, I can open the share again, and start another copy.

Thirty seconds after that, zpool status finally reports that the
iSCSI device is offline.

So it looks like we have the same problems as before: the 3 minute
delay, zpool status reporting wrong information, and the CIFS service
having problems too.

At this point I restarted the iSCSI target, but had problems bringing
it back online.  It appears there's a bug in the initiator, but it's
easily worked around:
http://www.opensolaris.org/jive/thread.jspa?messageID=312981&#312981

What was great was that as soon as the iSCSI initiator reconnected,
ZFS started resilvering.

What might not be so great is the fact that all three devices are
showing that they've been resilvered:

# zpool status
  pool: iscsipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h2m with 0 errors on Tue Dec  2 11:04:10 2008
config:

        NAME                                       STATE     READ WRITE CKSUM
        iscsipool                                  ONLINE       0     0     0
          raidz1                                   ONLINE       0     0     0
            c2t600144F04933FF6C00005056967AC800d0  ONLINE       0     0     0  179K resilvered
            c2t600144F04934FAB300005056964D9500d0  ONLINE       5 9.88K     0  311M resilvered
            c2t600144F04934119E000050569675FF00d0  ONLINE       0     0     0  179K resilvered

errors: No known data errors
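
For anyone following along, the "action" line above just amounts to
running something like this once the device is known to be healthy
(or zpool replace if it actually needs swapping out):

# zpool clear iscsipool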

It's proving a little hard to know exactly what's happening when,
since I've only got a few seconds to log times, and there are delays
with each step.  However, I ran another test using robocopy and was
able to observe the behaviour a little more closely:

Test 2:  Using robocopy for the transfer, and iostat plus zpool status
on the server (the commands I was using are sketched after the
timeline)

10:46:30 - iSCSI server shutdown started
10:52:20 - all drives still online according to zpool status
10:53:30 - robocopy error - "The specified network name is no longer available"
 - zpool status shows all three drives as online
 - zpool iostat appears to have hung, taking much longer than the 30s
specified to return a result
 - robocopy is now retrying the file, but appears to have hung
10:54:30 - robocopy, CIFS and iostat all start working again, pretty
much simultaneously
 - zpool status now shows the drive as offline
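
For completeness, the server-side monitoring was nothing more than the
stock tools run by hand, roughly:

# zpool iostat iscsipool 30
# zpool status iscsipool

and the client side was a plain robocopy of a directory of test files,
along the lines of (paths and share name are placeholders):

robocopy C:\testdata \\server\share /E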

I could probably do with using DTrace to get a better look at this,
but I haven't learnt that yet (a rough starting point is sketched
after the list below).  My guess as to what's happening would be:

- iSCSI target goes offline
- ZFS will not be notified for 3 minutes, but I/O to that device is
essentially hung
- CIFS times out (I suspect this is on the client side with around a
30s timeout, but I can't find the timeout documented anywhere).
 - zpool iostat is now waiting.  I may be wrong, but this doesn't
appear to have benefited from the changes made to zpool status
- After 3 minutes, the iSCSI drive goes offline.  The pool carries on
with the remaining two drives, CIFS carries on working, iostat carries
on working.  "zpool status" however is still out of date.
- zpool status eventually catches up, and reports that the drive has
gone offline.
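
If anyone fancies beating me to the DTrace, this is the kind of thing
I had in mind: the stock io provider latency example (not ZFS-specific,
and I haven't actually run it against this pool yet):

# dtrace -n '
    /* record the start time of each disk I/O, keyed by device and block */
    io:::start
    { ts[args[0]->b_edev, args[0]->b_blkno] = timestamp; }

    /* on completion, aggregate the latency per device name */
    io:::done
    /ts[args[0]->b_edev, args[0]->b_blkno]/
    {
        @lat[args[1]->dev_statname] =
            quantize(timestamp - ts[args[0]->b_edev, args[0]->b_blkno]);
        ts[args[0]->b_edev, args[0]->b_blkno] = 0;
    }'

That won't catch an I/O that never completes, but it should make the
per-device latency spike obvious once the hung requests finally return.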


So, if my guesses are right, I see several problems here:
1. ZFS could still benefit from the timeout I've suggested to keep the
pool active.  I've now shown that this benefits raidz pools as well as
mirrors, and given the problems other people have reported, at least
two drivers have issues that this timeout would mitigate.
2. I would guess that the timeout needs to be under 30 seconds to
prevent problems with CIFS clients.  I need to find some documentation
on this, and some way to prove it's a client-side timeout rather than
a problem with the CIFS server.
3. zpool iostat is still blocked by a hung device (there may be an
existing bug for this, it rings a bell).
4. zpool status still reports out of date information.
5. When iSCSI targets finally do come back online, ZFS is resilvering
all of them (again, this rings a bell, Miles might have reported
something similar).

And while I don't know the code at all, I really can't understand how
ZFS can be serving files from a pool while zpool status doesn't know
what's going on.  ZFS physically can't work unless it knows which
drives it is and isn't writing to.  Why can't you just use this
knowledge for zpool status?

Ross