Hi guys,

Bob, my thought was to have this timeout as something that can optionally be 
set by the administrator on a per-pool basis.  I'll admit I was mainly thinking 
about reads and hadn't considered the write scenario, but even having thought 
about that, it's still a feature I'd like.  After all, this would be a timeout 
set by the administrator based on the longest delay they can afford for that 
storage pool.
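
For the sake of argument I'm picturing something like this, with entirely 
made-up syntax; no such property exists today, and leaving it unset would 
keep the current behaviour:

    # hypothetical property, illustration only
    zpool set iotimeout=2s tank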

Personally, if a SATA disk isn't responding to any requests after 2 seconds, I 
really don't care whether an error has been detected; as far as I'm concerned 
that disk is faulty.  I'd be quite happy for the array to drop to a degraded 
mode based on that and for writes to carry on with the rest of the array.

Eric, thanks for the extra details, they're very much appreciated.  It's good 
to hear you're working on this, and I love the idea of doing a B_FAILFAST read 
on both halves of the mirror.
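
Just to check I've understood the idea, here's roughly how I picture that 
race.  This is only a user-space sketch; POSIX threads stand in for the real 
kernel I/O path, and the device paths, block size and 2 second value are all 
made up for illustration:

    /* dual-failfast read sketch: first good half wins, and an
     * admin-set timeout bounds the wait */
    #include <pthread.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <time.h>
    #include <unistd.h>

    #define BLKSZ 4096

    struct half {
        const char *path;          /* one side of the mirror (made up) */
        char        buf[BLKSZ];
    };

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done   = PTHREAD_COND_INITIALIZER;
    static struct half    *winner = NULL;

    static void *read_half(void *arg)
    {
        struct half *h = arg;
        int fd = open(h->path, O_RDONLY);

        /* stand-in for a B_FAILFAST read of block 0 */
        if (fd >= 0 && pread(fd, h->buf, BLKSZ, 0) == BLKSZ) {
            pthread_mutex_lock(&lock);
            if (winner == NULL)
                winner = h;        /* first good copy wins */
            pthread_cond_signal(&done);
            pthread_mutex_unlock(&lock);
        }
        if (fd >= 0)
            close(fd);
        return NULL;
    }

    int main(void)
    {
        struct half a = { "/dev/dsk/side-a" };
        struct half b = { "/dev/dsk/side-b" };
        pthread_t ta, tb;
        struct timespec deadline;

        /* the admin-set pool timeout: 2s in my SATA case */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 2;

        pthread_create(&ta, NULL, read_half, &a);
        pthread_create(&tb, NULL, read_half, &b);

        pthread_mutex_lock(&lock);
        while (winner == NULL &&
            pthread_cond_timedwait(&done, &lock, &deadline) == 0)
            ;                      /* loop on spurious wakeups */
        if (winner != NULL)
            printf("got data from %s\n", winner->path);
        else
            printf("no half answered in time: degrade, carry on\n");
        pthread_mutex_unlock(&lock);
        return 0;
    }

Obviously the real logic would live in the mirror vdev code rather than 
anywhere like this, but it's that race-plus-deadline shape I'm interested in.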

I do have a question though.  From what you're saying, the response time can't 
be consistent across all hardware, so you're once again at the mercy of the 
storage drivers.  Do you know how long B_FAILFAST takes to return a response 
over iSCSI?  If that's over 1-2 seconds, I'm afraid I would still consider it 
too slow.

I understand that Sun in general doesn't want to add fault management to ZFS, 
but I don't see how this particular timeout does anything other than help ZFS 
when it's dealing with such a diverse range of media.  I agree that ZFS can't 
know by itself what a valid timeout should be, but that's exactly why this 
needs to be an optional, administrator-set parameter.  The administrator of a 
storage array who wants to set this certainly knows what a valid timeout is 
for them, and these timeouts are likely to be several orders of magnitude 
larger than the standard response times.  I would configure very different 
values for my SATA drives than for my iSCSI connections, but in each case I 
would be happier knowing that ZFS has more of a chance of catching bad drivers 
or unexpected scenarios.
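
Again with made-up syntax, just to show the sort of spread I mean, a couple 
of seconds locally versus tens of seconds over the network:

    zpool set iotimeout=2s   sata-pool
    zpool set iotimeout=30s  iscsi-pool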

I very much doubt that hardware RAID controllers would wait 3 minutes for a 
drive to return a response; they have their own internal timeouts for deciding 
when a drive has failed.  While ZFS is dealing with very different hardware, I 
can't help but feel it should take the same approach to managing its drives.

That said, I'll be more than willing to test the new B_FAILFAST logic on
iSCSI once it's released.  Just let me know when it's out.


Ross

> Date: Thu, 28 Aug 2008 11:29:21 -0500
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / 
> driver failure better
> 
> On Thu, 28 Aug 2008, Ross wrote:
> >
> > I believe ZFS should apply the same tough standards to pool 
> > availability as it does to data integrity.  A bad checksum makes ZFS 
> > read the data from elsewhere, why shouldn't a timeout do the same 
> > thing?
> 
> A problem is that for some devices, a five minute timeout is ok.  For 
> others, there must be a problem if the device does not respond in a 
> second or two.
> 
> If the system or device is simply overwhelmed with work, then you would 
> not want the system to go haywire and make the problems much worse.
> 
> Which of these do you prefer?
> 
>    o System waits substantial time for devices to (possibly) recover in
>      order to ensure that subsequently written data has the least
>      chance of being lost.
> 
>    o System immediately ignores slow devices and switches to
>      non-redundant non-fail-safe non-fault-tolerant may-lose-your-data
>      mode.  When system is under intense load, it automatically
>      switches to the may-lose-your-data mode.
> 
> Bob
> ======================================
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> 
