On Sat, Jan 2, 2010 at 4:07 PM, R.G. Keen <k...@geofex.com> wrote:
> OK. From the above suppositions, if we had a desktop (infinitely
> long retry on fail) disk and a soft-fail error in a sector, then the
> disk would effectively hang each time the sector was accessed.
> This would lead to
> (1) ZFS->SD-> disk read of failing sector
> (2) disk does not reply within 60 seconds (default)
> (3) disk is reset by SD
> (4) operation is retried by SD(?)
> (5) disk does not reply within 60 seconds (default)
> (6) disk is reset by SD ?
>
> then what? If I'm reading you correctly, the following string of
> events happens:
>
>> The drivers will retry and fail the I/O. By default, for SATA
>> disks using the sd driver, there are 5 retries of 60 seconds.
>> After 5 minutes, the I/O will be declared failed and that info
>> is passed back up the stack to ZFS, which will start its
>> recovery.  This is why the T part of N in T doesn't work so
>> well for the TLER case.
>
> Hmmm... actually, it may be just fine for my personal wants.
> If I had a desktop drive which went unresponsive for 60 seconds
> on an I/O soft error, then the timeout would be five minutes.
> At that time, zfs would... check me here... mark the block as
> failed, and try to relocate the block on the disk. If that worked
> fine, the previous sectors would be marked as unusable, and
> work goes on, but with the actions noted in the logs.

We use Seagate Barracuda ES.2 1TB disks, and every time the OS starts
to bang on a region of the disk with bad blocks (which essentially
degrades the performance of the whole pool) we get a call from our
clients complaining about NFS timeouts. The timeouts usually last for 5
minutes, but I've seen them last for a whole hour while the drive is
slowly dying. Off-lining the faulty disk fixes it.

I'm trying to find out how the disks' firmware is programmed
(timeouts, retries, etc.), but so far I've found nothing in the official
docs. In this case the disk's retry timeout seems way too high for our
needs, and I believe a timeout limit imposed by the OS would help.
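
To put rough numbers on that, here is a minimal sketch of the arithmetic,
assuming the figures quoted above (5 retries of 60 seconds each) and
assuming the retries simply run back to back. The function name and the
10-second value are hypothetical, for illustration only; they are not
actual driver settings.

```python
# Rough model of how long one failing I/O can stall before the error
# is passed back up the stack. All names here are hypothetical.

def worst_case_stall_seconds(timeout_s: int, retries: int) -> int:
    """Seconds spent waiting if every attempt runs to its full timeout."""
    return timeout_s * retries

# Figures quoted in this thread: 5 retries of 60 seconds each.
default_stall = worst_case_stall_seconds(timeout_s=60, retries=5)
print(f"default: {default_stall} s ({default_stall / 60:.0f} min)")

# Hypothetical shorter per-command timeout with the same retry count.
shorter_stall = worst_case_stall_seconds(timeout_s=10, retries=5)
print(f"shorter: {shorter_stall} s")
```

With the quoted defaults this comes out to the 5 minutes we usually see
for a single bad request, so a lower per-command timeout would shrink
the stall proportionally.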

-- 
Giovanni P. Tirloni
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
