>>>>> "n" == Nathan  <nat...@passivekid.com> writes:

     n> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

This sounds silly.  Does it actually work for you?

It seems like comparing 7 seconds to the normal 30 seconds would be
useless.  Instead you want to compare (7 seconds * n levels of
cargo-cult retry in the OS storage stack) to the 0.01 seconds it
normally takes to read a sector.  The three orders of magnitude of
difference there are what make slowly-failing drives useless, not the
tiny difference between 7 and 30.
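
Back-of-the-envelope, just to show the shape of it (everything here
except the 7 and the 0.01 is a number I made up):

    # rough worst-case math; only 0.01s and 7s come from the paragraph above
    healthy_read  = 0.01   # seconds to read a good sector
    drive_timeout = 7.0    # seconds of on-drive recovery with TLER capping it
    retry_levels  = 4      # sd driver, HBA, volume manager, filesystem: a guess
    worst_case = drive_timeout * retry_levels       # 28 seconds per bad sector
    print("amplification: %.0fx" % (worst_case / healthy_read))    # ~2800x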

A smart feature would be ``mark unreadable blocks in the drive's
onboard DRAM read cache and fail them instantly, without another
attempt on the medium, to work around broken OS storage stacks that
can't distinguish cabling errors from drive reports and keep
uselessly banging away on dead sectors as errors slowly propagate up
an `abstracted' stack,'' and ``spend at most 30 seconds out of every
2000 seconds on various degraded error-recovery gymnastics.  If your
time budget's spent, toss up an error with NO THINKING, immediately
after the second time the platter rotated while the head should have
been over the data, no matter where the head actually was, what you
got, or how certain you are the data is sitting there unharmed if you
could just recover the head servo.''

But I doubt the EEs are smart enough to put that feature on the
table.

Actually it's probably not so much that EEs are dumb as that they
assume OS designers can implement such policies in their drivers
instead of needing them pushed down to the drive.  Which is, you
know, a pretty reasonable (albeit wrong) assumption.
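
For what it's worth, the time-budget half of that is easy to state.
Here's a toy sketch (Python, because it's short) of the policy I
mean; none of these names correspond to any real drive firmware or
driver interface:

    import time

    BUDGET_WINDOW = 2000.0   # seconds, per the ``30 out of every 2000'' above
    BUDGET_LIMIT  = 30.0     # max seconds of recovery gymnastics per window

    class RecoveryBudget:
        """How much time have we burned on error recovery lately?"""
        def __init__(self):
            self.window_start = time.monotonic()
            self.spent = 0.0

        def remaining(self):
            now = time.monotonic()
            if now - self.window_start >= BUDGET_WINDOW:
                self.window_start, self.spent = now, 0.0   # roll the window
            return BUDGET_LIMIT - self.spent

        def charge(self, seconds):
            self.spent += seconds

    known_bad = set()   # stands in for marking the block in the DRAM read cache

    def read_sector(lba, raw_read, slow_recovery, budget):
        # raw_read and slow_recovery are hypothetical callables: the quick
        # first-pass read and the bounded retry machinery, respectively.
        if lba in known_bad:
            raise IOError("sector already known bad: fail instantly")
        data = raw_read(lba)
        if data is not None:
            return data
        allowed = budget.remaining()
        if allowed <= 0:
            known_bad.add(lba)
            raise IOError("budget spent: error immediately, NO THINKING")
        start = time.monotonic()
        data = slow_recovery(lba, deadline=allowed)
        budget.charge(time.monotonic() - start)
        if data is None:
            known_bad.add(lba)
            raise IOError("unrecoverable within the budget")
        return data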

The most interesting thing on that Wikipedia page is that FreeBSD
GEOM is already using a 4-second timeout.  Once you've done that, I'm
not sure it matters whether the drive signals an error by sending an
error packet, or by sending nothing for >4 seconds---just so long as
you HEAR the signal and REACT.
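
From the initiator's side the two look the same; a toy wrapper
(submit_read is a hypothetical blocking low-level read, not any real
geom or ZFS interface):

    import concurrent.futures

    IO_TIMEOUT = 4.0   # seconds; silence past this is treated as a failure

    def read_with_deadline(submit_read, lba):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(submit_read, lba).result(timeout=IO_TIMEOUT)
        except concurrent.futures.TimeoutError:
            raise IOError("no reply in %gs: same as an error packet"
                          % IO_TIMEOUT)
        finally:
            pool.shutdown(wait=False)   # don't sit waiting on a stuck I/O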

     n> Basically drives without particular TLER settings drop out of
     n> RAID randomly.

Well...I would guess they'll drop out whenever they hit a recoverable
error. :)  Maybe modern drives are so crappy, and this happens so
often, that it seems ``random''.  With these other cards, do the
drives ``go back in'' to the RAID once they start responding to
commands again?

     n> Does this happen in ZFS?

No.  Any timeouts in ZFS are, annoyingly, those of the ``desktop''
storage stack underneath it, which is unaware of redundancy and of
the possibility of reading the data from elsewhere in a redundant
stripe rather than waiting 7, 30, or 180 seconds for it.  ZFS will
bang away on a slow drive for hours, bringing the whole system down
with it, rather than read redundant data from elsewhere in the
stripe, so you don't have to worry about drives dropping out
randomly.  Every last bit will be squeezed from the first place ZFS
tried to read it, even if that takes years.  However, you will get
all kinds of analysis and log data generated during those years
(assuming the system stays up long enough to write the logs, which it
probably won't:

 http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem

) Maybe it's getting better, but IMHO there's a fundamental
philosophical disagreement underneath all this about which piece of
code is responsible for which sort of blocking.
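
To be concrete about what I'd rather it do: something like the toy
below, which is NOT ZFS code (the per-copy deadline and the list of
per-device read callables are made up):

    import concurrent.futures

    PER_COPY_DEADLINE = 1.0   # made up: give each copy a second, then move on

    def redundant_read(copies, lba):
        # copies: hypothetical blocking read callables, one per redundant copy
        errors = []
        for dev_read in copies:
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            try:
                return pool.submit(dev_read, lba).result(
                    timeout=PER_COPY_DEADLINE)
            except (concurrent.futures.TimeoutError, OSError) as e:
                errors.append(e)   # this copy is slow or dead; try the next one
            finally:
                pool.shutdown(wait=False)
        raise IOError("every copy slow or dead: %r" % errors)

A slow copy costs you one deadline instead of hours.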
