>>>>> "dc" == Daniel Carosone <d...@geek.com.au> writes:
>>>>> "w" == Willy  <willy.m...@gmail.com> writes:
>>>>> "sb" == Simon Breden <sbre...@gmail.com> writes:

First of all, I've so far been assembling vdev stripes from drives of
different manufacturers, so that one manufacturer can have a bad batch
or firmware bug killing all their drives at once without my losing the
pool. Based on recent drive problems I think this is a really wise
idea.
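
Concretely, that means pairing vendors within each redundant vdev,
something like 'zpool create tank mirror <samsung> <hitachi> mirror
<seagate> <wd>' (device names are placeholders, obviously), so that a
bad firmware batch takes out at most one side of each mirror.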

     w> http://www.csc.liv.ac.uk/~greg/projects/erc/

dead link?

     w> Unfortunately, smartmontools has limited SATA drive support in
     w> opensolaris, and you cannot query or set the values.

also the Solaris driver stack is kind of a mess, with different
mid-layers depending on which SATA low-level driver you use, and many
proprietary no-source low-level drivers, neither of which you have to
deal with on Linux.  Maybe in a decade it will get better, once the
oldest driver we have to deal with is AHCI, but yes, smartmontools
vs. uscsi still needs fixing!
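
For comparison, on Linux, where smartmontools can pass SCT commands
through to SATA drives, 'smartctl -l scterc,70,70 /dev/sdX' caps the
drive's read and write error recovery at 7.0 seconds each, and
'smartctl -l scterc /dev/sdX' reads the current values back (assuming
the drive implements SCT ERC at all).  The setting is volatile on
most drives, so like the HDAT2 trick below it has to be reapplied
after a power cycle.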

     w> I have 4 of the HD154UI Samsung Ecogreens, and was able to set
     w> the error reporting time using HDAT2.  The settings would
     w> survive a warm reboot, but not a powercycle.

after stfw this seems to be some MesS-DOS binary-only tool.  Maybe you
can run it in VirtualBox and snoop on its behavior---this worked for
me with Wine and a Lite-On RPC tool.  At least on Linux you can, for
example, run CD-burning programs from within Wine---it is that good.

    sb> RAID-version drives at 50%-100% price premium, I have decided
    sb> not to use Western Digital drives any longer, and have
    sb> explained why here:

    sb> http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/

IMHO it is just a sucker premium, because the feature is worthless
anyway.  From the discussion I've read here, the feature is designed
to keep drives which are *reporting failures* still considered
*GOOD*, so they do not drop out of RAIDsets in RAID-on-a-card
implementations with RAID-level timeouts under 60 seconds.  It is a
workaround for huge modern high-BER drives and RAID-on-card firmware
that is (according to some person's debatable idea) not well matched
to the drive.  Of course they are going to sell it as this big
valuable enterprise optimisation, but at its root it evolved as a
workaround for someone else's broken (from WD's POV) software.

The Solaris timeout, because of m * n * o multiplicative layered
speculative retry nonsense, is 60 seconds or 180 seconds or many
hours, so Solaris is IMHO quite broken in this regard, but it also
does not benefit from the TLER workaround: long-TLER drives will not
drop out of RAIDsets on ZFS even if they report an error now and then.
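
To put illustrative numbers on that multiplication (these show the
shape of the problem, not measured Solaris values): 70 seconds of
drive-internal recovery per attempt, times 5 mid-layer retries, times
3 reissues from the layer above, is 70 * 5 * 3 = 1050 seconds, call
it 17 minutes, for a single unreadable sector.  Stack one more
retrying layer on top and you are into the many-hours case.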

What's really needed for ZFS, or RAID in general, is (a) for drives
never to spend more than x% of their time attempting recovery, so a
partially-damaged drive doesn't effectively lose ALL its data by
degrading performance to the point where it would take n years to
read out whatever data it can still deliver, and (b) RAID-level
smarts to dispatch reads for redundant data when a drive becomes slow
without reporting failure, and to diagnose drives as failed based on
statistical measurements of their speed.  TLER does not deliver (a),
because reducing error retries to 5 seconds is still a 10^3 slowdown
over a normal ~5ms read instead of a 10^4 one, and thus makes
basically no difference, and the hard drive can never do (b) because
it is inherently a ZFS-level feature.
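
Nobody ships (b) today as far as I know, but here is a toy sketch of
what the read path could do, in Python that actually runs (the
mirror-side objects, their .read() and .id, and all the thresholds
are made up for illustration, not any real ZFS interface):

    import statistics
    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

    class LatencyMonitor:
        """Rolling per-drive read-latency history: a drive whose latest
        read takes far longer than its own median is 'slow', even
        though it keeps returning success."""

        def __init__(self, window=128, slow_factor=20.0, min_samples=16):
            self.window = window
            self.slow_factor = slow_factor
            self.min_samples = min_samples
            self.history = {}               # drive id -> recent latencies (s)

        def record(self, drive_id, seconds):
            buf = self.history.setdefault(drive_id, [])
            buf.append(seconds)
            del buf[:-self.window]          # keep only the newest samples

        def is_slow(self, drive_id):
            buf = self.history.get(drive_id, [])
            if len(buf) < self.min_samples:
                return False                # not enough data to judge
            return buf[-1] > self.slow_factor * statistics.median(buf[:-1])

    def hedged_read(block, sides, monitor, deadline=0.5):
        """Read from sides[0]; hedge to sides[1] immediately if sides[0]
        has been statistically slow lately, or once it misses the
        deadline, and return whichever copy arrives first."""
        with ThreadPoolExecutor(max_workers=2) as pool:
            start = time.monotonic()
            futures = [pool.submit(sides[0].read, block)]
            if monitor.is_slow(sides[0].id):        # don't even wait for it
                futures.append(pool.submit(sides[1].read, block))
            done, _ = wait(futures, timeout=deadline,
                           return_when=FIRST_COMPLETED)
            if not done:                            # deadline blown: hedge now
                futures.append(pool.submit(sides[1].read, block))
                done, _ = wait(futures, return_when=FIRST_COMPLETED)
            if futures[0].done():                   # only score the primary
                monitor.record(sides[0].id, time.monotonic() - start)
            # toy caveat: the pool's context exit still waits for a
            # straggling read; a real implementation would abandon it
            return done.pop().result()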

so my question is, have you actually found cases where ZFS needs TLER
adjustments, or are you just speculating and synthesizing ideas from a
mess of whitepaper marketing blurbs?  

Because a 7-second-per-read drive will fuck your pool just as badly as
a 70-second-per-read drive: you're going to have to find and unplug it
before the pool will work again.
