>>>>> "dc" == Daniel Carosone <d...@geek.com.au> writes:
>>>>> "w" == Willy <willy.m...@gmail.com> writes:
>>>>> "sb" == Simon Breden <sbre...@gmail.com> writes:
First of all, I have so far been assembling vdev stripes from drives of different manufacturers, so that any one manufacturer can ship a bad batch or a firmware bug that kills all their drives at once without my losing the pool. Based on recent drive problems, I think this is a really wise idea.

 w> http://www.csc.liv.ac.uk/~greg/projects/erc/

dead link?

 w> Unfortunately, smartmontools has limited SATA drive support in
 w> opensolaris, and you cannot query or set the values.

Also, the driver stack is kind of a mess, with different mid-layers depending on which SATA low-level driver you use, and many proprietary no-source low-level drivers, neither of which you have to deal with on Linux. Maybe in a decade it will get better, once the oldest driver we have to deal with is AHCI, but yes, smartmontools vs. uscsi still needs fixing!

 w> I have 4 of the HD154UI Samsung Ecogreens, and was able to set
 w> the error reporting time using HDAT2. The settings would
 w> survive a warm reboot, but not a powercycle.

After STFW, this seems to be an MS-DOS binary-only tool. Maybe you can run it in VirtualBox and snoop on its behavior; this worked for me with Wine and a Lite-On RPC tool. On Linux you can even run CD-burning programs from within Wine, it is that good.

 sb> RAID-version drives at 50%-100% price premium, I have decided
 sb> not to use Western Digital drives any longer, and have
 sb> explained why here:
 sb> http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/

IMHO it is just a sucker premium, because the feature is worthless anyway. From the discussion I've read here, the feature is designed to keep drives which are *reporting failures* still considered *GOOD*, so they do not drop out of RAID sets in RAID-on-a-card implementations with RAID-level timeouts under 60 seconds. It is a workaround for huge modern high-BER drives paired with RAID-card firmware that is (according to some person's debatable idea) not well matched to the drive.
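As an aside on the smartmontools point above: on Linux, where this support does exist, smartctl can query and set the drive's error-recovery timers through the SCT ERC interface. A sketch, with /dev/sdX as a placeholder for a real device; whether a given drive supports SCT ERC, and whether the setting persists, varies by model:

```shell
# /dev/sdX is a placeholder; run as root. Values are in units of 100 ms,
# so 70 means 7.0 seconds. Like the HDAT2 setting described above, on many
# drives this does not survive a power cycle.
smartctl -l scterc /dev/sdX        # query current read/write recovery limits
smartctl -l scterc,70,70 /dev/sdX  # cap read and write recovery at 7 s each
smartctl -l scterc,0,0 /dev/sdX    # disable the cap (drive retries at length)
```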
Of course they are going to sell it as this big, valuable enterprise optimisation, but at its root it evolved as a workaround for someone else's broken (from WD's point of view) software.

The Solaris timeout, because of m * n * o multiplicative layered speculative retry nonsense, is 60 seconds, or 180 seconds, or many hours, so Solaris is IMHO quite broken in this regard, but it also does not benefit from the TLER workaround: long-TLER drives will not drop out of RAID sets on ZFS even if they report an error now and then.

What's really needed for ZFS, or RAID in general, is (a) for drives to never spend more than x% of their time attempting recovery, so they don't effectively lose ALL the data on a partially damaged drive by degrading performance to the point where it would take n years to read out what data they are still able to deliver, and (b) RAID-level smarts to dispatch reads for redundant data when a drive becomes slow without reporting failure, and to diagnose drives as failed based on statistical measurements of their speed. TLER does not deliver (a), because reducing error retries to 5 seconds is still a 10^3 slowdown instead of a 10^4 one, and thus basically no difference, and the hard drive can never do (b) on its own: it is a ZFS-level feature.

So my question is: have you actually found cases where ZFS needs TLER adjustments, or are you just speculating and synthesizing ideas from a mess of whitepaper marketing blurbs? Because a 7-second-per-read drive will fuck your pool just as badly as a 70-second-per-read drive: you're going to have to find and unplug it before the pool will work again.
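The 10^3-vs-10^4 point above is just arithmetic. A sketch with assumed (not measured) numbers, taking ~7 ms as a healthy read latency:

```shell
# Assumed latencies for illustration only:
normal_ms=7        # a healthy read
tler_ms=7000       # ~7 s error-recovery cap (TLER-style)
default_ms=70000   # ~70 s of uncapped retries
echo "capped:   $((tler_ms / normal_ms))x slower"    # prints 1000x slower
echo "uncapped: $((default_ms / normal_ms))x slower" # prints 10000x slower
```

Either way the array stalls for the duration of every failed read; the cap only changes how long each stall lasts, which is why (a) and (b) above matter more than TLER.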
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss