Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL

Miles Nordin Mon, 25 Jan 2010 14:37:36 -0800

>>>>> "sb" == Simon Breden <sbre...@gmail.com> writes:


    sb> 1. In simple non-RAID single drive 'desktop' PC scenarios
    sb> where you have one drive, if your drive is experiencing
    sb> read/write errors, as this is the only drive you have, and
    sb> therefore you have no alternative redundant source of data to
    sb> help with required reconstruction/recovery, you REALLY NEED
    sb> your drive to try as much as possible to try to recover

This sounds convincing to fetishists of an ordered world where
egg-laying mammals do not exist, but it's utter rubbish.

As drives go bad they return errors frequently, and they don't succeed
in recovering from them.  They do not encounter, say, one or two errors
per day under general use, most of which are recoverable after 7 < x < 60
seconds of retrying: this just does not happen except in your dreams.
Good drives have zero UNC errors in the 'smartctl -a' logs, and the
conditional probability of soon-failure on a drive that's experienced
just one UNC error is much higher than the regular probability of
soon-failure.
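
A minimal sketch of that check, for anyone who wants to script it.  It
assumes the error log in the 'smartctl -a' output marks uncorrectable
reads with the token "UNC" (as in "Error: UNC at LBA = ..."); the
function name count_unc and the device path are placeholders, not
anything smartctl provides.

```shell
# Count lines mentioning UNC (uncorrectable read) in smartctl output
# supplied on stdin.  Assumption: the error log uses the token "UNC".
# count_unc is a hypothetical helper name.
count_unc() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -cw 'UNC' || true
}

# Usage against a real drive (/dev/sdX is a placeholder):
#   smartctl -a /dev/sdX | count_unc
```

Anything above zero is the warning sign described above.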

Once a drive for which you have no backup/mirror/whatever is returning
errors, the remedy is not to wait longer.  This does not work,
basically ever.  The remedy is to shut down the OS, copy the failing
drive onto a good one with 'dd conv=noerror,sync', fsck the copy, and
read back your data (with a bunch of zeroes inserted for unreadable
blocks).

Depending on how bad the drive is, you'll have to use a smaller or
larger block size.  The reason is, most unreadable areas are larger
than 1 sector, but the drive is so imbecilic that if you read single
sectors it will reinvoke its bogus retry timer for each and every
sector within the same contiguous unreadable region: it has NO MEMORY
for the fact that it already tried to read that area and failed.  60
seconds * <normal # of bad sectors> for a failing/pissed-off drive is
generally somewhere between 3 days and forever, so you have to watch
progress and start over with a larger bs= if you are not on target to
finish the dd within three days.  The drive will keep getting worse, so
a larger bs= (meaning, not bothering to try to read some data that you
would have been able to read) gets your data off the drive before it
fails more completely, and thus actually rescues *more*.
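
The arithmetic and the copy can be sketched roughly as follows.  The
helper names retry_days and rescue_copy are made up for illustration,
the 60-second timeout is the figure used above, and the 65536-byte
default block size is only a starting guess.  With conv=noerror,sync dd
keeps going after a read error and pads the failed block with zeroes up
to bs, so offsets in the copy stay aligned; the cost is that one bad
sector throws away the whole bs-sized block around it.

```shell
# Worst-case days spent in retries if every unreadable sector is read
# on its own and costs the full timeout.
# Args: <bad sector count> [timeout in seconds, default 60]
retry_days() {
    LC_ALL=C awk -v n="$1" -v t="${2:-60}" \
        'BEGIN { printf "%.1f\n", n * t / 86400 }'
}

# Copy a failing source onto a good destination, continuing past read
# errors; unreadable blocks come out as zeroes in the copy.
# Args: <source> <destination> [block size in bytes, default 65536]
rescue_copy() {
    dd if="$1" of="$2" bs="${3:-65536}" conv=noerror,sync
}

# Usage (device paths are placeholders; double-check which is which,
# because dd overwrites the destination):
#   retry_days 4320                      # 4320 sectors * 60 s = 3.0 days
#   rescue_copy /dev/sdX /dev/sdY 65536
```

If the copy is not on track to finish within about three days, stop it
and start over with a larger third argument.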

Anyway, once these drives have gone bad their behavior is very stupid,
and nothing like the imaginary world that's been pitched to you by
these bogan electrical engineers who apparently have no experience
using their own product.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
