>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
re> not all devices return error codes which indicate
re> unrecoverable reads.

What you mean is, ``devices sometimes return bad data instead of an error code.'' If you really mean there are devices out there which never return error codes and always silently return bad data, please tell us which one and the story of when you encountered it, because I'm incredulous. I've never seen or heard of anything like that. Not even 5.25" floppies do that.

Well...wait, actually I have. I heard some SGI disks had special firmware which could be ordered to behave this way, and some kind of ioctl or mount option to turn it on per-file or per-filesystem. But the drives wouldn't disable error reporting unless ordered to.

Another interesting lesson SGI offers here: they pushed this feature through their entire stack. The point was that, for some video playback, data which arrives after the playback point has passed is just as useless as silently corrupt data, so the disk, driver, and filesystem all need to modify their exception handling to deliver the largest amount of on-time data possible, rather than the traditional goal of eventually returning the largest amount of correct data possible, with clear errors instead of silent corruption. This whole-stack approach is exactly what I thought ``green line'' was promising, and exactly what's kept out of Solaris by the ``go blame the drivers'' mantra.

Maybe I was thinking of this SGI firmware when I suggested the customized firmware netapp loads into the drives in their study could silently return bad data more often than the firmware we're all using: the standard firmware with 512-byte sectors, intended for RAID layers without block checksums.

re> I would love for you produce data to that effect.

Read the netapp paper you cited earlier:

  http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf

On page 234 there's a comparison of the relative prevalence of each kind of error.

Latent sector errors / Unrecoverable reads

  nearline disks experiencing latent read errors per year: 9.5%

Netapp calls the UNC errors, where the drive returns an error instead of data, ``latent sector errors.'' Software RAID systems other than ZFS *do* handle this error, usually better than ZFS in my impression. And AIUI, when it doesn't freeze and reboot, ZFS counts this as a READ error. In addition to reporting it, most consumer drives seem to keep a non-volatile log of the last five of these, which you can read with 'smartctl -a' (always on Linux, or under Solaris only if smartctl works with your particular disk driver).
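If you want to poke at that log yourself, here's a minimal sketch of wrapping smartctl. The device node is a placeholder, and it assumes smartmontools is installed and on PATH (and, under Solaris, that smartctl supports your disk driver):

    #!/usr/bin/env python
    # Minimal sketch: dump a drive's saved error log via smartmontools.
    # Assumptions: smartctl is on PATH; DEVICE is a placeholder you must
    # change for your system.
    import subprocess

    DEVICE = "/dev/sda"  # placeholder device node

    def drive_error_log(device):
        # 'smartctl -l error' prints the drive's stored error log (typically
        # the last five errors on consumer ATA drives); 'smartctl -a' would
        # include it along with the rest of the SMART data.
        result = subprocess.run(["smartctl", "-l", "error", device],
                                capture_output=True, text=True)
        return result.stdout or result.stderr

    if __name__ == "__main__":
        print(drive_error_log(DEVICE))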
Silent corruption

  nearline disks experiencing silent corruption per year: 0.466%

What netapp calls ``silent data corruption'' is bad data silently returned by the drive with no error indication. ZFS counts it as CKSUM, and it seems not to cause ZFS to freeze. I think you have been lumping this in with unrecoverable reads, but using the word ``silent'' makes it clearer, because ``unrecoverable'' makes it sound to me like the drive tried to recover and failed, in which case the drive probably also reported the error, making it a ``latent sector error''.

filesystem corruption

This is also discovered silently w.r.t. the driver: the corruption that happens to ZFS systems when SAN targets disappear suddenly, or when you offline a target and then reboot. It's also counted in the CKSUM column, and ZFS-level redundancy also helps fix it. I would call this ``ZFS bugs'', ``filesystem corruption,'' or ``manual resilvering''. Obviously it's not included in the Netapp table.

It would be nice if ZFS had two separate CKSUM columns to distinguish between what netapp calls ``checksum errors'' and ``identity discrepancies''. For ZFS, a ``checksum error'' would point with high certainty to the storage and silent corruption, while an ``identity discrepancy'' would be more like filesystem corruption and would flag things like one side of a mirror being out-of-date when ZFS thinks it shouldn't be. But currently we have only one CKSUM column for both cases.

So, I would say, yes, the type of read error that other software RAID systems besides ZFS do still handle is a lot more common: 9.5%/yr vs 0.466%/yr for nearline disks, and the same ~20x factor for enterprise disks. The rare silent error which other software LVM's miss and only ZFS/Netapp/EMC/... handles is still common enough to worry about, at least on the nearline disks in the Netapp drive population.

What this also shows, though, is that about 1 in 10 drives will return an UNC per year, and possibly cause ZFS to freeze up. It's worth worrying about availability during an exception as common as that; for some applications it might even matter more than catching the silent corruption. Not for my own application, but for some readily imaginable ones, yes.
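To put the two rates side by side, here's a back-of-envelope sketch using the paper's nearline figures. The drive counts are arbitrary examples, and it assumes drives behave independently, which is a simplification:

    #!/usr/bin/env python
    # Back-of-envelope arithmetic on the Netapp paper's nearline-disk rates.
    # Assumes independent, identically behaving drives -- a simplification.
    P_LATENT = 0.095    # latent sector errors (drive returns an error): 9.5%/yr
    P_SILENT = 0.00466  # silent corruption (drive returns bad data):  0.466%/yr

    def p_at_least_one(per_drive_rate, n_drives):
        # chance that at least one of n drives hits the event within a year
        return 1.0 - (1.0 - per_drive_rate) ** n_drives

    print("per-drive ratio: %.0fx" % (P_LATENT / P_SILENT))
    for n in (1, 8, 48):
        print("%2d drives: latent %.1f%%/yr, silent %.1f%%/yr"
              % (n, 100 * p_at_least_one(P_LATENT, n),
                    100 * p_at_least_one(P_SILENT, n)))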