On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote:
> > This thread is about checksums - namely, what are our
> > options when they mismatch the data? As has been
> > reported by many blog posts exploring ZDB, there are
> > cases where checksums are broken (e.g. bitrot in the
> > block pointers, or rather corruption in RAM while the
> > checksum was calculated - so every ditto copy of the BP
> > carries the error), but the file data is in fact intact
> > (extracted from disk with ZDB or DD and compared to
> > other copies).
> 
> Metadata is at least doubly redundant and checksummed.

The implication is that the original checksum calculation went bad in
RAM (undetected due to lack of ECC), and the bad value was then written
out redundantly and fed as input to the rest of the Merkle construct.
The data blocks on disk are correct, but they fail to verify against
the bad metadata.
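
To make that concrete, here is a minimal Python sketch (the names and
the use of SHA-256 are purely illustrative, not ZFS internals): the
block pointer's checksum is computed from a bit-flipped in-RAM buffer,
so the intact on-disk data can never verify against any of the
redundant pointer copies.

  import hashlib

  def checksum(buf: bytes) -> bytes:
      # stand-in for whatever checksum algorithm the dataset uses
      return hashlib.sha256(buf).digest()

  # The data block as it actually sits on disk -- intact.
  data_on_disk = b"file contents, written correctly"

  # What the in-memory copy looked like when the checksum was computed:
  # one bit flipped by (hypothetically) bad, non-ECC RAM.
  corrupted_in_ram = bytearray(data_on_disk)
  corrupted_in_ram[0] ^= 0x01

  # The bad checksum goes into the block pointer, and every ditto copy
  # of that pointer carries the same bad value.
  bp_checksum = checksum(bytes(corrupted_in_ram))
  ditto_copies = [bp_checksum, bp_checksum]

  # On read, the (correct) on-disk data is checksummed and compared to
  # the (bad) stored value -- it can never match, no matter how many
  # redundant copies of the pointer exist.
  for stored in ditto_copies:
      assert checksum(data_on_disk) != stored  # every copy reports corruption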

The complaint appears to be that ZFS makes this 'worse' because the
(independently verified) valid data blocks are inaccessible. 

Worse than what? Corrupted file data that is then accurately
checksummed and readable as valid? Accurate data that is read without
any assertion of validity, in a traditional filesystem? There's
an inherent value judgement here that will vary by judge, but in each
case it's as much a judgement on the value of ECC and reliable
hardware, and on the value of your data and the time spent on various
kinds of recovery, as it is on the value of ZFS.

The same circumstance could, in principle, happen due to a bad CPU even
with ECC.  In either case, part of the value of ZFS is that an error
has been detected that you would otherwise have been unaware of, and
you get a clue that you need to fix hardware and spend time on recovery.

--
Dan.
