On Jan 12, 2012, at 2:34 PM, Jim Klimov wrote:

> I guess I have another practical rationale for a second
> checksum, be it ECC or not: my scrubbing pool found some
> "unrecoverable errors". Luckily, for those files I still
> have external originals, so I rsynced them over. Still,
> there is one file whose broken prehistory is referenced
> in snapshots, and properly fixing that would probably
> require me to resend the whole stack of snapshots.
> That's uncool, but a subject for another thread.
>
> This thread is about checksums - namely: what are
> our options when they mismatch the data? As has been
> reported by many blog posts researching ZDB, there are
> cases where the checksums are broken (i.e. bitrot in
> the block pointers, or rather in RAM while the checksum was
> calculated - so each ditto copy of the BP has the error),
> but the file data is in fact intact (extracted from
> disk with ZDB or DD, and compared to other copies).
Metadata is at least doubly redundant and checksummed. Can you
provide links to posts that describe this failure mode?

> For these cases bloggers asked (in vain) - why is it
> not allowed for an admin to confirm the validity of end-user
> data and have the system reconstruct (re-checksum) the
> metadata for it?.. IMHO, that's a valid RFE.

Metadata is COW, too. Rewriting the data also rewrites the metadata.

> While the system is scrubbing, I was reading up on theory.
> Found a nice text, "Keeping Bits Safe: How Hard Can It Be?"
> by David Rosenthal [1], where I stumbled upon an interesting
> thought:
>
>   The bits forming the digest are no different from the
>   bits forming the data; neither is magically incorruptible.
>   ...Applications need to know whether the digest has
>   been changed.

Hence for ZFS, the checksum (digest) is kept in the parent metadata.
The condition described above can affect T10 DIF-style checksums,
but not ZFS.

> In our case, where the original checksum in the block pointer
> could be corrupted in the (non-ECC) RAM of my home NAS just
> before it was dittoed to disk, another checksum - a copy
> of this same one, or a differently calculated one - could
> provide ZFS with the means to determine whether the data
> or one of the checksums got corrupted (or all of them).
> Of course, this is not an absolute protection method,
> but it can reduce the cases where pools have to be
> "destroyed, recreated and recovered from tape".

Nope.

> It is my belief that using dedup contributed to my issue -
> there is a lot more updating of the block pointers and their
> checksums, so it gradually becomes more likely that the
> metadata (checksum) blocks get broken (i.e. in non-ECC
> RAM), while the written-once userdata remains intact...
>
> --
> [1] http://queue.acm.org/detail.cfm?id=1866298
> While the text discusses what all ZFSers mostly know
> already - about bit-rot, MTTDL and such - it does so in
> great detail with many examples, and gave me a better
> understanding of it all even though I have dealt with this
> for several years now. A good read, I suggest it to others ;)
>
> //Jim Klimov

 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
SCALE 10x, Los Angeles, Jan 20-22, 2012
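To make Richard's point concrete, here is a minimal Python sketch of a
Merkle-tree-style layout in which each block's checksum lives in its
parent rather than alongside the block itself. It is only an
illustration of the idea, not ZFS's actual data structures; the class
and function names are invented for the example. Flipping a bit in a
stored digest is detected exactly the way flipped data would be: the
block that contains the bad digest no longer matches the checksum
recorded one level above it, and only the self-validated root checksum
has to be trusted.

# Minimal sketch (not ZFS's actual on-disk layout; names are illustrative)
# of the arrangement Richard describes: a block's checksum is stored in
# the *parent* block pointer, and the parent block is in turn covered by
# the checksum in *its* parent, up to a self-validated root.

import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

class BlockPointer:
    """Points at a child block and records the checksum it should have."""
    def __init__(self, child):
        self.child = child
        self.checksum = sha256(child.payload())

class Block:
    """Holds user data and/or pointers to child blocks (metadata)."""
    def __init__(self, data=b"", children=None):
        self.data = data
        self.pointers = [BlockPointer(c) for c in (children or [])]

    def payload(self):
        # What gets checksummed: the data plus the embedded child checksums.
        return self.data + b"".join(p.checksum for p in self.pointers)

def verify(block, expected, path="root"):
    """Walk top-down; every block is validated against the checksum kept
    one level above it, so a corrupted digest and corrupted data are
    detected the same way: a mismatch reported at the parent."""
    if sha256(block.payload()) != expected:
        print("checksum mismatch at", path)
        return False
    return all(verify(p.child, p.checksum, path + "/" + str(i))
               for i, p in enumerate(block.pointers))

# Tiny tree: root -> metadata block -> data block
data = Block(data=b"user data")
meta = Block(children=[data])
root = Block(children=[meta])
root_checksum = sha256(root.payload())   # kept self-validated at the top

assert verify(root, root_checksum)

# Flip a bit in the stored *digest* rather than in the data:
bad = bytes([meta.pointers[0].checksum[0] ^ 1]) + meta.pointers[0].checksum[1:]
meta.pointers[0].checksum = bad

# The damage is caught one level up: the metadata block holding the bad
# checksum no longer matches the checksum recorded for it in the root.
assert not verify(root, root_checksum)

In real ZFS the analogue of that root is the uberblock in the vdev
labels, which carries its own embedded checksum, so a corrupted digest
is simply treated as corrupted metadata rather than silently trusted.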