>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:
    re> By your logic, SECDED ECC for memory is broken because it only
    re> corrects

ECC is not a checksum.

Go ahead, get out your dictionary, enter severe-pedantry-mode. but it is relevantly different. In for example data transmission scenarios, FEC's like ECC are often used along with a strong noncorrecting checksum over a larger block.

The OP further described scenarios plausible for storage, like ``long string of zeroes with 1 bit flipped'', that produce collisions with the misimplemented fletcher2 (but, obviously, not with any strong checksum like correct-fletcher2).

    re> is fletcher2 "good enough" for storage?

yes, it probably is good enough, but ZFS implements some other broken algorithm and calls it fletcher2. so, please stop saying fletcher2.

    re> I'll blame the lawyers. They are causing me to remove certain
    re> words from my vocabulary :-(

yeah, well, allow me to add a word back to the vocabulary: BROKEN.

If you are not legally allowed to use words like broken and working, then find another identity from which to talk, please.

    re> Question for the zfs-discuss participants, have you seen a
    re> data corruption that was not detected when using fletcher2?

This is ridiculous. It's not fletcher2, it's brokenfletcher2. It's avoidably extremely weak. It's reasonable to want to use a real checksum, and this PR game you are playing is frustrating and confidence-harming for people who want that.

This does not have to become a big deal, unless you try to spin it with a 7200rpm PR machine like IBM did with their broken Deathstar drives before they became HGST.

Please, what we need to do is admit that the checksum is relevantly broken in a way that compromises the integrity guarantees with which ZFS was sold to many customers, fix the checksum, and learn how to conveniently migrate our data.

Based on the table you posted, I guess file data can be set to fletcher4 or sha256 using filesystem properties to work around the bug on Solaris versions with the broken implementation.
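
As a rough illustration of how a weak checksum can miss damage that a strong one catches, here is a minimal Python sketch. It rests on an assumption that is not stated anywhere in this thread: that the implementation in question sums 64-bit words into wrapping 64-bit accumulators in two interleaved lanes. The function name fletcher2_zfs_style and the helper flip_top_bit are made up for this example, and the two-bit damage pattern is just one case that is easy to verify by hand under that assumption. It does not attempt to reproduce the OP's exact scenario.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap at 64 bits


def fletcher2_zfs_style(words):
    """Hypothetical sketch: two interleaved lanes of 64-bit words.

    ASSUMPTION: words and accumulators are both 64 bits wide, so there
    is no room for carries out of the top bit to be preserved.
    """
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK64
        a1 = (a1 + words[i + 1]) & MASK64
        b0 = (b0 + a0) & MASK64
        b1 = (b1 + a1) & MASK64
    return (a0, a1, b0, b1)


def flip_top_bit(words, index):
    """Return a copy of words with the most significant bit of one word flipped."""
    out = list(words)
    out[index] ^= 1 << 63
    return out


def sha256_of(words):
    """Strong checksum over the same data, for comparison."""
    return hashlib.sha256(struct.pack("<%dQ" % len(words), *words)).hexdigest()


block = [0] * 16                                    # a run of zeroes
damaged = flip_top_bit(flip_top_bit(block, 0), 4)   # two bits flipped, same lane

print("data differs:           ", block != damaged)
print("weak checksums collide: ", fletcher2_zfs_style(block) == fletcher2_zfs_style(damaged))
print("sha256 digests collide: ", sha256_of(block) == sha256_of(damaged))
```

In this sketch the two flipped top bits each add 2^63 to the running sum, which wraps to zero, and their contributions to the second accumulator also sum to a multiple of 2^64, so the damaged block gets the same checksum as the all-zero block while sha256 tells them apart.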

 1. What's needed to avoid fletcher2 on the ZIL on broken Solaris versions?

 2. I understand the workaround, but not the fix.

    How does the fix included in S10u8 and snv_114 work? Is there a ZFS version bump? Does the fix work by implementing fletcher2 correctly? or does it just disable fletcher2 and force everything to use brokenfletcher4 which is good enough? If the former, how are the broken and correct versions of fletcher2 distinguished---do they show up with different names in the pool properties?

    Once you have the fixed software, how do you make sure fixed checksums are actually covering data blocks originally written by old broken software? I assume you have to use rsync or zfs send/recv to rewrite all the data with the new checksum? If yes, what do you have to do before rewriting---upgrade solaris and then 'zfs upgrade' each filesystem one by one? Will zfs send/recv work across the filesystem versions, or does the copying have to be done with rsync?

 3. speaking of which, what about the checksum in zfs send streams? is it also fletcher2, and if so was it also fixed in s10u8/snv_114, and how does this affect compatibility for people who have ignored my advice and stored streams instead of zpools? Will a newer 'zfs recv' always work with an older 'zfs send' but not the other way around?

there is basically no information about implementing the fix in the bug, and we can't write to the bug from outside Sun. Whatever sysadmins need to do to get their data under the strength of checksum they thought it was under, it might be nice to describe it in the bug for whoever gets referred to the bug and has an affected version.
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss