Mark Butler wrote:
> As I understand it, there is no way to fix a problem with the algorithm of a
> defined checksum without invalidating existing zfs filesystems. Any fix to
> the fletcher2 will have to be given a new name.
>
Other than the name, fletcher, is there actually a problem here?

> Given how incredibly weak the current fletcher2 is, perhaps the first thing
> that should be done is to change the default to fletcher4. The flawed
> fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum
> algorithm, i.e. it appears to only have a 50% chance of catching a single bit
> error or any series of single bit errors in the most significant bit of any
> 64 bit word in a disk block. For those bits, it is equivalent to a *1 bit*
> checksum.
>

The way I see it, the current "fletcher"2 is better than a simple xor at the
same computational cost. Anecdotal evidence suggests that the current
fletcher2 catches a large number of faults. This makes some sense for magnetic
disk drives, which already apply significant single-bit error correction. If
you know of a formal study that shows the distribution of errors in the
current population, I'd be very interested in seeing it.

NB, there are already some RFEs for improving the reporting of checksum
mismatches, but there may be more work we can do there, too.

 -- richard
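For anyone who wants to poke at the MSB question directly, here is a minimal Python sketch. It assumes fletcher2 works the way I read ZFS's fletcher_2_native (two interleaved streams of 64-bit words feeding accumulators a0/a1 and b0/b1, all wrapping mod 2^64) and fletcher4 the way I read fletcher_4_native (32-bit words feeding four 64-bit accumulators), with little-endian byte order. The function names and the test block are hypothetical. Flipping the MSB of a 64-bit word adds 2^63 mod 2^64, so two such flips in the same stream cancel in a0, and b0 then catches them only when the two words are an odd number of positions apart within that stream.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap modulo 2**64, like uint64_t in C


def fletcher2(buf: bytes) -> tuple[int, int, int, int]:
    """Hypothetical model of fletcher2: two interleaved streams of
    64-bit little-endian words summed into 64-bit accumulators."""
    if len(buf) % 16:
        raise ValueError("buffer length must be a multiple of 16 bytes")
    words = struct.unpack(f"<{len(buf) // 8}Q", buf)
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK64      # stream 0: even-indexed words
        a1 = (a1 + words[i + 1]) & MASK64  # stream 1: odd-indexed words
        b0 = (b0 + a0) & MASK64
        b1 = (b1 + a1) & MASK64
    return (a0, a1, b0, b1)


def fletcher4(buf: bytes) -> tuple[int, int, int, int]:
    """Hypothetical model of fletcher4: 32-bit little-endian words
    summed into four 64-bit accumulators."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    words = struct.unpack(f"<{len(buf) // 4}I", buf)
    a = b = c = d = 0
    for w in words:
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def flip_msb64(buf: bytes, word_indices: list[int]) -> bytes:
    """Flip the most significant bit of each listed 64-bit word.
    In little-endian order that bit is the top bit of the word's last byte."""
    out = bytearray(buf)
    for i in word_indices:
        out[i * 8 + 7] ^= 0x80
    return bytes(out)


if __name__ == "__main__":
    block = bytes((i * 37 + 11) & 0xFF for i in range(128))  # arbitrary 128-byte block
    for flips in ([0], [0, 2], [0, 4]):
        bad = flip_msb64(block, flips)
        print(flips,
              "fletcher2 catches it:", fletcher2(bad) != fletcher2(block),
              "fletcher4 catches it:", fletcher4(bad) != fletcher4(block))
```

Under these assumptions a single MSB flip always changes a0 and is caught. Words 0 and 2 sit one position apart in stream 0 and are caught through b0. Words 0 and 4 sit two positions apart, so the pair slips through fletcher2 entirely, while fletcher4 still sees it because its 32-bit inputs never reach the top of the 64-bit accumulators.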