[zfs-code] fletcher2/4 implementations fundamentally flawed

Richard Elling Fri, 27 Mar 2009 21:02:13 -0700

Mark Butler wrote:
> As I understand it, there is no way to fix a problem with the algorithm of a 
> defined checksum without invalidating existing zfs filesystems.  Any fix to
> fletcher2 will have to be given a new name.
>


Other than the name, fletcher, is there actually a problem here?

> Given how incredibly weak the current fletcher2 is, perhaps the first thing 
> that should be done is to change the default to fletcher4.  The flawed 
> fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum 
> algorithm, i.e. it appears to only have a 50% chance of catching a single bit 
> error or any series of single bit errors in the most significant bit of any 
> 64 bit word in a disk block.  For those bits, it is equivalent to a *1 bit* 
> checksum.
>   
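
To make the quoted concern concrete, here is a minimal Python sketch. It assumes fletcher2 keeps two interleaved lanes of 64-bit words, each with a running sum and a sum of sums, all modulo 2**64. That loop structure, the helper names, and the test block are assumptions for illustration and are not taken from this thread. Under that assumption, a flipped top bit can only move the top bit of an accumulator, and some pairs of top-bit flips cancel completely.

```python
MASK = (1 << 64) - 1   # emulate 64-bit wraparound arithmetic
MSB = 1 << 63          # most significant bit of a 64-bit word


def fletcher2(words):
    """Assumed fletcher2 structure: two lanes, each with a sum (a)
    and a sum of sums (b), all modulo 2**64."""
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK
        a1 = (a1 + words[i + 1]) & MASK
        b0 = (b0 + a0) & MASK
        b1 = (b1 + a1) & MASK
    return (a0, a1, b0, b1)


def flip_msb(words, *indexes):
    """Return a copy of words with the top bit flipped at each index."""
    out = list(words)
    for i in indexes:
        out[i] ^= MSB
    return out


# Arbitrary 16-word block (hypothetical data, 8 loop iterations).
block = [(i * 0x9E3779B97F4A7C15) & MASK for i in range(16)]
good = fletcher2(block)

# One flipped top bit: only the top bit of a0 can change.
one = fletcher2(flip_msb(block, 0))

# Two flipped top bits in the same lane, two iterations apart:
# both a0 and b0 change by a multiple of 2**64, i.e. not at all.
two = fletcher2(flip_msb(block, 0, 4))

print("single flip detected:", one != good)
print("double flip detected:", two != good)
```

In this sketch the single flip is caught only through the top bit of a0, and the double flip is not caught at all, because each flip adds 2**63 to the lane sum and the two contributions to the sum of sums add up to an even multiple of 2**63.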

The way I see it, the current "fletcher"2 is better than a simple xor, but
has the same computational cost.  Anecdotal evidence suggests that the
current fletcher2 catches a large number of faults.  This makes some sense
for magnetic disk drives, which already have significant single bit
correction.  If you know of a formal study which shows the distribution of
errors in the current population, I'd be very interested in seeing it.
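
One way to illustrate the "better than a simple xor" point is word reordering. The sketch below is illustrative only: it assumes the same two-lane, sum plus sum-of-sums loop for fletcher2, and the helper names and test data are hypothetical. A plain xor of all words does not depend on word order, while the sum-of-sums accumulators do.

```python
from functools import reduce
from operator import xor

MASK = (1 << 64) - 1   # emulate 64-bit wraparound arithmetic


def xor_checksum(words):
    """Simple xor of all 64-bit words; insensitive to word order."""
    return reduce(xor, words, 0)


def fletcher2(words):
    """Assumed fletcher2 structure: two lanes, each with a sum (a)
    and a sum of sums (b), all modulo 2**64."""
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK
        a1 = (a1 + words[i + 1]) & MASK
        b0 = (b0 + a0) & MASK
        b1 = (b1 + a1) & MASK
    return (a0, a1, b0, b1)


# Arbitrary 16-word block (hypothetical data).
block = [(i * 0x9E3779B97F4A7C15 + 1) & MASK for i in range(16)]

# Swap two different words that fall in the same lane.
swapped = list(block)
swapped[0], swapped[2] = swapped[2], swapped[0]

print("xor detects swap:      ", xor_checksum(swapped) != xor_checksum(block))
print("fletcher2 detects swap:", fletcher2(swapped) != fletcher2(block))
```

The lane sum a0 is unchanged by the swap, so the detection here comes entirely from b0, which weights each word by its position in the block.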

NB, there are already some RFEs for improving the reporting of checksum
mismatches, but there may be more work we can do there, too.
 -- richard
