On Sun, May 03, 2009 at 12:53:31PM -0700, paul wrote:
> Very nice; and ultimately will be very interesting to see what
> percentage of checksum errors within a particular deployment turn out
> to most likely be correctable single bit errors. (And thereby possibly
> even measurably help improve the integrity of non-redundant array
> configurations, short of the catastrophic failure of a sector or drive
> itself.)

The other thing I'm working on is getting better FMA ereports for
checksum errors; one thing that's currently missing in the case of a
mirrored or raid-z configuration is the information on the difference
between the correct content and the bad content. That way, we'll have a
better idea of what's actually happening, and the FMA responses may
also get better.

> After reviewing the code (and presuming you intended "if (base->a
> less-than bad->a)"), I can't quite seem to convince myself the
> implementation is immune from misdiagnosing a double/triple bit
> error as a single bit error in general (although likely staring me
> in the face; as all correct, single, and double bit error checksums
> are warranted to be unique; as should also be all 4 and 5 bit
> error checksums for a corrected fletcher4 implementation to my
> understanding)?

Let me work on the math some and get back to you.

Cheers,
- jonathan
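
For readers following the single-bit question above, here is a minimal
sketch of the idea. It is not the code under review. It assumes a
fletcher4-style checksum (four running sums over little-endian 32-bit
words, each kept modulo 2**64) and a brute-force search that flips each
bit in turn and recomputes the checksum. The names fletcher4 and
single_bit_candidates are hypothetical, chosen only for illustration.

```python
import struct

MASK64 = (1 << 64) - 1  # each accumulator is kept modulo 2**64


def fletcher4(data: bytes) -> tuple:
    """Fletcher4-style checksum: four cascaded sums over 32-bit LE words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def single_bit_candidates(bad: bytes, expected: tuple) -> list:
    """Return every bit index whose flip makes `bad` match `expected`.

    Bit index = byte offset * 8 + bit position within the byte (LSB = 0).
    Brute force: O(bits * length), so only suitable for small buffers.
    """
    buf = bytearray(bad)
    hits = []
    for bit in range(len(buf) * 8):
        buf[bit // 8] ^= 1 << (bit % 8)   # flip one bit
        if fletcher4(bytes(buf)) == expected:
            hits.append(bit)
        buf[bit // 8] ^= 1 << (bit % 8)   # restore it
    return hits
```

Calling single_bit_candidates(bad, fletcher4(good)) on a buffer with
exactly one flipped bit returns that bit's index. An empty list means
no single flip explains the mismatch. The misdiagnosis concern in the
quoted text corresponds to a non-empty result for a buffer that in fact
has more than one bad bit.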