Nice analysis. I understand that on the processor/memory-subsystem 
architecture you've experimented with, you've found that if the data being 
summed isn't already cached, accessing it will likely dominate the cost of 
using it in the calculation:

- Although this makes sense (presuming no mechanism is used to pre-cache the 
data so that it is only checksummed once it is cache resident), 
processor/memory-system architectures are still evolving, so it may be 
prudent to account for that possibility; for example, future streaming 
vector/memory units might sum multiple data streams in parallel, and/or 
checksumming might be integrated into the I/O channel so that a block's 
checksum is computed while the data is being retrieved, without processor 
intervention.

- With respect to fletcher4: the first two running sums (which by themselves 
constitute a traditional Fletcher checksum) are already sufficient to yield a 
Hamming distance of at least 3 (meaning all 1- and 2-bit errors are 
detectable, and 1-bit errors are therefore also potentially correctable if 
ever desired) for block sizes somewhat larger than 256KB, which is larger 
than ZFS requires. I can't help but wonder whether that may be sufficient, 
rather than worrying about maintaining two more running sums, each dependent 
on the previous, without potential overflow. Keeping 4 sums, with each of the 
latter 3 dependent on its predecessor's sum, also leaves 3 serial 
dependencies to schedule per word instead of just 1 when maintaining only 2 
sums, which may impede performance on future platforms with more efficient 
data access (see the sketch below).
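
For concreteness, here is roughly what the two inner loops look like. This is 
only a sketch (loosely modeled on the fletcher_4_native-style loop of 64-bit 
sums over 32-bit words, with function names of my own invention), not the 
actual ZFS code:

    #include <stdint.h>
    #include <stddef.h>

    /*
     * fletcher4-style loop: four 64-bit sums over 32-bit words.  Each of
     * b, c, d depends on the sum updated just before it, so there are
     * three serial additions per word to schedule.
     */
    void
    fletcher_4_sketch(const void *buf, size_t size, uint64_t s[4])
    {
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));
        uint64_t a = 0, b = 0, c = 0, d = 0;

        for (; ip < end; ip++) {
            a += *ip;   /* independent of the other sums */
            b += a;     /* depends on a */
            c += b;     /* depends on b */
            d += c;     /* depends on c */
        }
        s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    }

    /*
     * "Corrected" fletcher2: the two traditional running sums, but over
     * 32-bit words with 64-bit accumulators, leaving only one serial
     * dependency (b on a) per word.  For n words, b is bounded by
     * (2^32 - 1) * n(n+1)/2, which stays below 2^64 for n up to roughly
     * 92,000 words (~360KB), comfortably above ZFS block sizes.
     */
    void
    fletcher_2_fixed_sketch(const void *buf, size_t size, uint64_t s[2])
    {
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));
        uint64_t a = 0, b = 0;

        for (; ip < end; ip++) {
            a += *ip;
            b += a;
        }
        s[0] = a; s[1] = b;
    }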

(As a caveat, it's not clear to me how much more resilient the checksum 
becomes by maintaining 2 more sums than traditionally used. So when I 
previously observed that fletcher4 was fine, that was based on its first two 
terms not overflowing for the data sizes required, without regard to the 
possibility that the upper two terms may overflow; I didn't consider them 
significant, or rather didn't rely on their significance.)

- In conclusion: without pre-fetching, streaming, and/or channel integration, 
fletcher2 (corrected to use 32-bit data with 64-bit sums) may be no faster 
than the current implementation of fletcher4 (which in turn may be not 
significantly more resilient than a corrected fletcher2, though it could be 
refined to warrant that the upper two terms also do not overflow, improving 
its resilience to some degree). Even so, I personally suspect it's best to 
BOTH refine fletcher4 to warrant that the upper two sums do not overflow, by 
wrapping the upper bits of the upper two sums back in every N (pick your N) 
iterations, AND fix fletcher2, because once fixed it has the potential to be 
significantly faster than the fixed fletcher4 on future or other platforms 
that leverage mechanisms to pre-fetch data and/or overlap computation with 
access. (Both are relatively easy fixes, so why not? One reading of the 
wrapping fix is sketched below.)
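
One possible reading of "wrapping the upper bits of the upper two sums" is to 
fold the bits above bit 31 of c and d back into their low 32 bits (an 
end-around-carry style reduction). The sketch below folds on every word for 
simplicity; a larger N could be used as long as neither upper sum can pass 
2^64 between folds. This is only an illustration of the idea (it would define 
a new, incompatible checksum variant), not actual or proposed ZFS code:

    #include <stdint.h>
    #include <stddef.h>

    /* Fold the bits above bit 31 back into the low 32 bits. */
    static inline uint64_t
    fold32(uint64_t x)
    {
        return ((x >> 32) + (x & 0xffffffffULL));
    }

    /*
     * fletcher4 with the upper two sums wrapped so they cannot overflow.
     * As long as b stays below about 2^63 (true for blocks up to ~256KB of
     * 32-bit words), c and d cannot pass 2^64 between folds.
     */
    void
    fletcher_4_wrapped_sketch(const void *buf, size_t size, uint64_t s[4])
    {
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));
        uint64_t a = 0, b = 0, c = 0, d = 0;

        for (; ip < end; ip++) {
            a += *ip;
            b += a;
            c += b;
            d += c;
            c = fold32(c);  /* keep the upper two sums bounded */
            d = fold32(d);
        }
        s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    }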

Again, merely IMHO.