On 01/06/11 07:44 PM, Peter Taps wrote:
> Folks,
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem, such as memory corruption. Essentially, adding verification
> to Sha256 is overkill.
> Perhaps (Sha256+NoVerification) would work 99.999999% of the time, but
> (Fletcher+Verification) would work 100% of the time.
> Which of the two is the better deduplication strategy?
> If we do not use verification with Sha256, what is the worst-case scenario?
> Is it just more disk space occupied (because of a failure to detect duplicate
> blocks), or is there a chance of actual data corruption (because two blocks
> were assumed to be duplicates although they are not)?
Yes, there is a possibility of data corruption.
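As a toy illustration of that failure mode (this is not ZFS code), here is a dedup table keyed only by a deliberately weakened 16-bit checksum, so that a collision is easy to produce. Without verification, two different blocks that share a checksum get silently aliased, and a later read returns the wrong data:

```python
import hashlib

def weak_checksum(block: bytes) -> int:
    # Truncate SHA-256 to 16 bits so collisions are easy to produce;
    # the same failure mode exists for any hash, just vastly less often.
    return int.from_bytes(hashlib.sha256(block).digest()[:2], "big")

store = {}  # checksum -> stored block; without verify, first writer wins

def write_dedup(block: bytes) -> int:
    key = weak_checksum(block)
    store.setdefault(key, block)  # a colliding block is "deduped" away
    return key

# Search for two different blocks that share the weak checksum.
seen, pair, i = {}, None, 0
while pair is None:
    blk = i.to_bytes(8, "big")
    key = weak_checksum(blk)
    if key in seen:
        pair = (seen[key], blk)
    else:
        seen[key] = blk
    i += 1

first, second = pair
k = write_dedup(first)
assert write_dedup(second) == k   # same checksum, so treated as a "duplicate"
print(store[k] == second)         # False: reading second back returns first's data
```

With a real 256-bit digest the search above would take astronomically long, which is the whole argument for trusting Sha256; the mechanism of the corruption, however, is exactly this.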
> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?
It really depends on your specific workload.
If your application is mostly reading data, it may well be that you
won't even notice verify.
Sha256 is supposed to be almost bulletproof, but...
At the end of the day it is all about how much you value your data.
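For a sense of scale, here is a back-of-the-envelope birthday-bound estimate of an accidental Sha256 collision in a dedup table. The pool size and block size below are illustrative assumptions, not figures from this thread:

```python
def collision_probability(n_blocks: int, hash_bits: int = 256) -> float:
    # P(at least one collision) ~ n*(n-1) / 2^(bits+1) for n far below 2^(bits/2)
    return n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)

# Assume 1 PiB of unique data stored in 4 KiB blocks: 2^38 blocks.
n_blocks = 2 ** 50 // 2 ** 12
p = collision_probability(n_blocks)
print(p)  # on the order of 1e-55, far below any hardware error rate
```

That number is why the "bigger problem such as memory corruption" argument is usually made: an undetected RAM or bus error is enormously more likely than a random Sha256 collision. Verify protects against the non-random cases (bugs, corruption of the dedup table itself) that this arithmetic does not model.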
But as I wrote before, try it with verify and see if the performance is
acceptable. It may well be.
You can always disable verify at any time.
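A minimal sketch of what the verify step buys (illustrative, not ZFS's actual implementation): on a checksum hit, the candidate block is compared byte-for-byte with the stored copy before being counted as a duplicate, so even a colliding-but-different block is kept safely on its own:

```python
import hashlib

class DedupStore:
    def __init__(self, verify: bool = True):
        self.verify = verify
        self.table = {}  # digest -> list of distinct blocks sharing that digest

    def write(self, block: bytes) -> tuple[bytes, int]:
        """Return (digest, slot); an already-existing slot means the block was deduped."""
        digest = hashlib.sha256(block).digest()
        bucket = self.table.setdefault(digest, [])
        if self.verify:
            for slot, stored in enumerate(bucket):
                if stored == block:       # duplicate confirmed byte-for-byte
                    return digest, slot
        elif bucket:
            return digest, 0              # trust the checksum alone
        bucket.append(block)              # new block (or a collision kept separate)
        return digest, len(bucket) - 1

store = DedupStore(verify=True)
d1 = store.write(b"hello world")
d2 = store.write(b"hello world")
print(d1 == d2)  # True: a genuine duplicate is still deduplicated
```

The overhead of verify is exactly that comparison (and possibly the read needed to fetch the stored block) on every checksum hit, which is why the advice above is to measure it on your own workload.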
> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification), and both are 100% accurate in
> detecting duplicate blocks.
I don't believe fletcher is still allowed for dedup - right now
sha256 is the only option.
--
Robert Milkowski
http://milek.blogspot.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss