On 01/06/11 07:44 PM, Peter Taps wrote:
> Folks,
>
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 ever fails, we have a bigger
> problem, such as memory corruption. Essentially, adding verification to
> Sha256 is overkill.
>
> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But
> (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?
>
> If we do not use verification with Sha256, what is the worst-case scenario?
> Is it just more disk space occupied (because of failure to detect duplicate
> blocks), or is there a chance of actual data corruption (because two blocks
> were assumed to be duplicates although they are not)?

Yes, there is a possibility of data corruption: without verify, two different blocks that happened to produce the same sha256 digest would silently be treated as duplicates, and reads of one of them would return the other block's data.
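To put a rough number on "possibility", here is the standard birthday approximation for a 256-bit digest (the block count below is only an illustrative assumption, not anything measured):

    % Birthday bound for n distinct blocks and a b-bit digest (b = 256 for sha256):
    P(\mathrm{collision}) \approx \frac{n(n-1)}{2^{b+1}} \approx \frac{n^{2}}{2^{257}}
    % Example: n = 2^{35} unique 128 KiB blocks (about 4 PiB of unique data)
    % gives P \approx 2^{70-257} = 2^{-187}, roughly 5 x 10^{-57}.

So an accidental collision is astronomically unlikely; the practical worry is whether you trust everything between the hash and the disk.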

> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?

It really depends on your specific workload: verify only adds work on writes whose checksum matches an existing entry in the dedup table, because ZFS then reads the existing block back and compares it byte for byte before sharing it.
If your application is mostly reading data, you may well not even notice verify.

Sha256 is supposed to be almost bulletproof, but...
At the end of the day it is all about how much you value your data.
As I wrote before, try it with verify and see if the performance is acceptable. It may well be.
You can always disable verify at any time.
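For completeness, the knobs involved look like this ("tank" below is just a placeholder pool name):

    # Enable dedup with sha256 checksums plus a byte-for-byte verify
    # of blocks whose checksums match.
    zfs set dedup=sha256,verify tank

    # If verify turns out to be too expensive, fall back to
    # checksum-only dedup; the change only affects blocks written
    # from that point on.
    zfs set dedup=sha256 tank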

> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification). And both are 100% accurate in detecting
> duplicate blocks.

I don't believe that fletcher is still allowed for dedup - right now it is only sha256.
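You can see the two properties side by side; the regular checksum property still defaults to fletcher4, but the dedup property itself is sha256-based ("tank/fs" is again a placeholder dataset):

    # Inspect both settings; checksum defaults to fletcher4, while
    # dedup accepts on, off, verify, or sha256[,verify].
    zfs get checksum,dedup tank/fs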

--
Robert Milkowski
http://milek.blogspot.com
