On Mon, January 10, 2011 02:41, Eric D. Mudama wrote:
> On Sun, Jan  9 at 22:54, Peter Taps wrote:
>> Thank you all for your help. I am the OP.
>> I haven't looked at the link that talks about the probability of
>> collision. Intuitively, I still wonder how the chances of collision
>> can be so low. We are reducing a 4K block to just 256 bits. If the
>> chances of collision are so low, *theoretically* it is possible to
>> reconstruct the original block from the 256-bit signature by using a
>> simple lookup. Essentially, we would now have the world's best
>> compression algorithm irrespective of whether the data is text or
>> binary. This is hard to digest.
> "simple" lookup isn't so simple when there are 2^256 records to
> search; however, fundamentally your understanding of hashes is
> correct.
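A quick count helps explain why a lookup table could not, in general, give back the original block. This is a sketch that assumes 4 KiB (4096-byte) blocks and 256-bit digests. There are vastly more possible blocks than possible digests, so by the pigeonhole principle each digest is shared by an enormous number of different blocks:

```python
BLOCK_BITS = 4096 * 8   # a 4 KiB block holds 32768 bits
DIGEST_BITS = 256       # SHA-256 output size

# Work with exponents of 2 to keep the numbers readable.
blocks_exp = BLOCK_BITS     # 2**32768 distinct 4 KiB blocks
digests_exp = DIGEST_BITS   # 2**256 distinct digests

# Average number of blocks mapping to each digest:
# 2**32768 / 2**256 = 2**(32768 - 256).
preimages_exp = blocks_exp - digests_exp

print(f"possible blocks:             2^{blocks_exp}")
print(f"possible digests:            2^{digests_exp}")
print(f"blocks per digest (average): 2^{preimages_exp}")
```

Collisions therefore must exist. A good hash function does not prevent them; it makes *finding* two real blocks that share a digest vanishingly unlikely.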

It should also be noted that ZFS itself can "only" address 2^128 bytes
(bytes, not 4K 'records'), and filling those 2^128 bytes would supposedly
take as much energy as boiling the Earth's oceans:


So recording and looking up 2^256 records would be quite an
accomplishment: that is 2^128 times more records than ZFS can address
bytes. It's a lot of data.
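The scale can be put in numbers with a small sketch. It assumes 4 KiB blocks and counts only the 32 bytes of each digest, ignoring the blocks themselves and any index overhead, so it is a generous lower bound on the size of such a table. The helper `log2_pow2` is hypothetical, written just for this illustration:

```python
ZFS_MAX_BYTES = 2**128   # ZFS's addressable limit, as stated above
BLOCK_BYTES = 4096       # assumed 4 KiB block size
DIGEST_BYTES = 32        # one SHA-256 digest


def log2_pow2(n: int) -> int:
    """Return k such that n == 2**k (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


# Number of 4 KiB blocks in the largest possible ZFS pool.
max_blocks = ZFS_MAX_BYTES // BLOCK_BYTES

# Bytes needed just to list every possible digest once.
table_bytes = 2**256 * DIGEST_BYTES

print(f"4 KiB blocks in a full pool:  2^{log2_pow2(max_blocks)}")
print(f"bytes to list every digest:   2^{log2_pow2(table_bytes)}")
print(f"equivalent number of full ZFS pools: "
      f"2^{log2_pow2(table_bytes // ZFS_MAX_BYTES)}")
```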

If the OP wants to know why the chances are so low, he'll have to learn a
bit about hash functions (SHA-256 is a cryptographic hash function):


Knowing exactly how the math (?) works is not necessary, but understanding
the principles would help in getting a general picture of why SHA-256
doesn't need a verification step, and why it was chosen as one of the ZFS
(dedupe) checksum options.
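One of those principles is the birthday bound. The sketch below assumes SHA-256 behaves like an ideal random function, an idealisation rather than a proven property. Under that assumption, the chance that *any* two of n distinct blocks share a digest is at most n(n-1)/2 divided by 2^256. The function name `collision_bound` and the pool sizes are hypothetical and used only for illustration:

```python
import math


def collision_bound(n_blocks: int, digest_bits: int = 256) -> float:
    """Union (birthday) upper bound on P(any two of n random digests collide).

    Assumes each digest is uniformly random over 2**digest_bits values.
    """
    pairs = n_blocks * (n_blocks - 1) // 2
    return pairs / 2**digest_bits  # exact big-int division, rounded to float


BLOCK_BYTES = 4096  # assumed 4 KiB blocks

for label, pool_bytes in [("1 PiB pool", 2**50),
                          ("full 2^128-byte pool", 2**128)]:
    n = pool_bytes // BLOCK_BYTES
    p = collision_bound(n)
    print(f"{label}: 2^{n.bit_length() - 1} blocks, "
          f"P(collision) <= {p:.3g} (about 2^{math.log2(p):.1f})")
```

Under that idealised assumption the bound is about 2^-181 for a 1 PiB pool of 4 KiB blocks, and still only about 2^-25 for a pool filled to the 2^128-byte limit, which, as noted above, could not be built anyway.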

zfs-discuss mailing list