On Mon, January 10, 2011 02:41, Eric D. Mudama wrote:
> On Sun, Jan  9 at 22:54, Peter Taps wrote:
>> Thank you all for your help. I am the OP.
>>
>> I haven't looked at the link that talks about the probability of
>> collision. Intuitively, I still wonder how the chances of collision
>> can be so low. We are reducing a 4K block to just 256 bits. If the
>> chances of collision are so low, *theoretically* it is possible to
>> reconstruct the original block from the 256-bit signature by using a
>> simple lookup. Essentially, we would now have world's best
>> compression algorithm irrespective of whether the data is text or
>> binary. This is hard to digest.
>
> "simple" lookup isn't so simple when there are 2^256 records to
> search, however, fundamentally your understanding of hashes is
> correct.
[...]

It should also be noted that ZFS itself can "only" address 2^128 bytes
(bytes, not 4K 'records'), and supposedly just filling those 2^128 bytes
would take more energy than it would take to boil the Earth's oceans:

    http://blogs.sun.com/bonwick/entry/128_bit_storage_are_you

So recording and looking up 2^256 records would be quite an
accomplishment. It's a lot of data.
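
To put rough numbers on the "compression" idea, here is a small Python
sketch. It assumes a 4K block means 4096 bytes and that the lookup table
stores one full block per digest; the function names are made up for
illustration. It only counts exponents, so nothing here is specific to
SHA-256 beyond the 256-bit output size.

```python
import math

BLOCK_BITS = 4096 * 8   # a 4K block, assumed to be 4096 bytes
DIGEST_BITS = 256       # SHA-256 output size


def log2_blocks_per_digest(block_bits=BLOCK_BITS, digest_bits=DIGEST_BITS):
    """log2 of the average number of distinct blocks sharing one digest.

    There are 2^block_bits possible blocks but only 2^digest_bits digests,
    so by counting alone each digest stands for 2^(block_bits - digest_bits)
    blocks on average.
    """
    return block_bits - digest_bits


def log2_table_bytes(digest_bits=DIGEST_BITS, record_bytes=4096):
    """log2 of the bytes needed to store one block for every possible digest."""
    return digest_bits + math.log2(record_bytes)


if __name__ == "__main__":
    print("blocks per digest: 2^%d" % log2_blocks_per_digest())
    print("lookup table size: 2^%d bytes" % log2_table_bytes())
```

So a digest cannot identify "the" original block: on average 2^32512
different blocks map to each one. And a table with just one block per
digest would need 2^268 bytes, which is 2^140 times what ZFS can address.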

If the OP wants to know why the chances are so low, he'll have to learn a
bit about hash functions (which is what SHA-256 is):

    http://en.wikipedia.org/wiki/Hash_function
    http://en.wikipedia.org/wiki/Cryptographic_hash_function

Knowing exactly how the math works is not necessary, but understanding
the principles is useful if one wants a general picture of why SHA-256 is
considered safe to use without a verification step, and why it was chosen
as one of the ZFS (dedupe) checksum options.
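
For a rough feel of the numbers, here is a back-of-the-envelope sketch of
the birthday bound in Python. It assumes SHA-256 output behaves like
uniformly random 256-bit values (the usual idealization, not something
that has been proven), and the function name and the 128 TiB pool size
are made up for illustration.

```python
import math


def log2_collision_probability(num_blocks, digest_bits=256):
    """Approximate log2 of the chance that any two of num_blocks distinct
    blocks share a digest.

    Uses the birthday-bound approximation p ~ (number of pairs) / 2^bits,
    which is only meaningful while p is small. Requires num_blocks >= 2.
    """
    pairs = num_blocks * (num_blocks - 1) // 2
    return math.log2(pairs) - digest_bits


if __name__ == "__main__":
    # Hypothetical pool: 128 TiB of unique 4096-byte blocks = 2^35 blocks.
    blocks = (128 * 2**40) // 4096
    print("log2(p) is about %.1f" % log2_collision_probability(blocks))
```

For that pool the estimate comes out at about 2^-187. Even with 2^116
blocks (2^128 bytes' worth of 4K blocks) the same formula gives only
about 2^-25.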

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
