On Fri, Jan 07, 2011 at 07:33:53PM +0000, Robert Milkowski wrote:
>  On 01/ 7/11 02:13 PM, David Magda wrote:
> >
> >Given the above: most people are content enough to trust Fletcher to not
> >have data corruption, but are worried about SHA-256 giving 'data
> >corruption' when it comes de-dupe? The entire rest of the computing world
> >is content to live with 10^-15 (for SAS disks), and yet one wouldn't be
> >prepared to have 10^-30 (or better) for dedupe?
> >
> 
> I think you do not entirely understand the problem.
> Let's say two different blocks A and B have the same SHA-256 checksum: A 
> is already stored in the pool and B is being written. With dedup enabled 
> but without verify, B won't be written. The next time you ask for block B 
> you will actually end up with block A. Now if B is relatively common in 
> your data set, one corrupted block has a relatively big impact on many 
> files (and from the filesystem's point of view this is silent 
> data corruption). [...]
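The scenario above can be sketched with a toy content-addressed store. This is purely illustrative, not ZFS code: a deliberately weak 1-byte hash stands in for the checksum so a collision is easy to produce (real dedup uses the full SHA-256 digest).

```python
import hashlib

def weak_hash(block: bytes) -> int:
    # Truncate SHA-256 to a single byte so collisions are easy to find.
    # This stands in for the block checksum; real dedup uses all 256 bits.
    return hashlib.sha256(block).digest()[0]

class DedupStore:
    """Toy dedup table: checksum -> stored block."""
    def __init__(self, verify: bool = False):
        self.blocks = {}
        self.verify = verify

    def write(self, block: bytes) -> int:
        h = weak_hash(block)
        if h in self.blocks:
            if self.verify and self.blocks[h] != block:
                # verify=on: compare the actual bytes, catch the collision
                raise ValueError("checksum collision caught by verify")
            # dedup "hit": the new block is NOT written
            return h
        self.blocks[h] = block
        return h

    def read(self, h: int) -> bytes:
        return self.blocks[h]

# Find a block B != A with the same weak checksum (~256 tries on average).
a = b"block-A"
i = 0
while True:
    b = f"block-B-{i}".encode()
    if b != a and weak_hash(b) == weak_hash(a):
        break
    i += 1

store = DedupStore(verify=False)
store.write(a)
hb = store.write(b)           # looks like a dedup hit; B is silently dropped
print(store.read(hb) == a)    # True: every read of "B" now returns A's data

safe = DedupStore(verify=True)
safe.write(a)
try:
    safe.write(b)
except ValueError:
    print("verify caught the collision")
```

With verify off, the collision is invisible to the filesystem; with verify on, the byte-for-byte comparison catches it at write time, at the cost of an extra read per dedup hit.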

All true; that's why verification was mandatory for fletcher, which is
not a cryptographically strong hash. Until SHA-256 is broken, burning
power on verification is just a waste of resources, which isn't "green" :)
Once SHA-256 is broken, verification can be turned on.
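The "waste of resources" claim rests on the accidental-collision odds being negligible. A back-of-envelope birthday bound makes that concrete; the pool size below (2^38 unique 128 KiB blocks, i.e. 32 PiB) is an illustrative assumption, not a number from this thread, and assumes uniformly distributed digests.

```python
from math import log10

HASH_BITS = 256          # SHA-256 digest width
n = 2 ** 38              # assumed unique blocks in the pool (32 PiB @ 128 KiB)

# Birthday approximation: p ~= n*(n-1) / 2^(HASH_BITS + 1)
p = n * (n - 1) / 2 ** (HASH_BITS + 1)
print(f"accidental collision probability ~ 10^{log10(p):.0f}")
# roughly 10^-54, many orders of magnitude below disk error rates
```

Even at that pool size the figure is far below the 10^-15 undetected-error rate quoted for SAS disks earlier in the thread; a deliberately engineered collision (i.e. SHA-256 actually being broken) is the scenario verification would guard against.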

> [...] Without dedup, if a single block gets corrupted 
> silently, the impact will usually be relatively limited.

Except when the corruption happens on write, not read, i.e. you write
data, it is corrupted in flight, but the corrupted version still matches
its fletcher checksum (the default now). Every read of this block will
then return silently corrupted data.

> Now what if block B is a meta-data block?

Metadata is not deduplicated.

> The point is that the potential impact of a hash collision is much bigger 
> than a single silently corrupted block. Not to mention that, dedup or 
> not, all the other possible causes of data corruption are there anyway; 
> adding yet another one might or might not be acceptable.

I'm more of the opinion that it was a mistake that the verification
feature wasn't removed along with fletcher-for-dedup. Being able to turn
verification on if/once SHA-256 is broken is the only reason I'd leave
it in, but I somehow feel the odds of corrupting your data through the
extra code complexity that verification brings are bigger than the odds
of a SHA-256 collision.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss