Mattias Pantzare wrote:
> For this application (deduplicating data) the likelihood of matching
> hashes is very high. In fact it has to be, otherwise there would not
> be any data to deduplicate.
>
> In the cp example, all writes would have matching hashes and would all
> need a verify.

Would the read needed to verify a matching hash take much longer than 
writing the duplicate data would?  Wouldn't the significant overhead be 
in managing hashes and searching for matches, rather than in verifying 
matches?  In any case, verification could be optimized for the cp 
example by keeping a cache of the hashes of recently read data, or even 
caching the data itself so that verification requires no additional 
read, as sketched below.  A disk-wide search for independent duplicate 
data would be a different process, though.
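To make the idea concrete, here is a minimal Python sketch of that 
write path: hash matches are verified byte-for-byte, preferring a cache 
of recently read blocks over a disk read.  This is not ZFS's actual 
dedup code; all the names (DedupTable, note_read, etc.) are 
hypothetical, and a real implementation would store a colliding block 
rather than fail.

    import hashlib
    from collections import OrderedDict

    class DedupTable:
        """Hypothetical dedup table: maps block hash -> stored block,
        verifying every hash match byte-for-byte before deduplicating."""

        def __init__(self, cache_size=1024):
            self.store = {}             # hash -> block (stands in for disk)
            self.cache = OrderedDict()  # hash -> recently read block (LRU)
            self.cache_size = cache_size

        def _read_block(self, digest):
            # Stands in for the disk read otherwise needed to verify.
            return self.store[digest]

        def note_read(self, data):
            # Read side of cp: remember recently read blocks so that
            # the following writes can verify without touching the disk.
            digest = hashlib.sha256(data).digest()
            self.cache[digest] = data
            self.cache.move_to_end(digest)
            while len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least recent
            return digest

        def write(self, data):
            digest = hashlib.sha256(data).digest()
            if digest in self.store:
                # Hash matched: verify, preferring the cache so a
                # cp-style read-then-write pays no extra disk read.
                existing = self.cache.get(digest)
                if existing is None:
                    existing = self._read_block(digest)
                if existing == data:
                    return digest       # true duplicate; write nothing
                raise RuntimeError("hash collision")  # real code stores it
            self.store[digest] = data   # new data; store it
            return digest

    # cp-style usage: the read side primes the cache, so the write
    # side's verification is a memory compare, not a disk read.
    dt = DedupTable()
    block = b"example block" * 256
    dt.note_read(block)   # reading the source file
    dt.write(block)       # writing the copy: dedup'd, verified from cache

In the disk-wide case there would be no such cache hits, so each 
matching block would have to be read back for verification.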

Cheers,
11011011
