Tristan Ball wrote:

I'm curious as to how send/recv intersects with dedupe... if I send/recv a deduped filesystem, is the data sent in its de-duped form, i.e. just sent once, followed by the pointers for subsequent dupe data, or is the data sent in expanded form, with the recv-side system then having to redo the dedupe process?

The on-disk dedup and dedup of the stream are actually separate features. Stream dedup hasn't yet been integrated. It will be a choice at *send* time whether the stream is to be deduplicated.

Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve?

A stream can be deduped even if the on disk format isn't and vice versa.
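To make the separation concrete, a hedged sketch of the two knobs: `zfs set dedup=on` is the existing on-disk property, while the send-side flag (`-D` below) is purely an assumption about the eventual interface, since stream dedup has not yet been integrated:

```shell
# On-disk dedup: a per-dataset property, independent of any send/recv.
zfs set dedup=on tank/fs

# Stream dedup: chosen at *send* time. The -D flag is an assumption
# about the eventual interface, not a shipping option; the receiving
# dataset's dedup property can be on or off regardless.
zfs send -D tank/fs@snap | ssh backuphost zfs recv backup/fs
```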

Also - do we know yet what effect block size has on dedupe? My guess is that a smaller block size will perhaps give a better duplicate match rate, but at the cost of higher CPU usage and perhaps reduced performance, as the system will need to store larger de-dupe hash tables?

That really depends on how the applications write blocks and what your data is like; it could go either way very easily. As with all dedup it is a trade-off between I/O bandwidth and CPU/memory. Sometimes dedup will improve performance, since like compression it can reduce I/O requirements, but depending on workload the CPU/memory overhead may or may not be worth it (same with compression).
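The block-size trade-off can be illustrated with a toy simulation (not ZFS code; the data layout and checksum choice are assumptions for illustration): smaller blocks find more duplicate matches but require more dedup-table entries, which is where the CPU/memory cost comes from.

```python
import hashlib

# Hypothetical 96 KiB buffer: a 32 KiB region that appears twice,
# separated by 32 KiB of never-repeating counter data.
unique = b"".join(i.to_bytes(4, "big") for i in range(8192))  # 32 KiB, all blocks distinct
data = b"A" * 32768 + unique + b"A" * 32768

def dedup_stats(data, block_size):
    """Split data into fixed-size blocks and count unique block checksums.

    Returns (dedup_ratio, table_entries): how much the data shrinks,
    and how many entries the dedup hash table must hold.
    """
    seen = set()
    total = 0
    for off in range(0, len(data), block_size):
        seen.add(hashlib.sha256(data[off:off + block_size]).digest())
        total += 1
    return total / len(seen), len(seen)

for bs in (512, 4096, 32768):
    ratio, entries = dedup_stats(data, bs)
    print(f"block size {bs:6d}: ratio {ratio:.2f}x, table entries {entries}")
```

With this particular data the 512-byte blocks dedup best (about 2.95x) but need 65 table entries, while 32 KiB blocks only reach 1.50x with 2 entries. Real workloads can invert the result, as noted above: it depends entirely on where the duplicate runs fall relative to block boundaries.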

--
Darren J Moffat
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
