Tristan Ball wrote:

I'm curious as to how send/recv intersects with dedupe... if I send/recv a deduped filesystem, is the data sent in its de-duped form, i.e. just sent once, followed by the pointers for subsequent dupe data, or is the data sent in expanded form, with the recv-side system then having to redo the dedupe process?

The on-disk dedup and dedup of the stream are actually separate features. Stream dedup hasn't yet been integrated. It will be a choice at *send* time whether the stream is to be deduplicated.

Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve?

A stream can be deduped even if the on disk format isn't and vice versa.
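To make the separation concrete, a hedged sketch of the two knobs: `zfs set dedup=on` is the existing on-disk property, while the send-side flag (`-D` below) is purely an assumption about the eventual interface, since stream dedup has not yet been integrated:

```shell
# On-disk dedup: a per-dataset property, independent of any send/recv.
zfs set dedup=on tank/fs

# Stream dedup: chosen at *send* time. The -D flag is an assumption
# about the eventual interface, not a shipping option; the receiving
# dataset's dedup property can be on or off regardless.
zfs send -D tank/fs@snap | ssh backuphost zfs recv backup/fs
```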

Also - do we know yet what effect block size has on dedupe? My guess is that a smaller block size will perhaps give a better duplicate match rate, but at the cost of higher CPU usage and perhaps reduced performance, as the system will need to store larger de-dupe hash tables?

That really depends on how the applications write blocks and what your data is like; it could go either way very easily. As with all dedup it is a trade-off between I/O bandwidth and CPU/memory. Sometimes dedup will improve performance, since like compression it can reduce I/O requirements, but depending on workload the CPU/memory overhead may or may not be worth it (same with compression).
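The block-size trade-off can be illustrated with a toy simulation (not ZFS code; the data layout and checksum choice are assumptions for illustration): smaller blocks find more duplicate matches but require more dedup-table entries, which is where the CPU/memory cost comes from.

```python
import hashlib

# Hypothetical 96 KiB buffer: a 32 KiB region that appears twice,
# separated by 32 KiB of never-repeating counter data.
unique = b"".join(i.to_bytes(4, "big") for i in range(8192))  # 32 KiB, all blocks distinct
data = b"A" * 32768 + unique + b"A" * 32768

def dedup_stats(data, block_size):
    """Split data into fixed-size blocks and count unique block checksums.

    Returns (dedup_ratio, table_entries): how much the data shrinks,
    and how many entries the dedup hash table must hold.
    """
    seen = set()
    total = 0
    for off in range(0, len(data), block_size):
        seen.add(hashlib.sha256(data[off:off + block_size]).digest())
        total += 1
    return total / len(seen), len(seen)

for bs in (512, 4096, 32768):
    ratio, entries = dedup_stats(data, bs)
    print(f"block size {bs:6d}: ratio {ratio:.2f}x, table entries {entries}")
```

With this particular data the 512-byte blocks dedup best (about 2.95x) but need 65 table entries, while 32 KiB blocks only reach 1.50x with 2 entries. Real workloads can invert the result, as noted above: it depends entirely on where the duplicate runs fall relative to block boundaries.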

--
Darren J Moffat
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
