Re: [zfs-code] Improving zfs send dedup.

Erik Trimble Thu, 10 Jun 2010 16:34:17 -0700

On 6/10/2010 1:21 PM, Pawel Jakub Dawidek wrote:


If we send incremental stream we can be sure that up to the previous
snapshot we have the same data on the other side. I'm aware it doesn't
mean the data has exactly the same checksum (eg. it can be compressed
with different algorithm). But in theory, are we able to figure out that
the given block we try to send is already part of the dataset's previous
snapshot? I'm fine with discarding incremental stream on the remote site
if it uses different compression algorithm or simply deduplication is
turned off (basically when there is no block matching stored checksum).
But if I have identical configurations on both ends I'd like not to send
the same block multiple times in multiple incremental streams

No, you can't be sure. You can *assume* you sent the proper incremental stream to the receiving host, but what if you didn't? Or it got deleted? etc.

You *have* to check with the receiving host to see what's there. As Lori pointed out, you need the DDT from the receiving host. As I said earlier, this looks to NOT need code changes, just a smart userland app. I'd use rsync's model, where you SSH over to the other host, run the same binary (which knows it's in "receive" mode), and set up the com link between the two. The receiver's DDT gets generated, passed back to the sender, and the sender can then do lookups using both DDT sets. It's really not that complicated.
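To make the lookup step concrete, here is a rough sketch of the idea in Python. It is only an illustration under stated assumptions: blocks are plain byte strings held in memory, SHA-256 stands in for the block checksum, and the names `build_ddt` and `plan_send` are made up for this example. It does not read a real ZFS DDT or touch the actual 'zfs send' stream format.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum used as the dedup key (SHA-256 is an assumption here)."""
    return hashlib.sha256(block).hexdigest()


def build_ddt(blocks):
    """Receiver side: the set of checksums for blocks it already holds.

    Hypothetical stand-in for "the receiver's DDT gets generated".
    """
    return {checksum(b) for b in blocks}


def plan_send(blocks, receiver_ddt):
    """Sender side: decide, per block, whether to send data or a reference.

    Looks each block up in both sets: the receiver's DDT and the set of
    checksums already emitted earlier in this same stream.
    """
    sent_in_stream = set()
    plan = []
    for block in blocks:
        digest = checksum(block)
        if digest in receiver_ddt or digest in sent_in_stream:
            # The other side has (or will have) this block: send a reference.
            plan.append(("ref", digest))
        else:
            # Unknown to the receiver: send the full block once.
            plan.append(("data", block))
            sent_in_stream.add(digest)
    return plan


if __name__ == "__main__":
    receiver_ddt = build_ddt([b"aaaa", b"bbbb"])
    for kind, payload in plan_send([b"aaaa", b"cccc", b"cccc"], receiver_ddt):
        print(kind, payload)
```

In the rsync-style setup, `build_ddt` would run in the remote "receive" mode process and its result would be passed back over the SSH link before the sender starts `plan_send`.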

My sole worry is that since 'zfs send' and 'zfs receive' are moving targets to keep up with the zfs filesystem version features, you'll have to constantly modify your new app to be compatible with newer zfs versions.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-code
