On 6/10/2010 1:21 PM, Pawel Jakub Dawidek wrote:
> If we send an incremental stream, we can be sure that up to the previous snapshot we have the same data on the other side. I'm aware it doesn't mean the data has exactly the same checksum (e.g. it can be compressed with a different algorithm). But in theory, are we able to figure out that the given block we try to send is already part of the dataset's previous snapshot? I'm fine with discarding the incremental stream on the remote side if it uses a different compression algorithm or if deduplication is simply turned off (basically when there is no block matching the stored checksum). But if I have identical configurations on both ends, I'd like not to send the same block multiple times in multiple incremental streams.
No, you can't be sure. You can *assume* you sent the proper incremental stream to the receiving host, but what if you didn't? Or it got deleted? etc.
You *have* to check with the receiving host to see what's there. As Lori pointed out, you need the DDT from the receiving host. As I said earlier, this looks to NOT need code changes, just a smart userland app. I'd use rsync's model, where you SSH over to the other host, run the same binary (which knows it's in "receive" mode), and set up the communication link between the two. The receiver's DDT gets generated, passed back to the sender, and the sender can then do lookups against both DDTs (its own and the receiver's). It's really not that complicated.
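To make the exchange concrete, here is a rough sketch of the lookup step only. It is not real ZFS code: the block lists, the function names (block_checksums, plan_stream), and the use of SHA-256 over in-memory byte strings are all assumptions standing in for whatever the real DDT and stream format would provide. The SSH transport is left out entirely.

```python
import hashlib


def block_checksums(blocks):
    """Receiver side (hypothetical): checksum every block already stored.

    Stands in for "the receiver's DDT gets generated"; a real DDT holds
    more than a bare checksum.
    """
    return {hashlib.sha256(block).hexdigest() for block in blocks}


def plan_stream(sender_blocks, receiver_checksums):
    """Sender side (hypothetical): decide what actually needs to be sent.

    Returns a list of ("ref", checksum) for blocks the receiver can
    already resolve, and ("data", block) for blocks that must be sent.
    """
    known = set(receiver_checksums)  # copy, so the caller's set is untouched
    plan = []
    for block in sender_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in known:
            plan.append(("ref", digest))
        else:
            plan.append(("data", block))
            # A later duplicate in the same stream can now be a reference.
            known.add(digest)
    return plan


# The receiver already holds two blocks; the sender wants to send four.
receiver_table = block_checksums([b"aaaa", b"bbbb"])
plan = plan_stream([b"aaaa", b"cccc", b"cccc", b"bbbb"], receiver_table)
```

In this toy run only one of the four blocks (the first b"cccc") goes over the wire as data; the rest are sent as references. Note that the table is generated fresh from the receiver each time, which is the point: nothing is assumed about what the receiver still has.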
My sole worry is that since 'zfs send' and 'zfs receive' are moving targets to keep up with the zfs filesystem version features, you'll have to constantly modify your new app to be compatible with newer zfs versions.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA

_______________________________________________
zfs-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-code
