On 6/15/2010 2:06 AM, Pawel Jakub Dawidek wrote:
On Thu, Jun 10, 2010 at 04:32:06PM -0700, Erik Trimble wrote:
On 6/10/2010 1:21 PM, Pawel Jakub Dawidek wrote:
If we send an incremental stream we can be sure that up to the previous
snapshot we have the same data on the other side. I'm aware it doesn't
mean the data has exactly the same checksum (e.g. it can be compressed
with a different algorithm). But in theory, are we able to figure out that
the given block we are trying to send is already part of the dataset's
previous snapshot? I'm fine with discarding the incremental stream on the
remote side if it uses a different compression algorithm or if
deduplication is simply turned off (basically, when there is no block
matching the stored checksum). But if I have identical configurations on
both ends, I'd like not to send the same block multiple times in multiple
incremental streams.
No, you can't be sure.  You can *assume* you sent the proper incremental
stream to the receiving host, but what if you didn't? Or it got deleted?
etc.
So for this to work, the following conditions have to be met:

1. Pool configurations on both sides have to be identical - the same
    checksum algorithms, the same compression algorithms, etc.

2. No snapshots can be removed on the remote side, as we could lose
    blocks by doing so.

3. We have to have all datasets on the remote side, as it would be too
    expensive to find out whether a given block that exists in the DDT is
    referenced by a given dataset. If I want to send a block and it exists
    in the DDT with refcount > 1, I have no way to tell which datasets are
    referencing it short of scanning all datasets (or at least my dataset).

If those conditions are met, I can safely send checksums of blocks with
a birth date from before the snapshot I'm sending. Am I right?
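To make the proposal concrete, here is a minimal sketch of the send-side decision being described, assuming the preconditions above hold. The function name, the `(birth_txg, data)` block representation, and the use of SHA-256 are all illustrative choices, not the actual ZFS send code or on-disk format:

```python
import hashlib

def plan_incremental_stream(blocks, from_snap_txg):
    """Hypothetical sketch: split an incremental stream into full blocks
    and checksum-only references. Any block born at or before the base
    snapshot's transaction group is assumed to already exist on the
    remote side, so only its checksum needs to travel.

    `blocks` is a list of (birth_txg, data) tuples.
    """
    full, refs = [], []
    for birth_txg, data in blocks:
        if birth_txg <= from_snap_txg:
            # Block predates the base snapshot: send the checksum only
            # and let the receiver resolve it against its own DDT.
            refs.append(hashlib.sha256(data).hexdigest())
        else:
            # Block is new since the base snapshot: send the data itself.
            full.append(data)
    return full, refs
```

The receiver would then look each reference up in its DDT and, as suggested above, reject the whole stream if any reference fails to resolve.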
Well...

I suppose so. In theory, I see nothing wrong with what you are saying. But that's a *whole* lot of very iffy preconditions, and it's really not at all practical. In fact, I'd go so far as to say that it's *highly* unlikely you can meet them in most real-world cases.



Realistically, you've got four scenarios for sending an incremental stream from sender machine A to receiver machine B:

1.    A's pool has dedup on, and B's does too.
2.    A's pool does NOT have dedup on, but B's pool does.
3.    A's pool does NOT have dedup on, and neither does B's.
4.    A's pool has dedup on, but B's pool doesn't.


I'm assuming that your goal is to minimize the amount of data being sent across the wire from host A to host B.

Cases 3 & 4 mean that you can't do any better than 'zfs send -D | zfs receive', as B has nothing to dedup against. You can dedup the sent stream (which B will then expand when receiving it), but that's it.

Cases 1 & 2 will both allow you maximum benefit, as B already has a DDT for the receiving pool, and you can compare the to-be-sent stream against this receiving DDT and do dedup. Case 1 will be faster, since A already has a pool DDT covering the to-be-sent stream, while in Case 2 the sender will have to compute a DDT solely for that stream.
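The four scenarios boil down to a small decision table. A sketch of that table, with strategy names of my own invention (only `zfs send -D`, the stream-level dedup flag, is a real command-line option):

```python
def choose_send_strategy(sender_dedup, receiver_dedup):
    """Hypothetical decision table for the four scenarios above,
    keyed on whether each pool has dedup enabled."""
    if not receiver_dedup:
        # Cases 3 & 4: B has no DDT to match against, so the best
        # available option is deduping within the stream itself
        # (i.e. 'zfs send -D'), which B expands on receive.
        return "stream-only dedup"
    if sender_dedup:
        # Case 1: A's pool DDT already covers the stream's blocks,
        # so no extra checksum computation is needed on the sender.
        return "cross-host dedup (reuse sender DDT)"
    # Case 2: the sender must compute a DDT solely for this stream.
    return "cross-host dedup (compute stream DDT)"
```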


You simply *must* talk to the receiving machine and pass back a DDT if you want to have any practical chance of doing this kind of dedup'd stream.

Note that if the checksum type used differs between host A and host B, you can't do any form of extra dedup this way. I'd have to check whether different compression types would cause problems, as I can't recall if compression affects the checksum being stored (I think it does, as I'm pretty sure ZFS stores the checksum of the post-compressed block, but I'm not 100% sure). All of these problems would be easily detectable, and a properly written application would be able to report such conditions back to the user (and should then fall back to the standard 'zfs send -D' behavior).
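The consequence of checksumming post-compressed blocks can be shown in a few lines. Here zlib and lzma merely stand in for two different pool compression settings, and SHA-256 stands in for the pool's checksum algorithm; none of this is ZFS's actual code path:

```python
import hashlib
import lzma
import zlib

# Illustration of why differing compression settings defeat checksum
# matching: if the stored checksum covers the on-disk (post-compression)
# bytes, the same logical block compressed two different ways hashes to
# two different values, so the receiver's DDT lookup can never match.
logical = b"the same logical block contents" * 64

checksum_a = hashlib.sha256(zlib.compress(logical)).hexdigest()  # "host A"
checksum_b = hashlib.sha256(lzma.compress(logical)).hexdigest()  # "host B"

# Identical logical data, yet no cross-host match is possible.
assert checksum_a != checksum_b
```

This is exactly the detectable-mismatch case described above, where a well-behaved tool should fall back to plain 'zfs send -D'.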



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-code