On 6/15/2010 2:06 AM, Pawel Jakub Dawidek wrote:
On Thu, Jun 10, 2010 at 04:32:06PM -0700, Erik Trimble wrote:
On 6/10/2010 1:21 PM, Pawel Jakub Dawidek wrote:
If we send an incremental stream we can be sure that up to the previous
snapshot we have the same data on the other side. I'm aware it doesn't
mean the data has exactly the same checksum (e.g. it can be compressed
with a different algorithm). But in theory, are we able to figure out that
the given block we are trying to send is already part of the dataset's
previous snapshot? I'm fine with discarding the incremental stream on the
remote side if it uses a different compression algorithm or if
deduplication is simply turned off (basically, when there is no block
matching the stored checksum). But if I have identical configurations on
both ends, I'd like not to send the same block multiple times in multiple
incremental streams.
No, you can't be sure.  You can *assume* you sent the proper incremental
stream to the receiving host, but what if you didn't? Or it got deleted?
etc.
So for this to work, the following conditions have to be met:

1. Pool configurations on both sides have to be identical - the same
    checksum algorithms, the same compression algorithms, etc.

2. No snapshots can be removed on the remote side, as we could lose
    blocks by doing so.

3. We have to have all datasets on the remote side, as it would be too
    expensive to find out whether a given block that exists in the DDT is
    referenced by a given dataset. If I want to send a block and it exists
    in the DDT with refcount > 1, I have no way to tell which datasets are
    referencing it short of scanning all datasets (or at least my dataset).

If those conditions are met, I can safely send checksums of blocks with
a birth date from before the snapshot I'm sending. Am I right?
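To make the proposal concrete, here is a minimal sketch of the send-side decision being described, assuming the preconditions above hold. The function name, the `(birth_txg, data)` block representation, and the use of SHA-256 are all illustrative choices, not the actual ZFS send code or on-disk format:

```python
import hashlib

def plan_incremental_stream(blocks, from_snap_txg):
    """Hypothetical sketch: split an incremental stream into full blocks
    and checksum-only references. Any block born at or before the base
    snapshot's transaction group is assumed to already exist on the
    remote side, so only its checksum needs to travel.

    `blocks` is a list of (birth_txg, data) tuples.
    """
    full, refs = [], []
    for birth_txg, data in blocks:
        if birth_txg <= from_snap_txg:
            # Block predates the base snapshot: send the checksum only
            # and let the receiver resolve it against its own DDT.
            refs.append(hashlib.sha256(data).hexdigest())
        else:
            # Block is new since the base snapshot: send the data itself.
            full.append(data)
    return full, refs
```

The receiver would then look each reference up in its DDT and, as suggested above, reject the whole stream if any reference fails to resolve.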
Well...

I suppose so. In theory, I see nothing wrong with what you are saying. But that's a *whole* lot of very iffy preconditions, and it's really not at all practical. In fact, I'd go so far as to say that it's *highly* unlikely you can meet them in most real-world cases.



Realistically, you've got four scenarios for sending an incremental stream from sender machine A to receiver machine B:

1.    A's pool has dedup on, and B's does too.
2.    A's pool does NOT have dedup on, but B's pool does.
3.    A's pool does NOT have dedup on, and neither does B's.
4.    A's pool has dedup on, but B's pool doesn't.


I'm assuming that your goal is to minimize the amount of data being sent across the wire from host A to host B.

Cases 3 & 4 mean that you can't do any better than 'zfs send -D | zfs receive', as B has nothing to dedup against. You can dedup the sent stream (which B will then expand when receiving it), but that's it.

Cases 1 & 2 will both allow you maximum benefit, as B already has a DDT for the receiving pool, and you can compare the to-be-sent stream against this receiving DDT and do dedup. Case 1 will be faster, since A already has a pool DDT covering the to-be-sent stream, while in Case 2 the sender will have to compute a DDT solely for that stream.
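The four scenarios boil down to a small decision table. A sketch of that table, with strategy names of my own invention (only `zfs send -D`, the stream-level dedup flag, is a real command-line option):

```python
def choose_send_strategy(sender_dedup, receiver_dedup):
    """Hypothetical decision table for the four scenarios above,
    keyed on whether each pool has dedup enabled."""
    if not receiver_dedup:
        # Cases 3 & 4: B has no DDT to match against, so the best
        # available option is deduping within the stream itself
        # (i.e. 'zfs send -D'), which B expands on receive.
        return "stream-only dedup"
    if sender_dedup:
        # Case 1: A's pool DDT already covers the stream's blocks,
        # so no extra checksum computation is needed on the sender.
        return "cross-host dedup (reuse sender DDT)"
    # Case 2: the sender must compute a DDT solely for this stream.
    return "cross-host dedup (compute stream DDT)"
```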


You simply *must* talk to the receiving machine and pass back a DDT if you want to have any practical chance of doing this kind of dedup'd stream.

Note that if the checksum type used differs between host A and host B, you can't do any form of extra dedup this way. I'd have to check whether different compression types would cause problems, as I can't recall if compression affects the checksum being stored (I think it does, as I'm pretty sure ZFS stores the checksum of the post-compressed block, but I'm not 100% sure). All of these problems would be easily detectable, and a properly written application would be able to report such conditions back to the user (and should then fall back to the standard 'zfs send -D' behavior).
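The consequence of checksumming post-compressed blocks can be shown in a few lines. Here zlib and lzma merely stand in for two different pool compression settings, and SHA-256 stands in for the pool's checksum algorithm; none of this is ZFS's actual code path:

```python
import hashlib
import lzma
import zlib

# Illustration of why differing compression settings defeat checksum
# matching: if the stored checksum covers the on-disk (post-compression)
# bytes, the same logical block compressed two different ways hashes to
# two different values, so the receiver's DDT lookup can never match.
logical = b"the same logical block contents" * 64

checksum_a = hashlib.sha256(zlib.compress(logical)).hexdigest()  # "host A"
checksum_b = hashlib.sha256(lzma.compress(logical)).hexdigest()  # "host B"

# Identical logical data, yet no cross-host match is possible.
assert checksum_a != checksum_b
```

This is exactly the detectable-mismatch case described above, where a well-behaved tool should fall back to plain 'zfs send -D'.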



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-code