Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread BJ Quinn
Here's an idea - I understand that I need rsync on both sides if I want to 
minimize network traffic.  What if I don't care about that - the entire file 
can come over the network, but I specifically only want rsync to write the 
changed blocks to disk.  Does rsync offer a mode like that?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Bob Friesenhahn
On Mon, 24 Nov 2008, BJ Quinn wrote:

> Here's an idea - I understand that I need rsync on both sides if I 
> want to minimize network traffic.  What if I don't care about that - 
> the entire file can come over the network, but I specifically only 
> want rsync to write the changed blocks to disk.  Does rsync offer a 
> mode like that?

My understanding is that the way rsync works, if a file already 
exists, then checksums are computed for ranges of the file, and the 
data is only sent/updated if that range is determined to have changed. 
While you can likely configure rsync to send the whole file, I think 
that it does what you want by default.

This is very easy for you to test for yourself.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Erik Trimble
Bob Friesenhahn wrote:
> On Mon, 24 Nov 2008, BJ Quinn wrote:
>
>> Here's an idea - I understand that I need rsync on both sides if I 
>> want to minimize network traffic.  What if I don't care about that - 
>> the entire file can come over the network, but I specifically only 
>> want rsync to write the changed blocks to disk.  Does rsync offer a 
>> mode like that?
>
> My understanding is that the way rsync works, if a file already 
> exists, then checksums are computed for ranges of the file, and the 
> data is only sent/updated if that range is determined to have changed. 
> While you can likely configure rsync to send the whole file, I think 
> that it does what you want by default.
>
> This is very easy for you to test for yourself.


This is indeed the default mode for rsync (deltas only).  The '-W' 
option forces a copy of the entire file, rather than just the changes. I 
_believe_ the standard checksum block size is 4 KB, but I'm not really 
sure (it's buried in the documentation somewhere, and it is customizable 
via the -B option).
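
As a rough illustration of the per-block comparison described above, here is a minimal Python sketch. It is not rsync's actual algorithm: rsync uses a rolling checksum so it can match blocks at any offset, while this simplification only compares blocks at the same aligned position. The 4096-byte block size and the function names are assumptions made for the example; as far as I know, rsync normally picks the block size per file unless -B is given.

```python
import hashlib

# Assumed block size, for illustration only.
BLOCK_SIZE = 4096


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks in new that differ from the same block in old.

    Blocks that exist only in new (the file grew) count as changed.
    """
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = []
    for index, digest in enumerate(new_digests):
        if index >= len(old_digests) or digest != old_digests[index]:
            changed.append(index)
    return changed
```

Only the blocks listed by changed_blocks would need to be sent or rewritten; everything else is already identical on the receiving side.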

One note here for ZFS users:

On ZFS (or any other COW filesystem), rsync unfortunately does NOT do 
the Right Thing when syncing an existing file.  From ZFS's standpoint, 
the most efficient way would be merely to rewrite the changed blocks, 
thus allowing COW and snapshots to make a fully efficient storage of the 
changed file.

Unfortunately, rsync instead writes the ENTIRE file to a temp file ( 
.blahtmpfoosomethingorother ) in the same directory as the changed file, 
writes the changed blocks in that copy, then unlinks the original file 
and renames the temp file to the original name.
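
To make the difference concrete, here is a small Python sketch contrasting the two update strategies. It is only a model of the behaviour described above, not rsync's code; the function names, the ".tmp" suffix, and the 4096-byte block size are all made up for the example. The return value is the number of bytes actually written, which is a rough stand-in for how much new space a COW filesystem has to allocate once a snapshot pins the old blocks.

```python
import os

# Assumed block size, for illustration only.
BLOCK_SIZE = 4096


def replace_via_temp(path: str, new_data: bytes) -> int:
    """Write a complete temporary copy, then rename it over the original.

    Every byte of the file is written again, so nothing is shared with
    a snapshot of the old version. Returns the number of bytes written.
    """
    tmp_path = path + ".tmp"  # hypothetical temp name
    with open(tmp_path, "wb") as tmp:
        tmp.write(new_data)
    os.replace(tmp_path, path)
    return len(new_data)


def update_in_place(path: str, new_data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Overwrite only the blocks of an existing file that differ from new_data.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "r+b") as f:
        for offset in range(0, len(new_data), block_size):
            new_block = new_data[offset:offset + block_size]
            f.seek(offset)
            old_block = f.read(len(new_block))
            if old_block != new_block:
                f.seek(offset)
                f.write(new_block)
                written += len(new_block)
        # Drop any leftover tail if the new version is shorter.
        f.truncate(len(new_data))
    return written
```

For a ten-block file with one modified block, update_in_place writes one block while replace_via_temp writes all ten.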

This results in about worst-case space usage.  I have this problem with 
storing backups of mbox files (don't ask) - I have large files which 
change frequently, but less than 10% of the file actually changes 
daily.  Due to the way rsync works, ZFS snapshots don't help me on 
replicated data, so I end up having to store the entire file every time.

I _really_ wish rsync had an option to copy in place or something like 
that, where the updates are made directly to the file, rather than a 
temp copy.




-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Bob Friesenhahn
On Mon, 24 Nov 2008, Erik Trimble wrote:

> One note here for ZFS users:
>
> On ZFS (or any other COW filesystem), rsync unfortunately does NOT do the 
> Right Thing when syncing an existing file.  From ZFS's standpoint, the most 
> efficient way would be merely to rewrite the changed blocks, thus allowing 
> COW and snapshots to make a fully efficient storage of the changed file.

Bummer. In that case, someone should file a bug in rsync's bug tracker 
(same one as used by Samba) to offer a better (direct overwrite) 
mode for ZFS.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Albert Chin
On Mon, Nov 24, 2008 at 08:43:18AM -0800, Erik Trimble wrote:
> I _really_ wish rsync had an option to copy in place or something like 
> that, where the updates are made directly to the file, rather than a 
> temp copy.

Isn't this what --inplace does?
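
For anyone who wants to try it, here is a minimal Python sketch of an rsync invocation using --inplace. The source and destination paths are hypothetical. Adding --no-whole-file is my own assumption about what is useful here: rsync defaults to whole-file copies when both paths are local, so this asks for delta transfers in that case too. Note that with --inplace the destination file is in an inconsistent state while the transfer is running.

```python
import subprocess


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Build an rsync command that updates destination files in place."""
    return [
        "rsync",
        "-a",               # archive mode: recurse and preserve metadata
        "--inplace",        # update the existing file instead of a temp copy
        "--no-whole-file",  # keep delta transfers even for local paths
        source,
        destination,
    ]


def run_backup(source: str, destination: str) -> None:
    """Run the rsync command; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

A call such as run_backup("fileserver:/export/mail/", "/tank/backup/mail/") would then be followed by the snapshot step (both paths are placeholders).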

-- 
albert chin ([EMAIL PROTECTED])


[zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread BJ Quinn
We're considering using an OpenSolaris server as a backup server.  Some of the 
servers to be backed up would be Linux and Windows servers, and potentially 
Windows desktops as well.  What I had imagined was that we could copy files 
over to the ZFS-based server nightly, take a snapshot, and store on disk only 
the blocks that had changed in the files being copied over.
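
The snapshot step of the nightly routine described above could be scripted roughly as follows. This is only a sketch: the dataset name tank/backup and the nightly-YYYY-MM-DD naming scheme are assumptions for illustration, and the function only builds the `zfs snapshot` command rather than running it.

```python
from datetime import date


def snapshot_command(dataset: str, day: date) -> list[str]:
    """Build a `zfs snapshot` command for a dated nightly snapshot."""
    snapshot_name = f"{dataset}@nightly-{day.isoformat()}"
    return ["zfs", "snapshot", snapshot_name]
```

For example, snapshot_command("tank/backup", date(2008, 11, 17)) names the snapshot tank/backup@nightly-2008-11-17; the resulting list can be passed to subprocess.run on the backup server once the copy has finished.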

What I found was that you can take a snapshot, make a small change to a large 
file on a ZFS filesystem, take another snapshot, and you'll only store a few 
blocks extra.  However, if you copy the same file of the same name from another 
source to the ZFS filesystem, it doesn't conserve any blocks.  To a certain 
extent, I understand why - when copying a file from another system (even if 
it's the same file or a slightly changed version of the same file), the 
filesystem actually does write to every block of the file, which I guess marks 
all those blocks as changed.

Is there any way to have ZFS detect that the blocks being copied from 
another system aren't different, or that only a few of the blocks are 
different?  Perhaps there's another way to copy the file across the network 
that only copies the changed blocks.  I believe rsync can do this, but some of 
the servers in question are Windows servers and rsync/cygwin might not be an 
option.


Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread BJ Quinn
Thank you both for your responses.  Let me see if I understand correctly - 

1.  Dedup is what I really want, but it's not implemented yet.

2.  The only other way to accomplish this sort of thing is rsync (in other 
words, don't overwrite the block in the first place if it's not different), and 
if I'm on Windows, I'll just have to go ahead and install rsync on my Windows 
boxes if I want it to work correctly.

Wmurnane, you mentioned there was a Windows-based rsync daemon.  Did you mean 
one other than the cygwin-based version?  I didn't know of any native Windows 
rsync software.


Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread Will Murnane
On Mon, Nov 17, 2008 at 20:54, BJ Quinn [EMAIL PROTECTED] wrote:
> 1.  Dedup is what I really want, but it's not implemented yet.
Yes, as I read it.  greenBytes [1] claims to have dedup on their
system; you might investigate them if you decide rsync won't work for
your application.

> 2.  The only other way to accomplish this sort of thing is rsync (in other 
> words, don't overwrite the block in the first place if it's not different), 
> and if I'm on Windows, I'll just have to go ahead and install rsync on my 
> Windows boxes if I want it to work correctly.
I believe so, yes.  Other programs may have the same capability, but
rsync by any other name would smell as sweet.

> Wmurnane, you mentioned there was a Windows-based rsync daemon.  Did you mean 
> one other than the cygwin-based version?  I didn't know of any native Windows 
> rsync software.
The link I gave ([2]) contains a version of rsync which is
``self-contained''---it does use Cygwin libraries, but it includes its
own copies of the ones it needs.  It's also nicely integrated with the
Windows management tools, in that it uses a Windows service and
Windows scheduled tasks to do its job rather than re-inventing
circular rolling things everywhere.

Will

[1]: http://www.green-bytes.com/
[2]: http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp


Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread Tim
On Mon, Nov 17, 2008 at 3:33 PM, Will Murnane [EMAIL PROTECTED] wrote:

> On Mon, Nov 17, 2008 at 20:54, BJ Quinn [EMAIL PROTECTED] wrote:
>> 1.  Dedup is what I really want, but it's not implemented yet.
> Yes, as I read it.  greenBytes [1] claims to have dedup on their
> system; you might investigate them if you decide rsync won't work for
> your application.
>
>> 2.  The only other way to accomplish this sort of thing is rsync (in
>> other words, don't overwrite the block in the first place if it's not
>> different), and if I'm on Windows, I'll just have to go ahead and install
>> rsync on my Windows boxes if I want it to work correctly.
> I believe so, yes.  Other programs may have the same capability, but
> rsync by any other name would smell as sweet.
>
>> Wmurnane, you mentioned there was a Windows-based rsync daemon.  Did you
>> mean one other than the cygwin-based version?  I didn't know of any native
>> Windows rsync software.
> The link I gave ([2]) contains a version of rsync which is
> ``self-contained''---it does use Cygwin libraries, but it includes its
> own copies of the ones it needs.  It's also nicely integrated with the
> Windows management tools, in that it uses a Windows service and
> Windows scheduled tasks to do its job rather than re-inventing
> circular rolling things everywhere.



Rsync:
http://www.nexenta.com/corp/index.php?option=com_content&task=view&id=64&Itemid=85