Erik Trimble wrote:
roland wrote:
Hello!
I am thinking of using ZFS for backing up large binary data files
(e.g. VMware VMs, Oracle databases) and want to rsync them at regular
intervals from other systems to one central ZFS system with compression
turned on.
I'd like to keep historical versions, so I want to take a snapshot
before each backup (i.e. before each rsync run).
Now I wonder:
if I have one large data file on ZFS, take a snapshot of the ZFS file
system holding it, and then overwrite that file with a newer version
that differs only slightly, what is the real disk consumption on the
ZFS side?
Do I need to handle this in a special way to make it space-efficient?
Do I need to use rsync --inplace?
Typically, rsync writes a complete new (temporary) file based on the
existing one and on what has changed at the remote site, and then
replaces the old one with the new one via delete/rename. I assume this
will eat up my backup space very quickly, even when using snapshots
and even if only small parts of the large file are changing.
You are correct: when you write a new file, we allocate space for that
entire new file, even if some of its blocks happen to have the same content
as blocks in the previous file.
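The space accounting described here can be sketched with a toy model. The `ToyPool` class and the file names are hypothetical, not ZFS code, and the model ignores compression, metadata, and recordsize alignment. It only contrasts a whole-file rewrite followed by a rename with an in-place update of the changed blocks, each done after one snapshot.

```python
from itertools import count


class ToyPool:
    """Hypothetical toy model of copy-on-write block accounting.

    Not real ZFS code. Assumptions: a file is a list of block IDs, every
    write allocates a fresh ID, and a block stays allocated while the live
    file system or any snapshot still references it.
    """

    def __init__(self):
        self._next_id = count()
        self.live = {}        # filename -> list of block IDs
        self.snapshots = []   # frozen copies of `live`

    def write_new_file(self, name, nblocks):
        # A brand-new file gets all-new blocks, even where its contents
        # match blocks of an older file.
        self.live[name] = [next(self._next_id) for _ in range(nblocks)]

    def rename(self, src, dst):
        # Renaming over dst drops the live reference to dst's old blocks;
        # a snapshot may still hold them.
        self.live[dst] = self.live.pop(src)

    def overwrite_blocks(self, name, indices):
        # In-place overwrite: copy-on-write allocates only the rewritten blocks.
        blocks = self.live[name]
        for i in indices:
            blocks[i] = next(self._next_id)

    def snapshot(self):
        # Copy each block list so later in-place writes don't alter the snapshot.
        self.snapshots.append({k: list(v) for k, v in self.live.items()})

    def blocks_in_use(self):
        referenced = set()
        for tree in [self.live, *self.snapshots]:
            for blocks in tree.values():
                referenced.update(blocks)
        return len(referenced)


NBLOCKS = 1000       # hypothetical file size, in blocks
CHANGED = range(10)  # hypothetical: 10 blocks differ in the new version

# (a) Default rsync behaviour: write a temporary file, then rename it over the old one.
full = ToyPool()
full.write_new_file("vm.img", NBLOCKS)
full.snapshot()
full.write_new_file("vm.img.tmp", NBLOCKS)
full.rename("vm.img.tmp", "vm.img")

# (b) In-place update that rewrites only the changed blocks.
inplace = ToyPool()
inplace.write_new_file("vm.img", NBLOCKS)
inplace.snapshot()
inplace.overwrite_blocks("vm.img", CHANGED)

print(full.blocks_in_use())     # 2000
print(inplace.blocks_in_use())  # 1010
```

Under these assumptions the snapshot keeps the old blocks alive in both cases, so the whole-file rewrite roughly doubles the space used, while the in-place update adds only the changed blocks. How closely `rsync --inplace` matches case (b), and how its change granularity maps onto ZFS records, is not modelled here.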
This is one of the reasons that we implemented "zfs send". If only a few
blocks of a large file were modified on the sending side, then only those
blocks will be sent, and we will find them extremely quickly: in
O(modified blocks) time, whereas using the POSIX interfaces (as rsync does)
would take O(filesize) time. Of course, if the system you're backing up
from is not running ZFS, this does not help you.
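The O(modified blocks) claim can be illustrated with a simplified model of a block tree in which every block pointer records the transaction group (txg) it was born in. Because copy-on-write also rewrites every ancestor of a changed block, a traversal can skip any subtree born at or before the snapshot's txg. The `Node`, `build`, `rewrite_leaf`, and `changed_since` names, the fanout, and the depth are hypothetical; this is a sketch of the idea, not the actual `zfs send` code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    birth: int                                    # txg in which this block was written
    children: list = field(default_factory=list)  # empty for data (leaf) blocks


def build(depth, fanout, txg):
    # Complete tree with fanout**depth data blocks, all written in `txg`.
    if depth == 0:
        return Node(txg)
    return Node(txg, [build(depth - 1, fanout, txg) for _ in range(fanout)])


def rewrite_leaf(node, index, depth, fanout, txg):
    # Copy-on-write: the data block and every ancestor on its path get a new
    # birth txg. For brevity this mutates the live tree in place; a real
    # system allocates new blocks and leaves the snapshot's tree untouched.
    node.birth = txg
    if depth == 0:
        return
    span = fanout ** (depth - 1)
    rewrite_leaf(node.children[index // span], index % span, depth - 1, fanout, txg)


def changed_since(node, since_txg):
    """Return (changed data blocks, nodes visited)."""
    if node.birth <= since_txg:
        return [], 1  # prune: nothing below here is newer than the snapshot
    if not node.children:
        return [node], 1
    changed, visited = [], 1
    for child in node.children:
        c, v = changed_since(child, since_txg)
        changed += c
        visited += v
    return changed, visited


DEPTH, FANOUT = 4, 8      # 8**4 = 4096 data blocks
SNAPSHOT_TXG = 1

root = build(DEPTH, FANOUT, SNAPSHOT_TXG)
for leaf in (0, FANOUT ** DEPTH - 1):   # modify the first and last data blocks
    rewrite_leaf(root, leaf, DEPTH, FANOUT, SNAPSHOT_TXG + 1)

changed, visited = changed_since(root, SNAPSHOT_TXG)
print(len(changed), visited, FANOUT ** DEPTH)  # 2 57 4096
```

Here the traversal touches 57 nodes to find the 2 changed data blocks, whereas comparing the two file versions through the POSIX read interface would have to read all 4,096 data blocks. The visited count grows with the number of changed blocks (times the fanout and depth of the tree), not with the file size.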
Under ZFS, any equivalent to 'cp A B' takes up no extra space. The
metadata is updated so that B points to the blocks in A. Should anyone
begin writing to B, only the updated blocks are added on disk, with the
metadata for B now containing the proper block list to be used (some
from A, and the new blocks in B). So, in your case, you get maximum
space efficiency: only the new blocks are stored, and the old
blocks are simply referenced.
That is not correct; what led you to believe that? With ZFS (and UFS, EXT2,
WAFL, VxFS, etc.), "cp a b" will copy the contents of the file, resulting in
two copies stored on disk.
--matt
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss