Erik Trimble wrote:
roland wrote:
hello !

i am thinking of using zfs to back up large binary data files (e.g. vmware vm's, oracle databases) and want to rsync them at regular intervals from other systems to one central zfs system with compression on.

i'd like to keep historical versions and thus want to take a snapshot before each backup, i.e. before each rsync run.

now i wonder:
if i have one large datafile on zfs, take a snapshot of the zfs filesystem holding it and then overwrite that file with a newer version with slight differences inside - what about the real disk consumption on the zfs side ? do i need to handle this in a special way to make it space-efficient ? do i need to use rsync --inplace ?

typically, rsync writes a complete new (temporary) file based on the existing one and on what has changed at the remote site - and then replaces the old one with the new one via delete/rename. i assume this will eat up my backup space very quickly, even when using snapshots and even if only small parts of the large file are changing.

You are correct, when you write a new file, we will allocate space for that entire new file, even if some of its blocks happen to have the same content as blocks in the previous file.
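To make the space accounting concrete, here is a rough toy model in Python. It is not ZFS code: the `ToyPool` class, the 4-byte block size and the method names are all made up for illustration, and it assumes an in-place writer that rewrites only the blocks whose contents actually differ. It compares the number of blocks held on disk after a snapshot when the file is replaced wholesale versus updated in place.

```python
# Toy model of copy-on-write block accounting. Hypothetical, not real ZFS code.
BLOCK = 4  # made-up block size in bytes, tiny so the example is easy to follow


def split(data):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


class ToyPool:
    def __init__(self):
        self.next_id = 0
        self.blocks = {}     # block id -> payload currently held on disk
        self.live = []       # block ids making up the live file
        self.snapshots = []  # each snapshot is a frozen list of block ids

    def _alloc(self, payload):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = payload
        return bid

    def _free_unreferenced(self):
        # A block is freed only when neither the live file nor any
        # snapshot still points at it.
        referenced = set(self.live)
        for snap in self.snapshots:
            referenced.update(snap)
        self.blocks = {k: v for k, v in self.blocks.items() if k in referenced}

    def snapshot(self):
        self.snapshots.append(list(self.live))

    def write_new_file(self, data):
        # Temp-file-plus-rename style: every block is freshly allocated.
        self.live = [self._alloc(b) for b in split(data)]
        self._free_unreferenced()

    def write_inplace(self, data):
        # In-place style: only blocks whose content differs are rewritten.
        updated = []
        for i, payload in enumerate(split(data)):
            if i < len(self.live) and self.blocks[self.live[i]] == payload:
                updated.append(self.live[i])
            else:
                updated.append(self._alloc(payload))
        self.live = updated
        self._free_unreferenced()

    def used_blocks(self):
        return len(self.blocks)


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAXXXXCCCCDDDD"  # one block out of four has changed

replaced = ToyPool()
replaced.write_new_file(old)
replaced.snapshot()
replaced.write_new_file(new)
print(replaced.used_blocks())  # 8: snapshot keeps 4 old blocks, new file adds 4

inplace = ToyPool()
inplace.write_new_file(old)
inplace.snapshot()
inplace.write_inplace(new)
print(inplace.used_blocks())   # 5: snapshot keeps 4 old blocks, 1 new block added
```

In this model the snapshot pins all of the old blocks either way; the difference is only in how many new blocks the update allocates.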

This is one of the reasons that we implemented "zfs send". If only a few blocks of a large file were modified on the sending side, then only those blocks will be sent, and we will find the blocks extremely quickly (in O(modified blocks) time; using the POSIX interfaces (as rsync does) would take O(filesize) time). Of course, if the system you're backing up from is not running ZFS, this does not help you.
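The O(modified blocks) point can be sketched with another toy model in Python. This is an illustration only, not the actual "zfs send" implementation: the `Node` class, the binary tree shape and the transaction numbers are assumptions made for the example, and the leaf count is assumed to be a power of two. The idea it models is that every block pointer records the transaction in which it was written, so a traversal can skip any subtree that has not changed since the snapshot.

```python
# Toy model: a tree of block pointers, each stamped with a birth transaction.
# Hypothetical illustration only, not the real zfs send code.
class Node:
    def __init__(self, birth, children=None, data=None):
        self.birth = birth            # transaction in which this node was written
        self.children = children or []
        self.data = data              # payload, set on leaf nodes only


def build(blocks, txg):
    """Build a binary tree over the data blocks (count must be a power of two)."""
    nodes = [Node(txg, data=b) for b in blocks]
    while len(nodes) > 1:
        nodes = [Node(txg, children=nodes[i:i + 2])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def rewrite(node, index, width, data, txg):
    """Copy-on-write update of leaf `index`; returns a new root.

    Only the nodes on the path from the root to the leaf are rewritten,
    so only those nodes get the new birth transaction.
    """
    if not node.children:
        return Node(txg, data=data)
    half = width // 2
    kids = list(node.children)
    if index < half:
        kids[0] = rewrite(kids[0], index, half, data, txg)
    else:
        kids[1] = rewrite(kids[1], index - half, half, data, txg)
    return Node(txg, children=kids)


def changed_since(node, since_txg, stats):
    """Collect leaf payloads written after `since_txg`, pruning old subtrees."""
    stats["visited"] += 1
    if node.birth <= since_txg:
        return []                     # nothing below here changed: skip it
    if not node.children:
        return [node.data]
    found = []
    for child in node.children:
        found.extend(changed_since(child, since_txg, stats))
    return found


n = 1024
root = build([b"old"] * n, txg=1)                # file as of the snapshot
root = rewrite(root, 700, n, b"new", txg=2)      # one block modified afterwards

stats = {"visited": 0}
print(changed_since(root, 1, stats))             # [b'new']
print(stats["visited"], "nodes visited out of", n, "data blocks")
```

A tool limited to the POSIX interfaces has no such birth information and has to read and compare all 1024 blocks to find the one that changed.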

Under ZFS, any equivalent to 'cp A B' takes up no extra space. The metadata is updated so that B points to the blocks in A. Should anyone begin writing to B, only the updated blocks are added on disk, with the metadata for B now containing the proper block list to be used (some from A, and the new blocks in B). So, in your case, you get maximum space efficiency, where only the new blocks are stored, and the old blocks simply are referenced.

That is not correct; what led you to believe that? With ZFS (and UFS, EXT2, WAFL, VxFS, etc.), "cp a b" will copy the contents of the file, resulting in two copies stored on disk.

--matt
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
