I was looking at the performance of using rsync to copy some large files
which change only a little between each run (database files).  I take a
snapshot after every successful run of rsync, so when using rsync
--inplace, only changed portions of the file will occupy new disk space.
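
For concreteness, here is a rough sketch of that copy-then-snapshot cycle,
written as plain Python around the two commands.  The paths, dataset name
and snapshot naming are made up for the example; only the rsync --inplace
flag and "zfs snapshot" are the real thing:

#!/usr/bin/env python3
"""Sketch of the copy-then-snapshot cycle described above.

Assumes rsync and the zfs CLI are on the PATH; the source, destination
and dataset below are placeholders, not my actual setup."""
import subprocess
from datetime import datetime

SRC = "dbhost:/var/lib/db/"   # hypothetical source
DST = "/tank/dbcopy/"         # hypothetical destination (a ZFS dataset)
DATASET = "tank/dbcopy"       # hypothetical dataset name

def run_cycle():
    # --inplace rewrites the existing file instead of building a temporary
    # copy, so after the snapshot only the rewritten blocks take new space.
    subprocess.run(["rsync", "-a", "--inplace", SRC, DST], check=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{stamp}"], check=True)

if __name__ == "__main__":
    run_cycle()
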

Unfortunately, performance wasn't very good: the source server in
question simply didn't have enough CPU to run the rsync delta
algorithm, and on top of that the delta pass generates read I/O load on
the destination server.  So I had to switch it off and transfer the
whole file instead.  In this particular case that means I need 120 GB
to store each run rather than 10 GB, but that's the way it goes.

If I had enabled deduplication, this would be a moot point; dedup would
take care of it for me.  Judging from early reports, my server probably
doesn't have the oomph required to handle it, so I'm holding off until
I can replace it with a server with more RAM and CPU.

But it occurred to me that this is a special case which could benefit
many workloads: if the filesystem uses secure checksums, it could check
the checksum recorded in the existing block pointer and see whether the
replacement data matches.  (Due to the (infinitesimal) potential for
hash collisions this should be configurable the same way it is for
dedup.)  In essence, rsync's writes of unchanged data would become
no-ops, and very little CPU would be wasted on either side of the pipe.
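
To make the idea concrete, here is a toy model of the check I have in
mind, in plain Python with sha256 standing in for the pool's checksum
algorithm.  None of this is actual ZFS code, just an illustration of
the write-elision logic:

#!/usr/bin/env python3
"""Toy model of checksum-based write elision: if the checksum of the
incoming data matches the checksum already recorded for that block,
the write is dropped and no new block is allocated (no COW)."""
import hashlib

BLOCK_SIZE = 128 * 1024   # assume 128K records for the example

class ToyDataset:
    def __init__(self):
        self.blocks = {}      # block number -> data
        self.checksums = {}   # block number -> digest (stands in for the block pointer checksum)

    def write_block(self, blkno, data):
        digest = hashlib.sha256(data).digest()
        if self.checksums.get(blkno) == digest:
            # Same checksum as the existing block: treat the write as a
            # no-op, leaving the old block (and any snapshot that
            # references it) untouched.
            return False
        self.blocks[blkno] = data
        self.checksums[blkno] = digest
        return True

if __name__ == "__main__":
    ds = ToyDataset()
    ds.write_block(0, b"x" * BLOCK_SIZE)
    print(ds.write_block(0, b"x" * BLOCK_SIZE))  # False: write elided
    print(ds.write_block(0, b"y" * BLOCK_SIZE))  # True: data changed, new block

In the real pipeline the checksum of the old data is already sitting in
the block pointer, so only the incoming data would need to be hashed
before deciding whether to allocate a new block.
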

Even in the absence of snapshots, this would leave the filesystem less
fragmented, since the copy-on-write is avoided entirely.  It would be a
win-win, provided the ZFS pipeline can pass the necessary information
between layers.

Are there any ZFS hackers who can comment on the feasibility of this
idea?

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
