Thanks very much, Mark.  That thread was very useful, and it's great to
understand what's actually going on here.

I've summarized all this in an answer back on stackoverflow:
http://stackoverflow.com/questions/6917505/inexplicable-svn-repository-size-increase-from-small-differences-to-big-files/7001562#7001562

Jon


From: Mark Phippard [mailto:markp...@gmail.com]
Sent: 09 August 2011 10:50
To: Jon Stafford
Cc: Andreas Krey; Daniel Shahaf; users@subversion.apache.org
Subject: Re: not storing diffs of binary files

On Tue, Aug 9, 2011 at 1:19 PM, Jon Stafford <jon.staff...@complyserve.com>
wrote:
Thanks everyone for the responses.  To check my understanding, and to give half
a conclusion:

Every revision of a file apart from the very first is stored as a delta
against some previous version.  Subversion would probably use the least disk
space *if* each revision were stored as a delta against the immediately
preceding revision.  But then reconstructing the 1000th revision would mean
applying 1000 deltas in turn, which would be really slow.  So instead,
revision N is stored as a delta against the revision obtained by flipping the
rightmost 1 bit of N to 0.
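The base-selection rule can be sketched in a few lines of Python.  This is a
minimal sketch, assuming a file's revisions are numbered from 0 and that
revision 0 is stored as full text; `skip_delta_base` and `delta_chain` are
hypothetical helper names, not Subversion functions.

```python
def skip_delta_base(n: int) -> int:
    """Return the revision that revision n is deltified against:
    n with its rightmost 1 bit cleared (assumes revision 0 is full text)."""
    if n <= 0:
        raise ValueError("revision 0 is stored as full text and has no delta base")
    return n & (n - 1)  # clears the lowest set bit, e.g. 0b1100 -> 0b1000


def delta_chain(n: int) -> list[int]:
    """Revisions visited when reconstructing revision n, ending at the full text (0)."""
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


# Expected delta base for each of the first 15 revisions of a file.
for rev in range(1, 16):
    print(f"r{rev:2d} -> base r{skip_delta_base(rev)}")
```

For revisions 1 to 15 the loop prints bases 0, 0, 2, 0, 4, 4, 6, 0, 8, 8, 10,
8, 12, 12, 14, and `delta_chain(13)` returns `[13, 12, 8, 0]`: three deltas
applied on top of the full text.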

This generally gives a balance between space used up and time to recreate any 
given revision of the file.
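To put rough numbers on the time side of that balance, here is a hypothetical
helper that counts how many deltas must be applied to rebuild revision n,
again assuming revision 0 holds the full text.  It counts deltas only and says
nothing about their size, which is the space side of the trade-off.

```python
def deltas_to_apply(n: int, skip: bool) -> int:
    """Number of deltas applied on top of the full text (revision 0) to rebuild revision n."""
    if skip:
        # Each skip-delta step clears one 1 bit of n, so the chain length
        # equals the number of 1 bits in n.
        return bin(n).count("1")
    return n  # linear chain: n -> n-1 -> ... -> 0


worst_skip = max(deltas_to_apply(n, skip=True) for n in range(1, 1001))
print(deltas_to_apply(1000, skip=False), deltas_to_apply(1000, skip=True), worst_skip)
```

It prints `1000 6 9`: rebuilding revision 1000 takes 1000 deltas with a linear
chain but 6 with skip deltas, and no revision up to 1000 needs more than 9.
The price is that a skip delta can span many revisions of history, so it can
be larger than a delta against the immediately preceding revision.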

OK, how does all that sound so far?

Knowing this, I was hoping that looking again would explain what was going on
in my repository, where I've checked in successive zips of my database data.
Not quite...

I can see that the deltas aren't necessarily against the immediately preceding
version; in fact, with 15 revisions it's reassuring to see them behaving
exactly as described in the skip-deltas document.

The bit I still can't reconcile is the difference in delta size between
standalone xdelta (small) and the delta stored by Subversion (large, sometimes
almost the size of the file itself).

I've checked in various zipped versions of my database data: some with a
month of changes between revisions, some with the most trivial change possible
between revisions.

For a trivial change:
xdelta delta size = 300KB, subversion db\revs file size = 300KB

For a month of database edits:
xdelta delta size = 3 or 4MB, subversion db\revs file size = 50MB

Obviously, for a fair comparison I'm only picking on revisions where Subversion
did delta against the immediately preceding revision.

So does Subversion (version 1.6.11) use an older, less effective xdelta?  Or
is it just that it applies xdelta after it's already done some format
manipulation on the file, which then makes it less delta-able?  Or something
else...

I do not understand it well enough to give a lot of details, so let me point
you to an old thread on the list:

http://svn.haxx.se/dev/archive-2007-03/1277.shtml

The xdelta algorithm has a configurable window that determines the amount of
memory used.  The more memory you give it, the smaller the delta it can often
produce.  It is likely that the xdelta binary you are using uses a larger
window than Subversion does.
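The effect of a window can be illustrated with a toy copy/insert delta
encoder.  This is not Subversion's or xdelta's actual algorithm: the greedy
block matcher, `BLOCK`, the window sizes, and the rule that target window k
may only copy from source window k are all simplifying assumptions, and
`literal_bytes` and `windowed_literal_bytes` are hypothetical names.  The test
data inserts 5,000 bytes at the front of a file so that everything after it
shifts.  In a zip archive each member is compressed separately, so growth in
one member moves everything stored after it; that is one plausible, unverified
reason why a month of edits would hurt more than a trivial change.

```python
import random

BLOCK = 16  # hypothetical minimum match length for this toy matcher


def literal_bytes(source: bytes, target: bytes) -> int:
    """Greedy toy delta: scan `target`, copying BLOCK bytes whenever the next
    BLOCK bytes occur somewhere in `source`, otherwise storing one literal
    byte.  Returns the number of literal (new-data) bytes."""
    index = {source[i:i + BLOCK] for i in range(len(source) - BLOCK + 1)}
    literals = pos = 0
    while pos < len(target):
        chunk = target[pos:pos + BLOCK]
        if len(chunk) == BLOCK and chunk in index:
            pos += BLOCK      # would be a "copy from source" instruction
        else:
            literals += 1     # would be one byte of new data in the delta
            pos += 1
    return literals


def windowed_literal_bytes(source: bytes, target: bytes, window: int) -> int:
    """Same toy delta, but target window k may only copy from source window k."""
    return sum(
        literal_bytes(source[start:start + window], target[start:start + window])
        for start in range(0, len(target), window)
    )


rng = random.Random(42)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
inserted = bytes(rng.getrandbits(8) for _ in range(5000))
new = inserted + old  # same data, shifted 5000 bytes further into the file

print("whole source visible:", literal_bytes(old, new))
print("4 KiB windows:       ", windowed_literal_bytes(old, new, 4096))
print("16 KiB windows:      ", windowed_literal_bytes(old, new, 16384))
```

With the whole source visible, only the 5,000 inserted bytes become new data.
With 4 KiB windows the shift is larger than a window, so essentially the whole
file is stored as new data; 16 KiB windows land in between.  The window sizes
here are arbitrary and do not reflect either tool's real settings.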

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
