Hello, I'm not quite sure how to properly phrase the subject
as a query term, so if this has been answered, please forgive
the redundancy and quietly point me to where this gets addressed.

We are using svn at work to hold customer 'vault' data [various bits
of information for each customer].  It has been a huge success -- to
the point where we have over 1,000 customers using vaults.  The checkins
are automated, and we have amassed over 100,000 revisions thus far.

User directories are created as /Ab/username [where "Ab" is a 2-character
hash of the username, computed with a known (balanced) algorithm, so that a
given user's files can be located more efficiently by machine].  So we have
about 1,200 of these user directories, with some hash buckets obviously being
re-used -- no big deal.
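
For what it's worth, the bucketing looks roughly like the Python sketch
below; the md5 step is just a stand-in for our actual (balanced) algorithm,
which isn't really important here:

    import hashlib
    import string

    # 52 letters in each position -> 52 * 52 possible 2-character buckets.
    ALPHABET = string.ascii_uppercase + string.ascii_lowercase

    def vault_bucket(username):
        # Illustration only: map a username to a 2-character directory name
        # (e.g. "jsmith" might land in /Qk/jsmith).  The real hash differs,
        # but the bucketing idea is the same.
        digest = hashlib.md5(username.encode("utf-8")).digest()
        n = int.from_bytes(digest[:4], "big")
        return ALPHABET[n % 52] + ALPHABET[(n // 52) % 52]

    print(vault_bucket("jsmith"))   # prints a 2-character bucket name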

The problem is that, even on minuscule changes, we are finding the
db/revs/<shard>/<revno> files to be disproportionately large; for an
addition or change of a file that is about 1k-4k, the rev files are
100K each.  At lower revisions we noticed the rev files were around 4k,
but they have been growing with each shard that gets added, usually to
the tune of 1k per shard.  With so many revisions being checked in at a
rapid rate, we found ourselves having to take production offline for a
couple of minutes while we migrated the repository in question to a
larger filesystem, because the filesystem was at risk of filling up.
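
In case it helps anyone see the same trend, the growth is easy to chart
with a quick Python sketch along these lines (it assumes a standard
sharded FSFS layout with loose, unpacked rev files under db/revs, and
takes the repository path on the command line):

    import os
    import sys

    def rev_file_sizes(repo_path):
        # Walk the sharded FSFS revision store and yield (revision, size)
        # for every loose rev file.  Packed shards ("*.pack") and anything
        # else non-numeric are skipped; this is a rough diagnostic only.
        revs_dir = os.path.join(repo_path, "db", "revs")
        shards = sorted((s for s in os.listdir(revs_dir) if s.isdigit()), key=int)
        for shard in shards:
            shard_dir = os.path.join(revs_dir, shard)
            for name in sorted((f for f in os.listdir(shard_dir) if f.isdigit()), key=int):
                yield int(name), os.path.getsize(os.path.join(shard_dir, name))

    if __name__ == "__main__":
        for rev, size in rev_file_sizes(sys.argv[1]):
            print("r%d\t%d bytes" % (rev, size))

Plotting that output is how we spotted the roughly 1k-per-shard creep.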

The upshot of this is:  Why does a minimal delta create such a large
rev file?  100k for a small change?  What's going on, and how can we
mitigate it?
-- 
                --*greywolf; 