On Thu, 28 Jan 2016 11:54:14 +0200, you wrote:
> I have a svn 1.9 repository, created with svnsync, that has ~150000
> revisions and size about 45 GB.
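Here is a minimal arithmetic check of the averages discussed in this reply. It uses the rounded figures from the original post: ~150000 revisions and ~45 GB for the 1.9 repository, and the 37.5 GB of the 1.8 reload mentioned further down. It assumes decimal units (1 GB = 10**9 bytes, 1 kB = 1000 bytes). The function names are illustrative only.

```python
# Rounded figures from the original post (decimal units are an assumption).
REVISIONS = 150_000
SIZE_19_BYTES = 45 * 10**9      # svnsync'ed 1.9 repository
SIZE_18_BYTES = 37_500_000_000  # the 1.9 dump loaded into 1.8


def avg_bytes_per_revision(repo_bytes: int, revisions: int) -> float:
    """Average on-disk (i.e. already compressed) size of one revision."""
    return repo_bytes / revisions


def relative_increase(new: float, old: float) -> float:
    """Fractional growth of `new` over `old`, e.g. 0.2 for +20%."""
    return (new - old) / old


print(f"{avg_bytes_per_revision(SIZE_19_BYTES, REVISIONS) / 1000:.0f} kB per revision")  # 300 kB per revision
print(f"1.9 vs 1.8: {relative_increase(SIZE_19_BYTES, SIZE_18_BYTES):+.0%}")  # 1.9 vs 1.8: +20%
```

With binary units (2**30 bytes per GB), the average comes out at about 322 kB per revision, so the choice of unit does not change the conclusion.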
300 kB/rev is quite large: on average, that means more than 1 MB of changes per revision before compression. Are these office documents, large XML / HTML files, or simply many files per commit?

> Due to some issues in svn-all-fast-export I
> wanted to have svn 1.8 version repository so I downgraded it by doing
> svnadmin (v 1.9) dump / svnadmin (v 1.8) load cycle. I was surprised that
> the size of v 1.8 repository is "only" 37.5 GB
> I tried to compare content of db\revs folder: some files are bigger in 1.8
> repo, some in 1.9 repo.

For the record: you already said elsewhere in this thread that you used 1.8 to create the 1.8 repo and 1.9 to create the 1.9 one. I also assume default settings, i.e. no fsfs.conf tweaks.

> Now I'm wondering:
> 1. Is such size increase expected for 1.9 repository? I read that 1.9 was
> aimed at speed optimizations, but 20% size increase compared to 1.8 sounds
> pretty big...

A 20% increase is definitely unexpected; +/-5% is a more typical number. It is not entirely implausible, though. Here is how 1.9 differs from 1.8:

* 1.9 adds "index" data to the rev / pack files, which allows for slightly shorter data elsewhere. Typical net effect: +5% in size.
* 1.9 adds some padding at the end of each block (64k boundary by default) so that parsed data does not cross block boundaries. Typical net effect: +1%.
* 1.9 uses skip-deltas between shards where 1.8 would still use "linear" deltification. Typical net effect: +2%.
* 1.9 stores deltas against very small files or directories. Typical net effect: <1%.
* 1.9 now supports representation sharing for node properties. Typical net effect: 0 to -5%.
* 1.9 now supports representation sharing when committing the same data to multiple paths / branches within the same revision. Typical net effect: 0 to -5%.

The theme behind these changes is I/O reduction: maximize data sharing, enable reordering of repository data during pack, and avoid "pointer chasing" for small pieces of information.

> 2. Or is my "dumped and reloaded 1.8" broken somehow? How could I verify?
> (dump revisions one by one and compare? Or is there any better way?)

There is a simple way to compare the "content size" of your repositories. Run the 1.9 svnfsfs tool on both:

    svnfsfs stats -M 1000 /path/to/repo > /some/output/path

It basically reads the whole repository, groups and aggregates the item sizes, and produces a long report. The number of changes and node revisions should be more or less (exactly?) the same. If they are, you're good. The "representation" size is where the numbers will differ. Looking at the differences in detail, you should be able to pin down one or two file extensions that account for most of the increase. It would be interesting to learn what is special about them...

-- 
Stefan^2.
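To narrow the growth down to a few file extensions, as suggested above, the comparison can be sketched in Python. This sketch assumes you have already copied the per-extension representation sizes from the two svnfsfs stats reports into dictionaries by hand. The report layout is not reproduced in this thread, so no parser is attempted. The function name `growth_by_extension` and the toy numbers are hypothetical, not measurements from either repository.

```python
from typing import Dict, List, Tuple


def growth_by_extension(
    old: Dict[str, int], new: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (extension, new - old) pairs, largest growth first.

    An extension present in only one report counts as 0 in the other.
    """
    extensions = set(old) | set(new)
    diffs = [(ext, new.get(ext, 0) - old.get(ext, 0)) for ext in extensions]
    # Sort by growth (descending); break ties by name so output is stable.
    diffs.sort(key=lambda item: (-item[1], item[0]))
    return diffs


# Toy inputs standing in for hand-copied 1.8 and 1.9 representation sizes.
reps_18 = {"xml": 100, "c": 50, "png": 30}
reps_19 = {"xml": 160, "c": 52, "png": 30}

net = sum(reps_19.values()) - sum(reps_18.values())
for ext, delta in growth_by_extension(reps_18, reps_19):
    share = delta / net if net else 0.0  # avoid dividing by a zero net change
    print(f"{ext:>5} {delta:+d} ({share:.0%} of net change)")
```

Sorting by absolute byte growth rather than by percentage keeps small but fast-growing extensions from crowding out the few large contributors worth investigating.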