Also, if you were to adjust the scale, I believe the two graphs would have a similar slope for the first several updates.
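To make that concrete, here is a rough back-of-the-envelope simulation (in Python) of the growth Randall describes below. The fan-out and node size are made-up numbers, not CouchDB's actual values, and the depth formula ignores the tree getting fatter as well as taller; the point is only the shape of the two curves:

    import math

    FANOUT = 100      # made-up fan-out, not CouchDB's actual value
    NODE_SIZE = 4096  # made-up bytes appended per rewritten node

    def simulate(updates, revs_limit):
        # Each update appends a fresh copy of the root-to-leaf path;
        # the path length depends on how many revisions are still live.
        size, sizes = 0, []
        for n in range(1, updates + 1):
            live = min(n, revs_limit)  # revisions retained in the tree
            depth = max(1, math.ceil(math.log(live, FANOUT)))
            size += depth * NODE_SIZE  # append-only: nothing is reclaimed
            sizes.append(size)
        return sizes

    for limit in (1000, 1):
        print("revs_limit=%d: %d bytes after 1200 updates"
              % (limit, simulate(1200, limit)[-1]))

With revs_limit=1 the modeled file grows linearly; with revs_limit=1000 the per-update cost steps up as the tree deepens and then levels off, which is roughly the shape of the two plots.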
On Mon, May 31, 2010 at 16:39, Randall Leeds <[email protected]> wrote:
> Hi Konrad,
>
> I'll take a stab at this and if I'm wrong hopefully someone will correct me.
>
> The on-disk B-tree is written in an append-only fashion rather than
> modified in place. Append-only updates mean that every inner node of
> the B-tree along the path from the root to the new update has to be
> re-written each time. Initially, when there are very few inner nodes,
> the amount of disk space used for each new update is relatively
> constant. Since the tree has a large fan-out, the depth does not change
> much at first. In the second graph you are seeing a tree that has a
> depth of 1 (just the root) being written over and over again to disk,
> and the corresponding expected linear growth results. However, when
> you have a higher revision limit, the old revisions are kept in the
> tree and the tree grows taller and fatter with each update. As you
> make more updates, more inner nodes need to be rewritten for each
> update, which causes the growth to accelerate. Eventually, you hit the
> revision limit, old revisions are discarded, the tree stops getting
> any taller or fatter, and the number of inner nodes that need to be
> changed for each update remains relatively constant (but greater than
> in the case of rev_limit=1). I suspect that the first graph becomes
> linear above 1000 updates and does not continue to accelerate.
>
> Cheers,
> Randall
>
> 2010/5/31 Konrad Förstner <[email protected]>:
>> Hi,
>>
>> I have an issue with CouchDB and posted the question on Stack Overflow
>> [1] but did not get any helpful answer. It would be great if somebody
>> could answer it here or on Stack Overflow (there I also had a problem
>> with the compaction, which was just a timing issue, as explained in the
>> comment).
>>
>> I was wondering why my CouchDB database was growing too fast, so I wrote
>> a little test script [2]. This script changes an attribute of a CouchDB
>> document 1200 times and takes the size of the database after each
>> change. After performing these 1200 write steps, the database does
>> a compaction step and the db size is measured again. In the end
>> the script plots the database size against the revision numbers. The
>> benchmark is run twice:
>>
>> * The first time, the default number of document revisions (=1000) is
>> used (_revs_limit).
>>
>> * The second time, the number of document revisions is set to 1.
>>
>> The first run produces the following plot:
>> http://www.flickr.com/photos/konradfoerstner/4656011444/
>>
>> The second run produces this plot:
>> http://www.flickr.com/photos/konradfoerstner/4656012732/
>>
>> For me this is quite unexpected behavior. In the first run I would
>> have expected linear growth, as every change produces a new
>> revision. When the 1000 revisions are reached, the size should remain
>> constant, as the older revisions are discarded.
>>
>> In the second run, the first revision should result in a certain
>> database size that is then kept during the following write steps, as
>> every new revision leads to the deletion of the previous one.
>>
>> I could understand if there were a little bit of overhead needed to
>> manage the changes, but this growth behavior seems weird to me. Can
>> anybody explain this phenomenon or correct the assumptions that led to
>> my wrong expectations?
>>
>> Many thanks in advance
>>
>> Konrad
>>
>> [1] http://stackoverflow.com/questions/2921151/why-do-my-couchdb-databases-grow-so-fast
>> [2] http://github.com/konrad/couchdb-benchmarking
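
P.S. For anyone who wants to reproduce this: the per-database revision limit can be read and set over plain HTTP via the _revs_limit resource, and compaction can be triggered via _compact. A minimal sketch using only the Python standard library (the host and database name are placeholders for your own setup):

    import urllib.request

    DB = "http://127.0.0.1:5984/mydb"  # placeholder host and database name

    def http(method, url, body=None, headers=None):
        # Small helper: send one HTTP request and return the response body.
        req = urllib.request.Request(url, data=body, method=method,
                                     headers=headers or {})
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode()

    print(http("GET", DB + "/_revs_limit"))  # defaults to 1000
    http("PUT", DB + "/_revs_limit", b"1")   # keep only the latest revision
    http("POST", DB + "/_compact",           # trigger compaction
         headers={"Content-Type": "application/json"})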
