On Mon May 31, 2010 at 06:02:56PM -0700, Randall Leeds wrote:
>I don't think so, but check his script on github. Here's a link to the lines.
>http://github.com/konrad/couchdb-benchmarking/blob/master/couchdb-benchmarking.rb#L84
Many thanks for your explanations, Randall and Nick. I am still astonished about the dimensions of disk space which this underlying data handling needs, but I guess I have to live with it.

Cheers
Konrad

>On Mon, May 31, 2010 at 17:39, Nicholas Orr <[email protected]> wrote:
>> Ahh yes, I see that now...
>>
>> Is that what the faint grey line at the end of those graphs represents? I actually didn't notice that the first time, just thought the black line going up was it...
>>
>> On Tue, Jun 1, 2010 at 10:13 AM, Randall Leeds <[email protected]> wrote:
>>> If you look at the script, the compaction is only performed at the end and not on each iteration.
>>>
>>> On Mon, May 31, 2010 at 16:58, Nicholas Orr <[email protected]> wrote:
>>>> This all makes sense except the OP says a compaction step is being performed. A compaction is essentially a copy/paste/delete/rename operation, so the on-disk size should be fairly constant as the data copied is just the info required, isn't it?
>>>>
>>>> Nick
>>>>
>>>> On Tue, Jun 1, 2010 at 9:39 AM, Randall Leeds <[email protected]> wrote:
>>>>
>>>>> Hi Konrad,
>>>>>
>>>>> I'll take a stab at this and if I'm wrong hopefully someone will correct me.
>>>>>
>>>>> The on-disk B-tree is written in an append-only fashion rather than modified in place. Append-only updates mean that every inner node of the B-tree along the path from the root to the new update has to be re-written each time. Initially, when there are very few inner nodes, the amount of disk space used for each new update is relatively constant. Since the tree has a large fan-out, the depth does not change much at first. In the second graph you are seeing a tree that has a depth of 1 (just the root) being written over and over again to disk, and the corresponding expected linear growth results. However, when you have a higher revision limit, the old revisions are kept in the tree and the tree grows taller and fatter with each update. As you make more updates, more inner nodes need to be rewritten for each update, which causes the growth to accelerate. Eventually, you hit the revision limit and old revisions are discarded; the tree stops getting any taller or fatter and the number of inner nodes that need to be changed for each update remains relatively constant (but greater than in the case of rev_limit=1). I suspect that the first graph becomes linear above 1000 updates and does not continue to accelerate.
>>>>>
>>>>> Cheers,
>>>>> Randall
>>>>>
>>>>> 2010/5/31 Konrad Förstner <[email protected]>:
>>>>> > Hi,
>>>>> >
>>>>> > I have an issue with CouchDB and posted the question on stackoverflow [1] but did not get any helpful answer. It would be great if somebody could answer it here or at stackoverflow (there I also had a problem with the compaction which was just a timing issue, as explained in the comments).
>>>>> >
>>>>> > I was wondering why my CouchDB database was growing so fast, so I wrote a little test script [2]. This script changes an attribute of a CouchDB document 1200 times and takes the size of the database after each change. After performing these 1200 writing steps the database does a compaction step and the db size is measured again. In the end the script plots the database size against the revision numbers.
>>>>> > The benchmarking is run twice:
>>>>> >
>>>>> > * The first time, the default number of document revisions (=1000) is used (_revs_limit).
>>>>> >
>>>>> > * The second time, the number of document revisions is set to 1.
>>>>> >
>>>>> > The first run produces the following plot:
>>>>> > http://www.flickr.com/photos/konradfoerstner/4656011444/
>>>>> >
>>>>> > The second run produces this plot:
>>>>> > http://www.flickr.com/photos/konradfoerstner/4656012732/
>>>>> >
>>>>> > For me this is quite unexpected behavior. In the first run I would have expected linear growth, as every change produces a new revision. When the 1000 revisions are reached, the size should stay constant as the older revisions are discarded.
>>>>> >
>>>>> > In the second run the first revision should result in a certain database size that is then kept during the following writing steps, as every new revision leads to the deletion of the previous one.
>>>>> >
>>>>> > I could understand if there is a little bit of overhead needed to manage the changes, but this growth behavior seems weird to me. Can anybody explain this phenomenon or correct my assumptions that led to the wrong expectations?
>>>>> >
>>>>> > Many thanks in advance
>>>>> >
>>>>> > Konrad
>>>>> >
>>>>> > [1] http://stackoverflow.com/questions/2921151/why-do-my-couchdb-databases-grow-so-fast
>>>>> > [2] http://github.com/konrad/couchdb-benchmarking
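
For anyone who wants to see Randall's argument in numbers, here is a back-of-the-envelope sketch in Ruby (the same language as the benchmarking script). The node size and fan-out are invented constants, not CouchDB's actual on-disk format; the point is only the shape of the two curves for _revs_limit=1000 versus _revs_limit=1.

    #!/usr/bin/env ruby
    # Toy model only -- these constants are made up, not CouchDB's real file layout.
    NODE_SIZE = 100   # assumed bytes appended per rewritten node
    FANOUT    = 10    # assumed B-tree fan-out

    # Each update appends every node on the root-to-leaf path (append-only update).
    # A higher revs_limit keeps more live entries, so the path gets deeper over time.
    def simulate(updates, revs_limit)
      file_size = 0
      sizes = []
      updates.times do |i|
        live_entries = [i + 1, revs_limit].min                  # revisions beyond the limit are dropped
        depth = [1, Math.log(live_entries, FANOUT).ceil + 1].max
        file_size += depth * NODE_SIZE                          # whole path is rewritten and appended
        sizes << file_size
      end
      sizes
    end

    high = simulate(1200, 1000)   # first run: default _revs_limit
    low  = simulate(1200, 1)      # second run: _revs_limit = 1

    [1, 10, 100, 500, 1000, 1200].each do |n|
      printf("after %4d updates: revs_limit=1000 -> %7d bytes, revs_limit=1 -> %7d bytes\n",
             n, high[n - 1], low[n - 1])
    end

With the limit at 1 the path depth stays at 1 and the file grows linearly; with the limit at 1000 the per-update cost climbs as the tree deepens and only levels off once the limit is reached, which matches the acceleration in the first plot.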

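For reference, a rough sketch of the kind of measurement loop being discussed; this is not Konrad's actual script (see [2] for that). It assumes a local CouchDB on 127.0.0.1:5984 and a throw-away database name, growth_test, and uses the standard HTTP API: PUT on the database and document, _revs_limit, _compact, and the disk_size field of the database info document.

    #!/usr/bin/env ruby
    require 'net/http'
    require 'json'

    HOST = '127.0.0.1'   # assumed local CouchDB
    PORT = 5984
    DB   = 'growth_test' # placeholder database name

    def request(method, path, body = nil)
      Net::HTTP.start(HOST, PORT) do |http|
        req = Net::HTTP.const_get(method).new(path)
        req['Content-Type'] = 'application/json'
        req.body = body if body
        JSON.parse(http.request(req).body)
      end
    end

    request(:Put, "/#{DB}")                    # create the database
    request(:Put, "/#{DB}/_revs_limit", '1')   # only for the second run; the default is 1000

    rev = nil
    1200.times do |i|
      doc = { 'counter' => i }
      doc['_rev'] = rev if rev                 # must pass the current revision to update
      rev = request(:Put, "/#{DB}/doc", JSON.generate(doc))['rev']
      size = request(:Get, "/#{DB}")['disk_size']
      puts "#{i + 1}\t#{size}"
    end

    request(:Post, "/#{DB}/_compact")          # compaction happens once, at the very end
    sleep 5                                    # crude wait; compaction runs asynchronously
    final_size = request(:Get, "/#{DB}")['disk_size']
    puts "after compaction\t#{final_size}"

Per the thread, the real script [2] does essentially the same thing (1200 updates, size sampled after each one, a single compaction at the end) plus the plotting of size against revision number.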