On Thu, Feb 26, 2009 at 5:58 PM, Barry Wark <[email protected]> wrote: > On Thu, Feb 26, 2009 at 11:18 AM, Damien Katz <[email protected]> wrote: >> >> On Feb 26, 2009, at 1:55 PM, Jan Lehnardt wrote: >> >>> >>> On 26 Feb 2009, at 19:49, Barry Wark wrote: >>>>> >>>>> or ascending... >>>> >>>> As an asside, why is it that sequential document ids would produce a >>>> significant performance boost? I suspect the answer is something >>>> rather fundamental to CouchDB's design, and I'd like to try to grok >>>> it. >>> >>> b-trees inner-nodes can get cached better if inserts basically always >>> use the same path. >>> >> >> What he said. It's pretty standard btree stuff, most, if not all the major >> rdbms have similar issues with primary keys. >> >> Also, he Ids don't need to be sequential (1,2,3,4...), just ordered >> (1,5,19,22...). And they don't need to sort higher or lower than all the >> other ids, so long as they are clustered together. The each btree nodes >> that have to be loaded that isn't in cache is expensive. The more the keys >> have to be inserted into random places in the btree, the worse the caching >> behavior. Right now, with the crypto random UUIDs we generate, it's >> basically the worst case scenario for doc inserts. > > > Ah, of course. Thanks everyone! > > *thinking back to my CS undergrad days* (I don't come across this > stuff much in my work these days...) > Inserting with sequential IDs would lead to btree balance issues, > though, wouldn't it? In that sense the crypto random UUIDs would seem > to be the ideal for read performance in the future as opposed to > insert performance. Am I on the right track? > > Thanks for the info, > Barry >> >> -Damien >> >
Not sure i follow what you mean by balance issues. Seeing as its a btree it can't be unbalanced because if it were it wouldn't be a btree. But then our definitions of balance might be different. :D The only performance difference between reading random and the sequential document ids I can think of would favor the sequential side when you're reading documents in roughly the same order as their ids so we get FS cache hits. HTH, Paul Davis
