On Fri, Jun 18, 2010 at 07:01:25PM -0700, Scott Hess scratched on the wall:
> On Fri, Jun 18, 2010 at 5:24 PM, Eric Smith <eas....@gmail.com> wrote:
> > Jay A. Kreibich wrote:
> >
> >> Yes.  Hence the "and this is the important part" comment.  Most of
> >> the time when people are building billion-row files, they're building
> >> a new DB by importing a static source of data.  If things go wrong,
> >> you just throw out the database and try again.
> >
> > That's kinda like doing it all in a single big transaction, which I
> > wanted to avoid.  :)
> >
> > But my take-away from this conversation is generally "you're SOL":
> > we are backed by a tree, so we have to update existing nodes to point
> > to the new ones, so we have to touch existing pages on INSERT.  Is that
> > about right?
>
> The problem is that if your incoming data is uniformly unsorted
> (essentially random), then for each transaction you'll find yourself
> updating approximately every page of the database.
In a linear format, only half.  You move every element only if you
insert at the very beginning (e.g. a perfect reverse-sort).  A random
insert lands, on average, in the middle, shifting half of the elements
already present; and since the table grows from 0 to N elements as you
add them, loading N elements in random order costs, on average, about
25% of N element moves per insert.

But databases don't work that way.  B-trees are specifically designed
around the needs of a paged database, and require minimal page updates
to keep the tree balanced.  In this case, the original primary key was
being inserted in order, and the ROWID values, being generated, are
clearly in order.  It was already the best case.

   -j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable."   -- Angela Johnson
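
A sketch, not from the thread, for anyone who wants to see the effect
for themselves: the toy Python script below (the table name, row count,
payload, and cache size are all made up for the demo) loads the same
keys once in ROWID order and once shuffled.  The ordered load appends to
the table's B-tree; the shuffled load touches leaf pages all over the
tree and slows down once the page cache can no longer hold the file.

    # Toy comparison (assumed setup, not from the thread): time an
    # in-order load vs. a shuffled load of the same keys into an
    # INTEGER PRIMARY KEY table, with a deliberately small page cache.
    import random
    import sqlite3
    import time

    def load(keys, path):
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA cache_size = 1000")   # small cache to expose page churn
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        start = time.time()
        with conn:                                 # single transaction for the load
            conn.executemany("INSERT INTO t VALUES (?, ?)",
                             ((k, "x" * 50) for k in keys))
        conn.close()
        return time.time() - start

    n = 500_000
    keys = list(range(n))
    print("in order:", load(keys, "ordered.db"))
    random.shuffle(keys)
    print("shuffled:", load(keys, "shuffled.db"))

On a fresh run the shuffled load is typically several times slower,
which is Scott's point about touching (nearly) every page restated in
B-tree terms.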