On Fri, Jun 18, 2010 at 07:01:25PM -0700, Scott Hess scratched on the wall:
> On Fri, Jun 18, 2010 at 5:24 PM, Eric Smith <eas....@gmail.com> wrote:
> > Jay A.  Kreibich wrote:
> >> Yes.  Hence the "and this is the important part" comment.  Most of
> >> the time when people are building billion-row files, they're building
> >> a new DB by importing a static source of data.  If things go wrong,
> >> you just throw out the database and try again.
> >
> > That's kinda like doing it all in a single big transaction, which I
> > wanted to avoid.  :)
> >
> > But my take-away from this conversation is generally "you're SOL":
> > we are backed by a tree, so we have to update existing nodes to point
> > to the new ones, so we have to touch existing pages on INSERT.  Is that
> > about right?
> 
> The problem is that if your incoming data is uniformly unsorted
> (essentially random), then for each transaction you'll find yourself
> updating approximately every page of the database.

  In a linear format, half.  You only move every element if you insert
  at the beginning (e.g. a perfect reverse sort); a random insert lands,
  on average, in the middle and only moves the elements after it.  The
  number of elements also grows as you add them, so when inserting N
  elements, each insert moves, on average, about 25% of N elements.
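
  If you want to convince yourself of that figure, it's easy to count
  the shifts directly.  Something along these lines (untested sketch;
  N and the seed are arbitrary) should print roughly 0.25 for random
  input:

    import bisect, random

    N = 10000
    random.seed(0)

    data = []
    shifts = 0
    for _ in range(N):
        v = random.random()
        pos = bisect.bisect_left(data, v)   # insertion point in sorted list
        shifts += len(data) - pos           # elements that must slide over
        data.insert(pos, v)

    # average elements moved per insert, as a fraction of N
    print(shifts / N / N)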

  But databases don't work that way.  The B-Trees are specifically
  designed to address the needs of a paged database, and require minimal
  updates to keep the tree balanced.
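
  The number of pages a single B-Tree insert touches grows with the
  depth of the tree, not with the row count.  A back-of-the-envelope
  sketch (the fanout of 100 keys per interior page is just an assumed
  number; the real value depends on page and key size):

    import math

    rows = 1000000000    # a billion-row table
    fanout = 100         # assumed keys per interior page

    # depth of the tree, i.e. roughly how many pages one insert touches
    print(math.ceil(math.log(rows) / math.log(fanout)))    # => 5

  So even at a billion rows, an insert only has to walk and possibly
  update a handful of pages, plus the occasional split.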

  In this case, the original primary key was being inserted in-order,
  and the ROWID values, being auto-generated, are clearly in-order.  It
  was already the best case.
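
  If you want to see what out-of-order keys would have cost, a rough
  benchmark sketch like this (untested; the table name, row count, and
  in-memory database are arbitrary choices, and an on-disk file makes
  the gap larger) compares in-order against shuffled key inserts:

    import random, sqlite3, time

    def load(keys):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        start = time.time()
        with db:    # single transaction for the whole batch
            db.executemany("INSERT INTO t VALUES (?, 'x')",
                           ((k,) for k in keys))
        return time.time() - start

    n = 200000
    ordered = list(range(n))
    shuffled = list(ordered)
    random.shuffle(shuffled)

    print("in-order:", load(ordered))
    print("shuffled:", load(shuffled))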

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson