On Fri, Jun 18, 2010 at 08:24:47PM -0400, Eric Smith scratched on the wall:
> Jay A.  Kreibich wrote: 
> 
> > Yes.  Hence the "and this is the important part" comment.  Most of 
> > the time when people are building billion-row files, they're building 
> > a new DB by importing a static source of data.  If things go wrong, 
> > you just throw out the database and try again.  
> 
> That's kinda like doing it all in a single big transaction, which I 
> wanted to avoid.  :) 

  Sorta, kinda, only you don't have the option of rolling it back.

> But my take-away from this conversation is generally "you're SOL": 
> we are backed by a tree, so we have to update existing nodes to point 
> to the new ones, so we have to touch existing pages on INSERT.  Is that 
> about right?  

  If the goal is to eliminate the journal file, then yes.
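  For reference, the journal can be turned off entirely with a pragma.
  A minimal sketch in Python's sqlite3 module (with the obvious caveat:
  with no journal there is no rollback, and an interrupted import can
  leave the file corrupt, so this only makes sense when you can rebuild
  from the source data):

```python
import sqlite3

# Sketch: disabling the rollback journal for a throwaway bulk import.
# journal_mode=OFF means no crash recovery and no ROLLBACK.
conn = sqlite3.connect(":memory:")
mode = conn.execute("PRAGMA journal_mode = OFF").fetchone()[0]
print(mode)  # -> off
conn.close()
```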

> It's not in any way a result of my schema? 

  It is directly dependent on the schema.  I don't remember seeing a
  schema posted, other than you stating you didn't have any user indexes.

> My primary key is a pair of integers A,B.

  Then you have an automatic unique index across these two columns.
  The additional index will cause more page churn.
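  To illustrate (hypothetical table and column names): on an ordinary
  rowid table, a composite PRIMARY KEY is enforced by an automatic
  unique index, stored as a second BTree, which you can see in
  sqlite_master:

```python
import sqlite3

# Sketch: a two-column primary key on a rowid table creates an
# automatic index ("sqlite_autoindex_...") behind the scenes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
print(rows)  # -> [('sqlite_autoindex_t_1',)]
conn.close()
```

  (A single-column INTEGER PRIMARY KEY is different: it aliases the
  rowid and needs no extra index.  A two-column key doesn't get that
  shortcut.)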

> The first column in this particular use case is in the 
> range A = [0, 2million) and the second is in the range B = [0, infinity).
> We
> 
> insert records A=0->2million, B=0, then
> insert records A=0->2million, B=1,
> 
> etc.
> 
> Could this have an impact on how many pages need to be touched on
> INSERT?

  Yes, but my guess is that this is the ideal order.

> > It would also help to bump the cache up...  
> 
> That works great until the db size blows through the total RAM on the 
> system, at which point we're of course disk-bound again.

  Yes, but that's not the point.  The goal with making the cache large
  is that you can keep the majority of the BTree nodes in memory.  This
  makes balancing the tree much faster.  You're still going to be
  disk-bound, but the larger cache should help lower the total number
  of I/O operations.  
  
  If the trees don't quite fit into the cache things get *really* slow.
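  A sketch of bumping the cache before the bulk load (the 2 GB figure
  is just an example; size it to your machine):

```python
import sqlite3

# Sketch: enlarging the page cache for a bulk load.  A negative
# cache_size is interpreted as a budget in KiB, so -2000000 asks
# for roughly 2 GB of cache.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA cache_size = -2000000")
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -> -2000000
conn.close()
```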

>  At the moment I'm only inserting about 4k rows/second. :/

  Try getting rid of the PK definition and see how much that buys you.
  It might be worth it, or it might not.
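  A quick way to measure that (table and column names are made up, and
  N is tiny here; scale it up for a real test):

```python
import sqlite3
import time

# Sketch: load the same rows into a table with and without the
# composite PK and compare wall time.  Hypothetical schema.
N = 100_000
rows = [(a, 0) for a in range(N)]

def load(schema):
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    t0 = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - t0
    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return count, elapsed

n_pk, t_pk = load("CREATE TABLE t (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
n_plain, t_plain = load("CREATE TABLE t (a INTEGER, b INTEGER)")
print(n_pk, n_plain, t_pk, t_plain)
```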

    -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users