On Fri, Jun 18, 2010 at 03:37:19PM -0400, Eric Smith scratched on the wall:
> I have no user-defined indices in my db, and want to do a largish number 
> of inserts (a few billion).  I COMMIT every 10 million INSERTs or so -- 
> so if my app dies (or I want to kill it) I don't have to start over.  
> 
> Row sizes are small, a couple hundred bytes across 15ish columns.  
> The primary key on the table is a pair of integers.
> 
> After a few BEGIN/INSERT/COMMIT cycles the journal file grows 
> mid-transaction to a pretty big size, e.g.  around 1Gb against a 14Gb db 
> file, meaning (right?) that sqlite wrote to ~1Gb of the existing 
> pages during that round of INSERTs.  

  I'd guess this is B-tree re-balancing.  It happens in both indexes
  and the table's own B-tree.  It will be worse if the table has an
  INTEGER PRIMARY KEY whose values you're providing yourself and
  aren't pre-sorted.

> This means the time spent doing a batch of INSERTs goes up as the number 
> of existing rows, which is a big frowny-face.* 

  Perhaps, but that's not unexpected.

> I'd really love to avoid writing a big journal file.  And I'd love to 
> avoid doing a billion-row insert in one transaction.

  So turn journaling off.  If you're building the database from an
  external source and (this is the important part) can re-build it
  if something goes wrong, just turn off journaling and syncing for
  the duration of the data import.  It also helps to bump the page
  cache up... if you're on a nice desktop with a few gigs of RAM,
  bump it 10x to 100x.  The PRAGMAs are journal_mode, synchronous,
  and cache_size.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users