On Fri, Jun 18, 2010 at 03:37:19PM -0400, Eric Smith scratched on the wall:

> I have no user-defined indices in my db, and want to do a largish number
> of inserts (a few billion).  I COMMIT every 10 million INSERTs or so --
> so if my app dies (or I want to kill it) I don't have to start over.
>
> Row sizes are small, a couple hundred bytes across 15ish columns.
> The primary key on the table is a pair of integers.
>
> After a few BEGIN/INSERT/COMMIT cycles the journal file grows
> mid-transaction to a pretty big size, e.g. around 1Gb against a 14Gb db
> file, meaning (right?) that sqlite wrote to ~1Gb of the existing
> pages during that round of INSERTs.
I'd guess this is B-Tree re-balancing.  It happens with both indexes
and the tables themselves.  It will be worse if the table has an
INTEGER PRIMARY KEY whose values you're providing and the rows aren't
inserted in sorted key order.

> This means the time spent doing a batch of INSERTs goes up as the number
> of existing rows, which is a big frowny-face.*

Perhaps, but that's not unexpected.

> I'd really love to avoid writing a big journal file.  And I'd love to
> avoid doing a billion-row insert in one transaction.

So turn journaling off.  If you're building a database from an
external source and (this is the important part) can re-build the
database if something goes wrong, just turn off journaling and
syncing for the duration of the data import.

It would also help to bump the cache up... if you're on a nice
desktop with a few gigs of RAM, bump it up 10x to 100x.

There are PRAGMAs to do all this.
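A minimal sketch of what I mean (untested; cache_size is a page count,
so with the default 1 KiB page size 200000 pages is roughly 200 MB --
scale it to your own page size and RAM):

  PRAGMA journal_mode = OFF;    -- no rollback journal at all
  PRAGMA synchronous = OFF;     -- don't wait on fsync() after writes
  PRAGMA cache_size = 200000;   -- in pages; the default is 2000 on most builds

  BEGIN;
    -- ... one batch of INSERTs ...
  COMMIT;

  -- once the import is done, put the safety back:
  PRAGMA journal_mode = DELETE;
  PRAGMA synchronous = FULL;

With the journal and syncing off, a crash mid-import can leave the
database corrupt, which is why being able to re-build from the source
data is the important part.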
   -j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable."                        -- Angela Johnson
