I’m importing a large data set with a lot of rows — an entire Wikipedia dump, 
about 60GB, one article per row — into a brand new SQLite database in WAL mode. 
What’s the fastest way to import it?
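For concreteness, the importer opens the database roughly like this (a 
simplified Python sketch rather than the actual code; the file name and 
schema are placeholders):

    import sqlite3

    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are opened and committed explicitly in the import loop.
    db = sqlite3.connect("wikipedia.db", isolation_level=None)
    db.execute("PRAGMA journal_mode=WAL")   # brand new database, set to WAL
    db.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, body TEXT)")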

I started with one big transaction, but noticed that (of course) the WAL file 
was growing rapidly while the main database file stayed tiny. I figured this 
would become inefficient, so I stopped the run and adjusted my code to commit 
and re-open a transaction every 10,000 rows.
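Continuing the sketch above, the batched loop looks something like this 
(iter_articles() is just a stand-in for the real dump parser, one article 
per row):

    BATCH = 10_000

    db.execute("BEGIN")
    for i, (title, body) in enumerate(iter_articles(), start=1):
        db.execute("INSERT INTO articles (title, body) VALUES (?, ?)",
                   (title, body))
        if i % BATCH == 0:
            db.execute("COMMIT")    # flush this batch to the WAL
            db.execute("BEGIN")     # immediately open the next transaction
    db.execute("COMMIT")            # commit the final partial batch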

With that, the import started quickly, but as time went on the commits were 
taking longer and longer, so the process was spending most of its time 
committing. (I wasn’t able to finish the job, as it ran into an unrelated fatal 
error in my code about ⅔ of the way through.)

Would it have been faster to use a single transaction? Even if the commit at 
the end were epic-length, it wouldn't be rewriting the b-tree nodes over and 
over again. If so, would periodic WAL checkpoints during the load help?
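By "periodic WAL checkpoints" I mean something along these lines, issued 
every so often during the load (whether TRUNCATE is the right mode, or 
whether checkpointing mid-import is useful at all, is part of the question):

    # Copy committed WAL frames back into the main database file and
    # truncate the WAL (PASSIVE or FULL might be more appropriate).
    db.execute("PRAGMA wal_checkpoint(TRUNCATE)")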

—Jens