I am creating an SQLite database via Python, and trying to understand
some performance issues.

This application does 3.8M inserts.   Most inserts are to a main
database that ends up being around 293MB.  Around 740K of the inserts
with larger data records are to 25 related databases of around 600MB
each.  The main database and 1 subdatabase are active together, then
the subdatabase is closed when it gets to around 600MB and a new one
is opened.  This is an archiving application.

As a performance baseline, I changed the application to just write out
plain sequential text files, and it takes around 62 minutes to
complete.  This includes all the Python overhead and the raw hard
drive overhead to write out the same amount of data in a "best case"
scenario:

  Time: 3754.16 seconds
  Files: 708120 Bytes: 31565490710

  real    62m34.335s
  user    53m55.492s
  sys     2m58.305s

If I use SQLite and commit every 5 seconds, I get this performance:

  Time: 11383.95 seconds
  Files: 708120 Bytes: 31565490710

  real    189m45.061s
  user    55m58.638s
  sys     4m46.528s

If I commit every 30 seconds, I get this:

  Time: 13021.34 seconds
  Files: 708120 Bytes: 31565490710

  real    217m2.078s
  user    56m9.647s
  sys     4m59.850s

I believe fsync/fdatasync are significant performance issues, so I
thought that it should improve performance if I start a transaction,
do ALL of the inserts, then commit.  Since I'm starting with an empty
database, the journal should stay mostly empty, the database will be
built without any syncs, and when I commit, the small journal will be
deleted.  I also added pragma synchronous=off.  (All of these test are
run with pragma lockmode exclusive.)  But when I tried this, the
performance was slowest of all:

  Time: 15356.42 seconds
  Files: 708120 Bytes: 31565490710

  real    255m57.523s
  user    55m51.215s
  sys     4m22.173s

Can anyone shed light on why building a database inside a single
transaction would be slower than periodically commiting?

Thanks,
Jim
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to