Hi all,
I've got a reasonably large dataset to load via sql insert statements
(3.3M inserts spread out over 4 or 5 tables)
into an empty  database. The Db is created using sqlalchemy. I then
drop all indexes using sql statements.
The inserts are performed by calling sqlite3 my.db < my_sql_text_file-xx-of-20
for each of the 20 text files (with each file containing ~120K statements).
In each file I surround every 500 statements with
BEGIN;
COMMIT;
so as to batch my statements.
Even with dropped indexes and the BEGIN COMMIT blocking, things start
fast but get progressively sllooooowww:

I0321 16:23:55.884863  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00000-of-00020.
I0321 16:24:01.305967  3924 gordon_generate_database.py:123] Ran in
5.42035007477 sec
I0321 16:24:01.306075  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00001-of-00020.
I0321 16:24:09.609338  3924 gordon_generate_database.py:123] Ran in
8.3025021553 sec
I0321 16:24:09.609455  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00002-of-00020.
I0321 16:24:43.088789  3924 gordon_generate_database.py:123] Ran in
33.480257988 sec
I0321 16:24:43.088905  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00003-of-00020.
I0321 16:25:20.215434  3924 gordon_generate_database.py:123] Ran in
37.126128912 sec
I0321 16:25:20.215564  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00004-of-00020.
I0321 16:30:39.857538  3924 gordon_generate_database.py:123] Ran in
319.643631935 sec
I0321 16:30:39.857649  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00005-of-00020.
I0321 16:38:17.367182  3924 gordon_generate_database.py:123] Ran in
457.509384871 sec
I0321 16:38:17.367294  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00006-of-00020.
I0321 16:48:04.492243  3924 gordon_generate_database.py:123] Ran in
587.122864962 sec
I0321 16:48:04.492315  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00007-of-00020.
I0321 16:59:39.222325  3924 gordon_generate_database.py:123] Ran in
694.731889009 sec
I0321 16:59:39.222442  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00008-of-00020.
I0321 17:12:18.610039  3924 gordon_generate_database.py:123] Ran in
759.385544062 sec
I0321 17:12:18.610147  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00009-of-00020.
I0321 17:26:25.676740  3924 gordon_generate_database.py:123] Ran in
847.068361998 sec
I0321 17:26:25.676851  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00010-of-00020.
I0321 17:40:35.345407  3924 gordon_generate_database.py:123] Ran in
849.668410063
I0321 17:40:35.345524  3924 gordon_generate_database.py:120] Running
sqlite3 /tmp/test.db <
/tmp/gordon_generate_databaseu0g1_c/gordon_sql-00011-of-00020.

I believe I could do this faster with .csv but I'd like to avoid the
extra code to go from SQL to CSV.

Any hints?

Doug Eck
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to