Hi everyone -

I've been using SQLite through Python (3.7) for a scientific project. The
data comes out to 10 billion rows of an 8-byte signed integer (~200-300 GB
pre-index). Insertion takes ~6 hours, and indexing takes another 8 hours on
top of that; the index build also seems to slow down as it progresses. Does
this sound reasonable for this amount of data? I'm running this on an SSD to
help the I/O as much as I can.

While I've been getting a lot of mileage out of this, I was wondering if
you had any tips on getting it to run faster. I've tried various PRAGMA
settings to speed up the insert, but I'm not sure whether there's anything
else I can do to appreciably speed up either step.
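
For concreteness, here's a rough sketch of what the load side looks like.
The table, column, and file names are placeholders (not my real schema), and
the PRAGMA values are just what I've been experimenting with:

    import sqlite3

    # Placeholder schema: table "t", single integer column "v", file "data.db".
    conn = sqlite3.connect("data.db", isolation_level=None)  # manage transactions manually
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = -2000000")  # ~2 GB page cache (negative value = KiB)
    conn.execute("PRAGMA temp_store = MEMORY")
    conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")

    def bulk_insert(rows, chunk=1_000_000):
        """Insert an iterable of (value,) tuples in large explicit transactions."""
        cur = conn.cursor()
        cur.execute("BEGIN")
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= chunk:
                cur.executemany("INSERT INTO t VALUES (?)", batch)
                cur.execute("COMMIT")
                cur.execute("BEGIN")
                batch.clear()
        if batch:
            cur.executemany("INSERT INTO t VALUES (?)", batch)
        cur.execute("COMMIT")

    # The index is only created after the whole load finishes:
    # conn.execute("CREATE INDEX idx_v ON t (v)")

The index gets created once, after the load completes, and that CREATE INDEX
step is where the 8 hours go.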

For my purposes, I don't need any safeguards against power loss etc. - I've
already set journal_mode and synchronous to OFF. This is a database that
will be built once and accessed on occasion, and query speed is fine as-is.
The only thing I can think of is partitioning the table and building the
indexes in parallel on the partitions, but this seems clunky, especially
given SQLite's default 10-database ATTACH limit; how to raise that limit
from Python's sqlite3 module is non-obvious, and since I haven't done the
experiment, I don't know to what extent it would help.
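
If it helps make the idea concrete, this is roughly what I had in mind for
that experiment. The partition file names and the pool size are hypothetical,
and the data would have to be split across the partition files during the
initial load:

    import multiprocessing as mp
    import sqlite3

    PARTS = ["part_%d.db" % i for i in range(8)]  # hypothetical partition files

    def build_index(path):
        # Each partition file holds the same single-column table "t";
        # each index build runs in its own process against its own file.
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA journal_mode = OFF")
        conn.execute("PRAGMA synchronous = OFF")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_v ON t (v)")
        conn.close()

    if __name__ == "__main__":
        with mp.Pool(len(PARTS)) as pool:
            pool.map(build_index, PARTS)

        # For querying, the partitions would be ATTACHed to one connection
        # (staying under the ATTACH limit) and combined with UNION ALL.
        main = sqlite3.connect("part_0.db")
        for i, path in enumerate(PARTS[1:], start=1):
            main.execute("ATTACH DATABASE ? AS p%d" % i, (path,))

I haven't run this, so I don't know whether the parallel builds would just
end up fighting over the same SSD anyway.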

Thank you for any insight into this. The database is working fine as-is,
but for the sake of convenience and education I'm trying to see whether I
can get it to insert and/or index faster.

Cheers,

-AJ