Hi everyone - I've been using SQLite through Python (3.7) for a scientific project. The data comes out to 10 billion rows of an 8-byte signed integer (~200-300 GB pre-index). Insertion takes ~6 hours, and indexing takes 8 hours by itself; the indexing also seems to slow down as the index is built. Does this sound reasonable for this amount of data? I'm running this on an SSD to try to help the I/O as best I can.
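For concreteness, the load looks roughly like this (a minimal sketch - the file, table, and column names are placeholders and the batching is simplified; the real table is just the one integer column):

    import sqlite3

    # Placeholder file/table/column names; the real schema is a single
    # 8-byte signed integer column.
    conn = sqlite3.connect("values.db")

    # Durability is not needed - the database can be rebuilt from the
    # source data if a load is interrupted.
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = -2000000")  # negative = KiB, so ~2 GB
    conn.execute("PRAGMA temp_store = MEMORY")

    conn.execute("CREATE TABLE IF NOT EXISTS readings (value INTEGER)")

    def load(batches):
        # batches yields iterables of (value,) tuples; one transaction per
        # batch keeps per-row commit overhead down.
        for batch in batches:
            with conn:  # implicit BEGIN ... COMMIT
                conn.executemany("INSERT INTO readings (value) VALUES (?)", batch)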
While I've been getting a lot of mileage out of this, I was wondering if you had any tips on getting it to run faster. I've tried various PRAGMA modifications (along the lines of the sketch above) to help the insert, but I'm wondering if there's anything I can do to appreciably speed any of this up. For my purposes I don't need any safeguards against power loss etc. - I've already turned the journal and synchronous off. This is a database that will be built once and accessed only occasionally, and query speed is fine as-is.

The only other thing I can think of is partitioning the table and running the indexing in parallel on the partitions (a rough sketch of what I mean is in the postscript), but this seems clunky, especially given SQLite's default 10-database ATTACH limit. That limit is a compile-time setting, and how to change it is non-obvious from the Python package; since I haven't done the experiment, I don't know to what extent it would help.

Thank you for any insight into this. The database is working fine as-is, but I am trying to see, for the sake of convenience and education, whether I can get it to insert and/or index faster.

Cheers,
-AJ
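P.S. In case it makes the partitioning idea clearer, here is a rough, unbenchmarked sketch of what I had in mind - the partition file names, partition count, and index name are made up, and each partition is assumed to hold a slice of the table above:

    import sqlite3
    from concurrent.futures import ProcessPoolExecutor

    # Hypothetical partition files, each holding a slice of the data in a
    # table with the same schema as above.
    PARTS = [f"values_part{i}.db" for i in range(8)]

    def build_index(path):
        # Index one partition; each call runs in its own process, so the
        # partitions are indexed in parallel, one core per file.
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA journal_mode = OFF")
        conn.execute("PRAGMA synchronous = OFF")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_value ON readings (value)")
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            list(pool.map(build_index, PARTS))

        # Query side: ATTACH the remaining partitions to the first one (the
        # default limit is 10 attached databases, a SQLite compile-time
        # setting, SQLITE_MAX_ATTACHED) and UNION them for reads.
        conn = sqlite3.connect(PARTS[0])
        for i, path in enumerate(PARTS[1:], start=1):
            conn.execute(f"ATTACH DATABASE '{path}' AS p{i}")
        union = " UNION ALL ".join(
            ["SELECT value FROM main.readings"]
            + [f"SELECT value FROM p{i}.readings" for i in range(1, len(PARTS))]
        )
        count = conn.execute(
            f"SELECT COUNT(*) FROM ({union}) WHERE value = ?", (42,)
        ).fetchone()[0]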