Have you increased the paltry default cache size (PRAGMA cache_size)?  The 
bigger the better, especially since you are sorting and balancing large 
B-Trees.  The more of that work that can be done in memory, without spilling 
to slow disk (or the disk cache), the faster it will go ... (the best way to 
optimize I/O is not to do it)
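
From Python that is one line; a rough sketch (the filename and the 8 GB 
figure are placeholders; size the cache to the RAM you actually have free):

    import sqlite3

    conn = sqlite3.connect("bigdata.db")  # placeholder filename
    # A negative value is the cache size in KiB; the stock default is
    # only about 2 MB.  This asks for roughly 8 GB of page cache.
    conn.execute("PRAGMA cache_size = -8000000")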

You can also enable multithreaded merge sorts.  In my experience, though, 
enabling them slows things down rather than speeding them up; YMMV.  You 
might just have a database at the scale the multithreaded sorter was 
intended to address.

https://sqlite.org/pragma.html#pragma_threads
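
Continuing from the connection above, it is worth an experiment either way 
(again just a sketch; the 4 is an arbitrary guess, and stock builds default 
to 0 worker threads):

    # Allow up to 4 auxiliary worker threads for sorting (capped by the
    # SQLITE_MAX_WORKER_THREADS value the library was compiled with).
    conn.execute("PRAGMA threads = 4")
    # Read it back to see what the build actually granted.
    print(conn.execute("PRAGMA threads").fetchone()[0])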


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:[email protected]] On Behalf Of AJ M
>Sent: Wednesday, 28 November, 2018 10:03
>To: [email protected]
>Subject: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)
>
>Hi everyone -
>
>I've been using SQLite through Python (3.7) for a scientific project.
>The data comes out to 10 billion rows of an 8-byte signed integer
>(~200-300 GB pre-index), and while insertion takes ~6 hours, indexing
>takes 8 hours by itself. Indexing also seems to slow as it is built.
>Does this sound reasonable for this amount of data? I'm running this
>on an SSD to try and help the I/O as best I can.
>
>While I've been getting a lot of mileage out of this, I was wondering
>if you had any tips on getting it to run faster. I've tried various
>PRAGMA modifications to try and help the insert, but I'm wondering if
>there's anything I can do to appreciably speed any of this up.
>
>For my purposes, I don't need any safeguards for power loss etc.;
>I've already turned the journal and synchronous off. This is a
>database that will be built one time and accessed on occasion, and
>query speed is fine as-is. The only thing I can think of is perhaps
>partitioning the table and running the indexing in parallel on the
>partitions, but this seems clunky, especially with Python's
>10-database ATTACH limit. The parameter for modifying this is
>non-obvious in the Python package, and since I haven't done the
>experiment, I don't know to what extent that would help.
>
>Thank you for any insight into this. The database is working fine
>as-is, but I am trying to see, for the sake of convenience and
>education, if I can get it to insert and/or index faster.
>
>Cheers,
>
>-AJ


