James, I didn?t say it explicitly but we did understand where the speed up came from. As you say, changing the collation was the key element in this. This speedup also meant that the a dirty, nasty, vile hack (and thats the nicest thing we can say about it and its our code) so-called optimisation we put in the Perl script to try and fix things stopped being effective.
The way the optimisation worked was to simply remove the number of SELECT calls made. This optimisation was done before we updated the COLLATION on the index. After the COLLATION was added, our optimisation no longer had the same effect. The reason was that the number of searches of the binary tree had been dramatically reduced, this mean our ?optimisation' was no longer worth it as the ?optimisation' now only saved a few searches as opposed to 40K of them. The code worked but we were now only saying 13 or so searches as opposed to 40,000. We?ve pulled the ?optimisation? out as it made the code untidy and it was a dirty hack. Thanks for helping, Rob > On 5 Jul 2015, at 21:36, James K. Lowden <jklowden at schemamania.org> wrote: > > On Fri, 3 Jul 2015 11:35:21 +0100 > Rob Willett <rob.sqlite at robertwillett.com> wrote: > >> It basically has taken no time to run 10,551 selects. This implies >> that we had issues on the database before with either fragmentation >> or indexes being poorly setup. > > Perhaps you said so and I missed it, but to me it's clear that almost > all the improvement comes from using RAG_Idx1. Because it indexes the > Text column, you're now doing a binary search instead of a linear > scan. > > By changing the collation, you went from scanning every row to > searching a binary tree. That took you from O(n) to O(log2 n). If > each row in your 16 GB database is 100 bytes, allowing for overhead you > might have 80 million rows? To satisfy your query, on average that > would be an improvement from 40,000,000 I/O operations to 13. > > When something that used to take 122 seconds starts going a million > times faster, it begins to look pretty much instantaneous. :-) > > --jkl > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users