On 4 Aug 2011, at 3:07am, Seth Price wrote:

> The full DB has around 17 million rows in it, and for each row I am trying to 
> count all rows with similar characteristics, divided into different 
> classifications. I was already able to improve speed 10x by using the R*tree 
> extension to narrow my search. My queries look like this:
> 
> SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum)
> WHERE 57 < col0min AND col0max < 61
>   AND 52 < col1min AND col1max < 56
>   AND 66 < col2min AND col2max < 70
>   AND 88 < col3min AND col3max < 92
>   AND 133 < col4min AND col4max < 137
>   AND 57 < col0 AND col0 < 61
>   AND 52 < col1 AND col1 < 56
>   AND 66 < col2 AND col2 < 70
>   AND 88 < col3 AND col3 < 92
>   AND 133 < col4 AND col4 < 137
>   AND 81 < col5 AND col5 < 85
> GROUP BY class;

Forget multithreading.  Your bottleneck is access to the database file, and the 
threads will just block each other.

There are two ways to make that SELECT run quickly.  One is to create a single 
index covering all those columns.
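
For example, something like this (the index name is just an illustration, and I 
haven't checked it against your schema):

CREATE INDEX idx_data_cols ON data (col0, col1, col2, col3, col4, col5);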

The other is to do the SELECT in parts, by disqualifying records you don't 
want.  Create another column called, perhaps, notThisOne.  Then do a succession 
of disqualifying checks:

UPDATE data SET notThisOne=1 WHERE col0<=57 OR col0>=61;

etc.  Then all the rows you didn't disqualify are the ones you want.  
Obviously, a separate index on each of those columns will speed up the 
disqualifying checks.
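
A rough sketch of the whole sequence might look like this (table and column 
names are taken from your query, but the bounds and the final SELECT are only 
illustrative, and I've left out the R*tree part):

ALTER TABLE data ADD COLUMN notThisOne INTEGER DEFAULT 0;

CREATE INDEX idx_data_col0 ON data (col0);
-- ...and one index per column you test...

UPDATE data SET notThisOne=1 WHERE col0<=57 OR col0>=61;
UPDATE data SET notThisOne=1 WHERE notThisOne=0 AND (col1<=52 OR col1>=56);
-- ...repeat for col2 through col5...

SELECT class, COUNT(*) AS count FROM data WHERE notThisOne=0 GROUP BY class;

Remember to reset the flag (UPDATE data SET notThisOne=0;) before the next query.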

Simon.