Re: [sqlite] trying to optimize left outer join for large data set (multiple indices needed?)

Simon Slavin Tue, 09 Jun 2009 00:21:21 -0700

On 9 Jun 2009, at 7:12am, Elizabeth Purdom wrote:

> SELECT regions.id, TOTAL(reads.data) FROM  regions LEFT OUTER JOIN  
> reads ON
> ( regions.chr = reads.chr AND regions.strand=reads.strand
>     AND regions.start<=reads.start AND regions.stop>=reads.stop
> )
> GROUP BY regions.id ORDER BY regions.id
> ;
>
>  [snip]
>
> In reading about optimizing with indexing, then it seems that I would
> want an index of the relevant columns in 'reads' (chr, strand, start,
> stop); however, since I have inequality matches for both start and  
> stop,
> then a joint index like this will not help
> (http://www.sqlite.org/optoverview.html). I can make two indexes (chr,
> strand, start) and (chr,strand,stop), but I don't know how to get an
> outer join to use them both.



Although you can force SQLite to use a specific index when doing some  
things, it's not really your job. SQLite has strategies that allow it  
to look at your SELECT and pick the indices that will let it do the  
job fastest.  It's SQLite's job to figure out how to do this given  
whatever indices you provided, or none at all.

Write some code to execute a SELECT like the above and time how long  
it takes to do the SELECT and iterate through the result.  Then  
experiment with a few different indices.  You could start with two  
separate indices on reads.chr and reads.strand, but you might then try  
one index which combines them both, or even

CREATE INDEX fred ON TABLE reads (chr, strand, start)

Try a few combinations and see which one leads to fastest retrieval.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] trying to optimize left outer join for large data set (multiple indices needed?)

Reply via email to