Hello, I'm basically a newbie, but have been plunged into not just having correct code but having to optimize a particular query. The following query works but is very slow on my database with millions of entries in one of the tables.
SELECT regions.id, TOTAL(reads.data) FROM regions LEFT OUTER JOIN reads ON ( regions.chr = reads.chr AND regions.strand=reads.strand AND regions.start<=reads.start AND regions.stop>=reads.stop ) GROUP BY regions.id ORDER BY regions.id ; [I am writing with simplified names, so if there are any typos, its probably that, because the code does run] 'regions' is a table of somewhere between 20K to 300K rows and 'reads' is the larger one with millions of rows. I'm using sqlite 3.3.3. I currently have a single index of (chr,strand,start,stop), though I don't think this is helping (see below) I would be glad as to any advice as to the most efficient way to do this (and any idea of whether it will really speed anything up, or if I'm just stuck with the speed I see). Just to say what I've been thinking, though I don't know if its useful: In reading about optimizing with indexing, then it seems that I would want an index of the relevant columns in 'reads' (chr, strand, start, stop); however, since I have inequality matches for both start and stop, then a joint index like this will not help (http://www.sqlite.org/optoverview.html). I can make two indexes (chr, strand, start) and (chr,strand,stop), but I don't know how to get an outer join to use them both. There was then an example about breaking up the where clause into two parts 1=SELECT rowid from tablename WHERE (regions.chr = reads.chr AND regions.strand=reads.strand AND regions.start<=reads.start) 2= SELECT rowid from tablename WHERE (regions.chr = reads.chr AND regions.strand=reads.strand AND regions.stop>=reads.stop) and then intersecting the row ids, but I can't figure out to extend this to outer join (I'm not clear what are the correct replacement of rowid). I think I could do SELECT * from regions LEFT OUTER JOIN reads ON [1st] INTERSECT SELECT * from regions LEFT OUTER JOIN reads ON [2nd] but then that seems unlikely to be efficient, since you have to intersect all of the column information. And I also don't know how to do the syntax for summary that should surround this table. Thanks for any assistance, Elizabeth Purdom _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users