Igor Tandetnik wrote: > Jeremy Zeiber wrote: > >> This query runs in ~ 17 ms: >> SELECT COUNT(data) FROM detail AS outerdetail WHERE headerid=4 AND data >> NOT IN (SELECT DISTINCT data FROM detail WHERE headerid<4) >> > > Here the subquery is not coordinated. It is run once, the results are stored > in an ephemeral table, then the check in the main query is performed against > that table. > > >> This query runs in ~ 15s: >> SELECT COUNT(data) FROM detail AS outerdetail WHERE headerid=4 AND data >> NOT IN (SELECT DISTINCT data FROM detail WHERE >> headerid<outerdetail.headerid) >> > > Here the subquery is coordinated: since it depends on values in the current > row of the main query, it has to be rerun for each row. So you end up with > quadratic complexity. > > >> Why is there such a huge difference in the query time for two very >> similar queries? Is there anything I can do with the second query to >> make it perform more like the first? >> > > The second query is equivalent to the first, it's just written in the form > that makes it difficult for SQLite to optimize. Why don't you just use the > first query? > > >> This query is part of a larger >> query that aggregates detail data such as the following, so I can't hard >> code the headerid in the query. >> >> SELECT headerid, >> (SELECT COUNT(data) FROM detail AS outerdetail WHERE >> headerid=header.headerid AND data NOT IN (SELECT DISTINCT data FROM >> detail WHERE headerid<outerdetail.headerid)) AS newcount >> FROM header GROUP BY headerid; >> > > Try replacing outerdetail.headerid with header.headerid. > That particular query runs in ~ 30 seconds with outerdetail.header or header.headerid. I do have another query which gives the same result that doesn't quite run as fast as the first, but it is certainly faster than the second:
SELECT headerid, (SELECT (SELECT COUNT(DISTINCT data) FROM detail WHERE headerid=header.headerid)-COUNT(DISTINCT one.data) FROM detail AS one INNER JOIN detail AS two ON one.data=two.data WHERE two.headerid<one.headerid AND one.headerid=header.headerid) AS newcount FROM header GROUP BY headerid; That runs in about 3 seconds. Still, there are only a few thousand rows in the test database, and the real data is going to have hundreds of thousands of rows, and this is just a small portion of the query. Is there any way to rewrite the query to a better performing one? > Igor Tandetnik > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users