Igor Tandetnik wrote:
> Jeremy Zeiber wrote:
>   
>> This query runs in ~ 17 ms:
>> SELECT COUNT(data) FROM detail AS outerdetail WHERE headerid=4 AND data
>> NOT IN (SELECT DISTINCT data FROM detail WHERE headerid<4)
>>     
>
> Here the subquery is not coordinated. It is run once, the results are stored 
> in an ephemeral table, then the check in the main query is performed against 
> that table.
>
>   
>> This query runs in ~ 15s:
>> SELECT COUNT(data) FROM detail AS outerdetail WHERE headerid=4 AND data
>> NOT IN (SELECT DISTINCT data FROM detail WHERE
>> headerid<outerdetail.headerid)
>>     
>
> Here the subquery is coordinated: since it depends on values in the current 
> row of the main query, it has to be rerun for each row. So you end up with 
> quadratic complexity.
>
>   
>> Why is there such a huge difference in the query time for two very
>> similar queries?  Is there anything I can do with the second query to
>> make it perform more like the first?
>>     
>
> The second query is equivalent to the first, it's just written in the form 
> that makes it difficult for SQLite to optimize. Why don't you just use the 
> first query?
>
>   
>> This query is part of a larger
>> query that aggregates detail data such as the following, so I can't hard
>> code the headerid in the query.
>>
>> SELECT headerid,
>> (SELECT COUNT(data) FROM detail AS outerdetail WHERE
>> headerid=header.headerid AND data NOT IN (SELECT DISTINCT data FROM
>> detail WHERE headerid<outerdetail.headerid)) AS newcount
>> FROM header GROUP BY headerid;
>>     
>
> Try replacing outerdetail.headerid with header.headerid.
>   
That particular query runs in ~ 30 seconds with outerdetail.header or 
header.headerid.  I do have another query which gives the same result 
that doesn't quite run as fast as the first, but it is certainly faster 
than the second:

SELECT headerid,
(SELECT (SELECT COUNT(DISTINCT data) FROM detail WHERE 
headerid=header.headerid)-COUNT(DISTINCT one.data) FROM detail AS one 
INNER JOIN detail AS two ON one.data=two.data WHERE 
two.headerid<one.headerid AND one.headerid=header.headerid) AS newcount
FROM header GROUP BY headerid;

That runs in about 3 seconds.  Still, there are only a few thousand rows 
in the test database, and the real data is going to have hundreds of 
thousands of rows, and this is just a small portion of the query.  Is 
there any way to rewrite the query to a better performing one?

> Igor Tandetnik
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to