Re: [sqlite] Query planner bug on "distinct" clause

Dan Kennedy Mon, 08 Nov 2010 01:25:04 -0800

On Nov 8, 2010, at 5:12 AM, Alexey Pechnikov wrote:

> sqlite> .s object_record
> CREATE TABLE object_record
> (
>  record_id INTEGER PRIMARY KEY,
>  ts INTEGER NOT NULL DEFAULT (strftime('%s','now')),
>  object_id INTEGER NOT NULL
> );
> CREATE INDEX object_id_ts_idx on object_record(object_id,ts);
> CREATE INDEX object_ts_idx on object_record(ts);
>
> The planner bug with "distinct":
> sqlite> explain query plan select distinct object_id from (select
> ts,object_id from object_record where rowid>1581369-1) as x;
> 0|0|TABLE object_record WITH INDEX object_id_ts_idx ORDER BY
> CPU Time: user 0.000000 sys 0.000000
> sqlite> select distinct object_id from (select ts,object_id from
> object_record where rowid>1581369-1) as x;
> 31596
> CPU Time: user 0.224014 sys 0.000000
>
> The plan without "distinct" is correct:
> sqlite> explain query plan select object_id from (select  
> ts,object_id from
> object_record where rowid>1581369-1) as x;
> 0|0|TABLE object_record USING PRIMARY KEY
> CPU Time: user 0.000000 sys 0.000000
> sqlite> select object_id from (select ts,object_id from  
> object_record where
> rowid>1581369-1) as x;
> 31596
> CPU Time: user 0.000000 sys 0.000000


The concept of "correct" is tricky here. When planning a query,
SQLite normally assumes that the (rowid>?) clause identifies
1/3 of the rows in table object_record. Using this assumption,
it goes on to assume that scanning via the index (so that
duplicate object_id values can excluded without using any
additional data structure) is going to be faster than using the
rowid index and a temporary b-tree to filter out duplicate
entries.

If the (rowid>?) clause identifies a small number of rows,
then the last assumption will be incorrect - using the rowid
index would be much faster. But SQLite doesn't know this when
formulating a query plan.

Regards,
Dan.





> The correct plan may be:
> sqlite> drop  INDEX object_id_ts_idx;
> CPU Time: user 0.024001 sys 0.000000
> sqlite> explain query plan select distinct object_id from (select
> ts,object_id from object_record where rowid>1581369-1) as x;
> 0|0|TABLE object_record USING PRIMARY KEY
> CPU Time: user 0.000000 sys 0.000000
> sqlite> select distinct object_id from (select ts,object_id from
> object_record where rowid>1581369-1) as x;
> 31596
> CPU Time: user 0.000000 sys 0.000000
>
> And bug again:
> sqlite> CREATE INDEX object_id_idx on object_record(object_id);
> CPU Time: user 4.540283 sys 0.016001
> sqlite> explain query plan select distinct object_id from (select
> ts,object_id from object_record where rowid>1581369-1) as x;
> 0|0|TABLE object_record WITH INDEX object_id_idx ORDER BY
> CPU Time: user 0.000000 sys 0.000000
> sqlite> select distinct object_id from (select ts,object_id from
> object_record where rowid>1581369-1) as x;
> 31596
> CPU Time: user 0.236015 sys 0.000000
>
>
> -- 
> Best regards, Alexey Pechnikov.
> http://pechnikov.tel/
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Query planner bug on "distinct" clause

Reply via email to