Re: [sqlite] sqlite 3.13.0 does not use indexes as 3.8.11.1 does

Richard Hipp Tue, 26 Jul 2016 12:47:37 -0700

On 7/26/16, Alessandro Fardin <alessandro.far...@gavazziacbu.it> wrote:
>
> The Query planner with the same SELECT statement on same table with the
> same indexes does not use index at all, but parse the entire table.
> Of course this causes a dramatically slow down of the application.
>


When you say "of course this causes a slowdown" you imply that using
an index is always faster.  That is a reasonable heuristic, but it is
not always true.  It is sometimes faster to do a table scan rather
than use an index.

Consider this simple example:

    CREATE TABLE t1(a,b,c,d);
    CREATE INDEX t1a ON t1(a);
    SELECT * FROM t1 WHERE b=99 ORDER BY a LIMIT 1;

To use the index here means to scan the index from beginning to end.
Scanning the index rather than scanning the table causes the output to
appear in the correct order so that no sorting is required to satisfy
the ORDER BY.  For each row in the index, SQLite has to look up the
corresponding row in the table, extract the value of column b, check
to see if b=99, then output the row and exit if it does.  If most of
the t1.b values are not equal to 99, then we might have to do many
thousands of lookups into the table before arriving at an answer.
This is the approach taken by SQLite 3.11 and earlier.

The alternative approach is to simply scan the table looking for
entries where b=99 and remember the one entry that has the smallest
value for t1.a.  This alternative algorithm does a single linear scan
through the entire table, but avoids any random lookups, and so is
faster for the usual case when t1.b!=99.  This alternative algorithm
was unknown to older versions of SQLite.  It was introduced as a new
possibility in SQLite 3.12 and is used if the query planner thinks it
will be the faster.  The algorithm probably will be the faster for the
example schema above.

The difficulty is that the example above does not exactly match your
use-case. Your use-case is more like this:

    CREATE TABLE t1(a,b,c,d);
    CREATE INDEX t1ab ON t1(a, b);
    SELECT * FROM t1 WHERE b=99 ORDER BY a LIMIT 1;

The difference is that the index now contains the t1.b value.  Hence
when scanning the index, SQLite no longer needs to look up each
corresponding row in the original table in order to find t1.b in order
to do the b=99 test.  The t1.b is available in the index.  And because
the table lookups are not required, the older 3.11 algorithm will
likely be faster than the newer 3.12 algorithm.

The problem is that the query planner is currently not smart enough to
recognize that the table lookups are not required.  So the query
planner estimates the cost of the 3.11 algorithm as if each row
requires a table lookup.  Thus the cost estimate for the 3.11
algorithm is artificially high, and this causes SQLite to choose the
3.12 algorithm instead.

Two possible fixes are:

(1) Disable the new query algorithm introduced in 3.12.  This does not
fix the cost estimate for the 3.11 algorithm, but as there are no
other reasonable competing algorithm choices, the 3.11 algorithm will
still win.  The problem here is that there exist queries for which the
new 3.12 algorithm is desirable, and so disabling it will cause those
queries to run more slowly.

(2) Invest the time and code needed for the query planner to make a
better cost estimate for the 3.11 algorithm in cases where the WHERE
clause can be checked using terms taken from only the index.  This
will be a good deal of work, and will likely delay the release of
3.14.

Do not underestimate the difficulty of implementing fix (2).  It is
not a simple binary decision of "does the index contain all the terms
needed to evaluate the WHERE clause".  It might be that the index
contains some of the terms, but not others.  Consider:

    SELECT * FROM t1 WHERE b=99 AND c<>22 ORDER BY a LIMIT 1;

For this new query, the b=99 term can be evaluated without doing a
table lookup, but the c<>22 term requires a table lookup.  The query
planner could score this by assuming that b=99 is usually false, so
that most rows are eliminated without a table lookup and only rarely
is a table lookup necessary in order to evaluate c<>22.  But that
would also require that the query planner recognize that the b=99 term
is less expensive than the c<>22 term and hence should be run first -
a distinction that the query planner is not currently able to make.

I do not yet know which resolution we will take on this issue.

A Note On Terminology:  I do not call this a "bug".  A "bug" means
SQLite gets the wrong answer.  In this case, SQLite is still getting
the correct answer, it is merely doing so more slowly than desired.
Running slowly is an "issue" and it will be addressed, but slowness
does not raise to the level of being called a 'bug".
-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] sqlite 3.13.0 does not use indexes as 3.8.11.1 does

Reply via email to