Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Richard Hipp Mon, 02 Sep 2013 17:13:14 -0700

On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <rhuij...@apache.org> wrote:


> We anticipate that the wc_id column will be used more in future versions
> and
> I had hoped that the assumption that a better index match (matching more
> usable columns) would always be preferable over one that uses less columns.
>
> All indexes have the wc_id column first, so I don't see a good reason that
> Sqlite in this case prefers only this column of the specific (bad) index
> over any other index that can help with the rest of the query.
>
> This optimization added in 3.7.12 made it easier for us to write clearer
> well performing queries, without any hassle. . . We can't really assume our
> users to run a specific version of Sqlite (as most unix distributions
> compile sqlite themselves for products like ours)...
>

As I've alluded to before, the issue here boils down to the "shape" of the
content of the database file.  By "shape", I mean the distribution of
values in each table, and in particular how well each index is able to
narrow down a search.  Knowledge of the shape of the data is critically
important in choosing the best query plan when there are two or more
potentially good query plans.  It isn't possible to make good decisions
about query plans without knowing the shape of the data.

When you run ANALYZE, SQLite records information in the sqlite_stat1 table
that gives it a good picture of the shape of the data.  If you have not run
ANALYZE or if you have dropped or deleted the sqlite_stat1 table, then
SQLite has no information about the true shape of the data, so it has to
guess.  The guess that SQLite has made for the the past decade is that the
left-most column of each index does a pretty good job of narrowing down the
search and then subsequent columns merely refine that search slightly.
This is just a wild guess, of course.  But it seems to have worked
remarkably well in a wide variety of databases in literally millions of
applications over the past ten years.

The issue at hand is that the guess that SQLite is making on the shape of
unanalyzed databases is not working very well for SVN.  The problem is that
the left-most column of many of the indices do a very poor job of narrowing
down the search, in violation of the assumption made for the standard guess.

As I've pointed out, one could just run ANALYZE to populate the
sqlite_stat1 table, then SQLite will know the database shape and will pick
good plans.  Or one can manually populate the sqlite_stat1 table with
something close to the anticipated shape of the database.  But Bert says he
doesn't want to do any of that.

So I've come up with a new, experimental compile-time option:

    -DSQLITE_DEFAULT_INDEX_SHAPE=1

Rebuilding with the compile-time option above, using code from the
index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
causes SQLite to make a much more pessimistic guess about the effectiveness
of unanalyzed indices.  In particular, using the new default shape, SQLite
assumes that you need to use all columns of an index in order to narrow the
search significantly.  The number of rows that need to be searched
increases geometrically based on the number of unused columns on the
right-hand side of the index.

Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the legacy
assumption about unanalyzed indices.  Set it to 1 for the new assumption
that seems to work better for the test queries I've run on SVN.  If I
decide to include this compile-time option in a release, future releases
might define other default index shapes using larger integers.

I am personally skeptical of this idea of having a compile-time option to
change the assumptions about the shape of unanalyzed content.  After all,
the legacy shape assumption has worked remarkably well up until now.  And,
more importantly, you can always run ANALYZE or otherwise populate the
sqlite_stat1 table to define the shape of the data very precisely.   It
seems to me that if you want to tweak the data shape assumptions, you
should use the mechanism that is already in place to do that (namely, the
sqlite_stat1 table).   But I'm open to including this new compile-time
option if it seems like it might make things easier for developers.

Please try this out (by downloading sources from
http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch
shown there to any 3.7.x or 3.8.x version of SQLite) and let me know what
you think.

Your feedback is appreciated.


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Reply via email to