Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Hick Gunter Mon, 02 Sep 2013 22:39:24 -0700

I think this may help me significantly, as none of the application SELECTs use 
native tables (only the schema metadata analyzer/checker uses native tables) 
but quite a few virtual tables (which cannot be ANALYZED).

-----Ursprüngliche Nachricht-----
Von: Richard Hipp [mailto:[email protected]]
Gesendet: Dienstag, 03. September 2013 02:12
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Inefficient covering index used for Subversion with 
SQLite 3.8.0

On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <[email protected]> wrote:

> We anticipate that the wc_id column will be used more in future
> versions and I had hoped that the assumption that a better index match
> (matching more usable columns) would always be preferable over one
> that uses less columns.
>
> All indexes have the wc_id column first, so I don't see a good reason
> that Sqlite in this case prefers only this column of the specific
> (bad) index over any other index that can help with the rest of the query.
>
> This optimization added in 3.7.12 made it easier for us to write
> clearer well performing queries, without any hassle. . . We can't
> really assume our users to run a specific version of Sqlite (as most
> unix distributions compile sqlite themselves for products like ours)...
>

As I've alluded to before, the issue here boils down to the "shape" of the 
content of the database file.  By "shape", I mean the distribution of values in 
each table, and in particular how well each index is able to narrow down a 
search.  Knowledge of the shape of the data is critically important in choosing 
the best query plan when there are two or more potentially good query plans.  
It isn't possible to make good decisions about query plans without knowing the 
shape of the data.

When you run ANALYZE, SQLite records information in the sqlite_stat1 table that 
gives it a good picture of the shape of the data.  If you have not run ANALYZE 
or if you have dropped or deleted the sqlite_stat1 table, then SQLite has no 
information about the true shape of the data, so it has to guess.  The guess 
that SQLite has made for the the past decade is that the left-most column of 
each index does a pretty good job of narrowing down the search and then 
subsequent columns merely refine that search slightly.
This is just a wild guess, of course.  But it seems to have worked remarkably 
well in a wide variety of databases in literally millions of applications over 
the past ten years.

The issue at hand is that the guess that SQLite is making on the shape of 
unanalyzed databases is not working very well for SVN.  The problem is that the 
left-most column of many of the indices do a very poor job of narrowing down 
the search, in violation of the assumption made for the standard guess.

As I've pointed out, one could just run ANALYZE to populate the
sqlite_stat1 table, then SQLite will know the database shape and will pick good 
plans.  Or one can manually populate the sqlite_stat1 table with something 
close to the anticipated shape of the database.  But Bert says he doesn't want 
to do any of that.

So I've come up with a new, experimental compile-time option:

    -DSQLITE_DEFAULT_INDEX_SHAPE=1

Rebuilding with the compile-time option above, using code from the
index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
causes SQLite to make a much more pessimistic guess about the effectiveness of 
unanalyzed indices.  In particular, using the new default shape, SQLite assumes 
that you need to use all columns of an index in order to narrow the search 
significantly.  The number of rows that need to be searched increases 
geometrically based on the number of unused columns on the right-hand side of 
the index.

Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the legacy 
assumption about unanalyzed indices.  Set it to 1 for the new assumption that 
seems to work better for the test queries I've run on SVN.  If I decide to 
include this compile-time option in a release, future releases might define 
other default index shapes using larger integers.

I am personally skeptical of this idea of having a compile-time option to 
change the assumptions about the shape of unanalyzed content.  After all, the 
legacy shape assumption has worked remarkably well up until now.  And, more 
importantly, you can always run ANALYZE or otherwise populate the
sqlite_stat1 table to define the shape of the data very precisely.   It
seems to me that if you want to tweak the data shape assumptions, you should 
use the mechanism that is already in place to do that (namely, the
sqlite_stat1 table).   But I'm open to including this new compile-time
option if it seems like it might make things easier for developers.

Please try this out (by downloading sources from 
http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch shown 
there to any 3.7.x or 3.8.x version of SQLite) and let me know what you think.

Your feedback is appreciated.

--
D. Richard Hipp
[email protected]
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--------------------------------------------------------------------------
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: [email protected]

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Reply via email to