On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <rhuij...@apache.org> wrote:
> We anticipate that the wc_id column will be used more in future versions > and > I had hoped that the assumption that a better index match (matching more > usable columns) would always be preferable over one that uses less columns. > > All indexes have the wc_id column first, so I don't see a good reason that > Sqlite in this case prefers only this column of the specific (bad) index > over any other index that can help with the rest of the query. > > This optimization added in 3.7.12 made it easier for us to write clearer > well performing queries, without any hassle. . . We can't really assume our > users to run a specific version of Sqlite (as most unix distributions > compile sqlite themselves for products like ours)... > As I've alluded to before, the issue here boils down to the "shape" of the content of the database file. By "shape", I mean the distribution of values in each table, and in particular how well each index is able to narrow down a search. Knowledge of the shape of the data is critically important in choosing the best query plan when there are two or more potentially good query plans. It isn't possible to make good decisions about query plans without knowing the shape of the data. When you run ANALYZE, SQLite records information in the sqlite_stat1 table that gives it a good picture of the shape of the data. If you have not run ANALYZE or if you have dropped or deleted the sqlite_stat1 table, then SQLite has no information about the true shape of the data, so it has to guess. The guess that SQLite has made for the the past decade is that the left-most column of each index does a pretty good job of narrowing down the search and then subsequent columns merely refine that search slightly. This is just a wild guess, of course. But it seems to have worked remarkably well in a wide variety of databases in literally millions of applications over the past ten years. The issue at hand is that the guess that SQLite is making on the shape of unanalyzed databases is not working very well for SVN. The problem is that the left-most column of many of the indices do a very poor job of narrowing down the search, in violation of the assumption made for the standard guess. As I've pointed out, one could just run ANALYZE to populate the sqlite_stat1 table, then SQLite will know the database shape and will pick good plans. Or one can manually populate the sqlite_stat1 table with something close to the anticipated shape of the database. But Bert says he doesn't want to do any of that. So I've come up with a new, experimental compile-time option: -DSQLITE_DEFAULT_INDEX_SHAPE=1 Rebuilding with the compile-time option above, using code from the index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1) causes SQLite to make a much more pessimistic guess about the effectiveness of unanalyzed indices. In particular, using the new default shape, SQLite assumes that you need to use all columns of an index in order to narrow the search significantly. The number of rows that need to be searched increases geometrically based on the number of unused columns on the right-hand side of the index. Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the legacy assumption about unanalyzed indices. Set it to 1 for the new assumption that seems to work better for the test queries I've run on SVN. If I decide to include this compile-time option in a release, future releases might define other default index shapes using larger integers. I am personally skeptical of this idea of having a compile-time option to change the assumptions about the shape of unanalyzed content. After all, the legacy shape assumption has worked remarkably well up until now. And, more importantly, you can always run ANALYZE or otherwise populate the sqlite_stat1 table to define the shape of the data very precisely. It seems to me that if you want to tweak the data shape assumptions, you should use the mechanism that is already in place to do that (namely, the sqlite_stat1 table). But I'm open to including this new compile-time option if it seems like it might make things easier for developers. Please try this out (by downloading sources from http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch shown there to any 3.7.x or 3.8.x version of SQLite) and let me know what you think. Your feedback is appreciated. -- D. Richard Hipp d...@sqlite.org _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users