Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Bert Huijben Tue, 03 Sep 2013 04:34:30 -0700


> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: dinsdag 3 september 2013 02:12
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <rhuij...@apache.org>
> wrote:
> 
> > We anticipate that the wc_id column will be used more in future versions
> > and
> > I had hoped that the assumption that a better index match (matching more
> > usable columns) would always be preferable over one that uses less
> columns.
> >
> > All indexes have the wc_id column first, so I don't see a good reason
that
> > Sqlite in this case prefers only this column of the specific (bad) index
> > over any other index that can help with the rest of the query.
> >
> > This optimization added in 3.7.12 made it easier for us to write clearer
> > well performing queries, without any hassle. . . We can't really assume
our
> > users to run a specific version of Sqlite (as most unix distributions
> > compile sqlite themselves for products like ours)...
> >
> 
> As I've alluded to before, the issue here boils down to the "shape" of the
> content of the database file.  By "shape", I mean the distribution of
> values in each table, and in particular how well each index is able to
> narrow down a search.  Knowledge of the shape of the data is critically
> important in choosing the best query plan when there are two or more
> potentially good query plans.  It isn't possible to make good decisions
> about query plans without knowing the shape of the data.


I understand.
A long long time ago I implemented similar indexing myself for a library
system, where I couldn't even use the whole keys as an index caused by the
storage limitations at the time. I'm glad we don't have that 1980's limits
any more.


At Subversion Our databases aren't well formed by the relational standards,
but everything worked reasonably well with Sqlite 3.7 before. 
I then tuned all queries for Sqlite 3.7.12-3.7.18 where I could get all of
our queries use the right indexes without much trouble.

> When you run ANALYZE, SQLite records information in the sqlite_stat1 table
> that gives it a good picture of the shape of the data.  If you have not
run
> ANALYZE or if you have dropped or deleted the sqlite_stat1 table, then
> SQLite has no information about the true shape of the data, so it has to
> guess.  The guess that SQLite has made for the the past decade is that the
> left-most column of each index does a pretty good job of narrowing down
> the
> search and then subsequent columns merely refine that search slightly.
> This is just a wild guess, of course.  But it seems to have worked
> remarkably well in a wide variety of databases in literally millions of
> applications over the past ten years.

I don't like that I have to do this, but this appears to be the best
maintainable option for the long term future

> The issue at hand is that the guess that SQLite is making on the shape of
> unanalyzed databases is not working very well for SVN.  The problem is
that
> the left-most column of many of the indices do a very poor job of
narrowing
> down the search, in violation of the assumption made for the standard
> guess.
> 
> As I've pointed out, one could just run ANALYZE to populate the
> sqlite_stat1 table, then SQLite will know the database shape and will pick
> good plans.  Or one can manually populate the sqlite_stat1 table with
> something close to the anticipated shape of the database.  But Bert says
he
> doesn't want to do any of that.
> 
> So I've come up with a new, experimental compile-time option:
> 
>     -DSQLITE_DEFAULT_INDEX_SHAPE=1
> 
> Rebuilding with the compile-time option above, using code from the
> index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
> causes SQLite to make a much more pessimistic guess about the
> effectiveness
> of unanalyzed indices.  In particular, using the new default shape, SQLite
> assumes that you need to use all columns of an index in order to narrow
the
> search significantly.  The number of rows that need to be searched
> increases geometrically based on the number of unused columns on the
> right-hand side of the index.

I can confirm that this patch updates all our queries to work with a query
plan at least as efficient as with Sqlite 3.7.12-3.7.18.

I will spend some time over the next few days to see if I can get more
details on the difference over the different versions by using a standard
install, a _stat1 table and this patch by extending our test infrastructure.

> Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the
> legacy
> assumption about unanalyzed indices.  Set it to 1 for the new assumption
> that seems to work better for the test queries I've run on SVN.  If I
> decide to include this compile-time option in a release, future releases
> might define other default index shapes using larger integers.
> 
> I am personally skeptical of this idea of having a compile-time option to
> change the assumptions about the shape of unanalyzed content.  After all,
> the legacy shape assumption has worked remarkably well up until now.  And,
> more importantly, you can always run ANALYZE or otherwise populate the
> sqlite_stat1 table to define the shape of the data very precisely.   It
> seems to me that if you want to tweak the data shape assumptions, you
> should use the mechanism that is already in place to do that (namely, the
> sqlite_stat1 table).   But I'm open to including this new compile-time
> option if it seems like it might make things easier for developers.

The easiest way for us to integrate this would be for this to be a runtime
option (as some platforms will just replace an amalgamation if they can hack
that in), but personally I would be even more skeptical about that than
about a new compile-time option :)

If this option is not generic enough for Sqlite itself we should find a way
to handle things properly on the Subversion side.


A few years ago we replaced our own working copy storage with Sqlite to make
things much easier to maintain for the developers, but in some cases our
design doesn't really map to the common cases for a relational database.
Pathnames are bad keys and by adding a working copy specifier we made things
worse. Especially since we didn't even get to using this specifier over the
last two versions.

Initially we just assumed that Sqlite with the right indexes would just
handle all our cases fast enough, but we found out that while this worked
for reasonably sized environments it didn't scale well. At that point we
started investigating how our queries were handled.

After 3.7.12 made the OR optimization more aggressive by default, we wrongly
assumed that we could just depend on this. 
The better query optimizer in 3.8 now handles quite a few other cases
better, but to make Subversion scale to big working copies we really need
Sqlite to use the indexes wherever it can. (At one point we even
investigated if we could just use sqlite as the storage and introduce
pre-optimized queries. I'm glad we didn't have to go that far). We prepare
all our queries only once (at first usage) so the runtime cost is far more
important to us than the prepare time.


It appears this configuration option works great for us, but I'll spend more
time looking into other options too, to find out which direction we should
take.

Thanks for your help+advice,

        Bert
> 
> Please try this out (by downloading sources from
> http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch
> shown there to any 3.7.x or 3.8.x version of SQLite) and let me know what
> you think.
> 
> Your feedback is appreciated.
> 
> 
> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

Reply via email to