On 4/28/16, Rowan Worth <rowanw at dugeo.com> wrote: > Hi guys, > > In an attempt to understand a slow query I've had a quick look at the > contents of the sqlite_stat1 table. It looks like the stat column contains > a series of integers like: > > <number of rows in index> > <number of rows divided by distinct values of first indexed column> > <number of rows divided by distinct values of second indexed column> > ... > > Is this observation correct? And if so, does sqlite essentially assume that > the rows are equally distributed amongst all distinct values of the column?
Canonical documentation is here: https://www.sqlite.org/fileformat2.html#stat1tab > > It seems like an assumption that will definitely fail for certain kinds of > data, and since sqlite generally deals well with contingencies I'd be > surprised if that's the only criteria used to select an index by the query > planner... > > It looks like the ENABLE_STAT2/STAT4 options generate histogram data which > would probably help for non-uniform data? > Yes, STAT4 provides histogram data to help with query planning on tables with non-uniform data distributions. STAT4 is a compile-time option that is off by default, because it has the disadvantage of breaking the Query Planner Stability Guarantee (https://www.sqlite.org/queryplanner-ng.html#qpstab). The Query Planner Stability Guarantee is normally a more important consideration in embedded systems, which is the primary use case for SQLite. -- D. Richard Hipp drh at sqlite.org