[sqlite] ANALYZE, sqlite_stat1, and query planning

Richard Hipp Thu, 28 Apr 2016 11:53:24 -0400

On 4/28/16, Rowan Worth <rowanw at dugeo.com> wrote:
> Hi guys,
>
> In an attempt to understand a slow query I've had a quick look at the
> contents of the sqlite_stat1 table. It looks like the stat column contains
> a series of integers like:
>
> <number of rows in index>
> <number of rows divided by distinct values of first indexed column>
> <number of rows divided by distinct values of second indexed column>
> ...
>
> Is this observation correct? And if so, does sqlite essentially assume that
> the rows are equally distributed amongst all distinct values of the column?


Canonical documentation is here:
https://www.sqlite.org/fileformat2.html#stat1tab


>
> It seems like an assumption that will definitely fail for certain kinds of
> data, and since sqlite generally deals well with contingencies I'd be
> surprised if that's the only criteria used to select an index by the query
> planner...
>
> It looks like the ENABLE_STAT2/STAT4 options generate histogram data which
> would probably help for non-uniform data?
>

Yes, STAT4 provides histogram data to help with query planning on
tables with non-uniform data distributions.  STAT4 is a compile-time
option that is off by default, because it has the disadvantage of
breaking the Query Planner Stability Guarantee
(https://www.sqlite.org/queryplanner-ng.html#qpstab).  The Query
Planner Stability Guarantee is normally a more important consideration
in embedded systems, which is the primary use case for SQLite.

-- 
D. Richard Hipp
drh at sqlite.org

[sqlite] ANALYZE, sqlite_stat1, and query planning

Reply via email to