Re: [sqlite] Massive performance regression in 3.7.x

Max Vlasov Sat, 25 Sep 2010 12:06:28 -0700

On Sat, Sep 25, 2010 at 10:34 PM, Richard Hipp <[email protected]> wrote:


> On Sat, Sep 25, 2010 at 10:54 AM, Max Vlasov <[email protected]> wrote:
>
> > On Sat, Sep 25, 2010 at 6:35 PM, Richard Hipp <[email protected]> wrote:
> >
> > >
> > > You have a very unusual data distribution in your tables.  SQLite does
> > not
> > > know this and so it chooses a query plan (in 3.7.1 and later) that
> > assumes
> > > a
> > > different and more typical data distribution.  The new query plan works
> > > better for most cases, but is much worse in your unusual case.
> > >
> > >
> > >
> > Richard,
> > I think it's something different.
> > .....
> > So my guess is that an optimization introduced in 3.6.20 and existed in
> .23
> > was removed in 3.7 for some unknown reasons. Maybe actually there was a
> > reason, but the difference for INNER JOIN and LEFT JOIN leaves a chance
> > it's
> > some side effect. But I may be wrong
> >
>
> You are indeed wrong.
>
> A simplification of the schema is this:
>
>      CREATE TABLE a(w, x, PRIMARY KEY(w,x));
>      CREATE TABLE b(y, z, PRIMARY KEY(y,z));
>
> And the corresponding query is this:
>
>      SELECT * FROM a JOIN b ON w=3 AND x=z;
>
> Because ANALYZE has not been run, SQLite has no statistics on the content
> of
> the two tables so it guesses that both tables contain 1,000,000 elements
> and
> that the w=3 constraint on table A will narrow down the search to just 10
> entries.
>
> Given this information, the best plan is to lookup entries in table A where
> w=3 in the outer loop, and for each match do a full table scan of B looking
> for entries where x=z.  The query planner considers creating a temporary
> index on table B to help it find entries where z=x, but because it guesses
> that the inner loop will only run 10 times, the cost of creating an index
> on
> 1 million entries exceeds (very slightly) the cost of doing 10 full table
> scans.  And so it chooses to do the full table scan.
>
>

Thanks, Richard, this is a perfect explanation and now it makes sense to me.

As for left joins, I suppose that the optimizer choses other approach for
them since it has a bias in estimating the sizes of tables when the left
join was used. As common sense tells me, left joins usually used with big
tables looking for lookups in little tables. So in this case other approach
would have better estimate in performance. Is this true? I'm asking because
replacing inner with left would be used as an alternative in cases like this
in the future.

Max
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Massive performance regression in 3.7.x

Reply via email to