Also ran a few index to "force" the query plan, but with limited success:

- the "indexed by" clause does not result in the optimizer using the index
first, it just uses the indexes in the later steps of the query plan.
- using "not indexed" still results in the same table scan of
high_volume_table first, just without the index.
- using the unary "+" on the d table filters has no effect on the query
plan (as these are not indexed in the first place I guess)

Using unlikely() on the d table filters seems to be the only option that
works.


On Tue, Jun 26, 2018 at 10:02 AM, Eric Grange <zar...@gmail.com> wrote:

> Hi,
>
> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>       select d.key_field, count(*) nb
>       from low_volume_table b
>       join mid_volume_table c on c.key_b = b.id
>       left join high_volume_table d on d.key_c = c.id
>       where b.id >= $1
>       and b.filter1 = 0
>       and c.filter2 > 0
>       and d.filter3 > 0
>       and d.filter4 = 1
>       group by d.key_field
>       order by nb desc
>
> The filter fields on it are relatively non-discriminant (and
> non_indexed), however the key_field is indexed.
>
> The most discriminating conditions in this query are those on the
> low_volume and mid_volume tables, but the optimizer
> selects as first action:
>
>     SCAN TABLE high_volume_table USING INDEX key_field_idx
>
> which leads to a huge number of iterations.
>
> If on the other hand, just one of the d filter conditions is removed, then
> the optimizer goes (like 3.22) first for
>
>    SEARCH TABLE low_volume_table AS b USING COVERING INDEX
> low_volume_table_id_idx (b.filter1=? AND rowid>?)
>
> This happens after running ANALYZE, the sqlite1_stat for the high_volume
> table and key_field_idx is
>
>      5855234 6
>
> while for the  low_volume_table_filter1_idx it is
>
>      1976628 988314
>
> While the  low_volume_table_filter1_idx does not look very selective, as
> it is coupled with rowid filtering, it is actually very effective
> as it combines rowid & filter1, so there are just thousandths of rows
> being considered in the "group by", while when starting from
> a key_field_idx, there are millions of rows being considered, the
> overwhelming majority not fulfilling the conditions.
>
> The above table names and fields have been anonymized, if someone from the
> SQLite team want to have a look at the actual data,
> I can provide a database download (it's about 1.7 GB)
>
> Thanks!
>
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to