Hi,
I am experiencing a massive performance issue on a query with a left join
in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
milliseconds.
The problematic query looks like
select d.key_field, count(*) nb
from low_volume_table b
join mid_volume_table c on c.key_b = b.id
left join high_volume_table d on d.key_c = c.id
where b.id >= $1
and b.filter1 = 0
and c.filter2 > 0
and d.filter3 > 0
and d.filter4 = 1
group by d.key_field
order by nb desc
The filter fields on it are relatively non-discriminant (and
non_indexed), however the key_field is indexed.
The most discriminating conditions in this query are those on the
low_volume and mid_volume tables, but the optimizer
selects as first action:
SCAN TABLE high_volume_table USING INDEX key_field_idx
which leads to a huge number of iterations.
If on the other hand, just one of the d filter conditions is removed, then
the optimizer goes (like 3.22) first for
SEARCH TABLE low_volume_table AS b USING COVERING INDEX
low_volume_table_id_idx
(b.filter1=? AND rowid>?)
This happens after running ANALYZE, the sqlite1_stat for the high_volume
table and key_field_idx is
5855234 6
while for the low_volume_table_filter1_idx it is
1976628 988314
While the low_volume_table_filter1_idx does not look very selective, as it
is coupled with rowid filtering, it is actually very effective
as it combines rowid & filter1, so there are just thousandths of rows being
considered in the "group by", while when starting from
a key_field_idx, there are millions of rows being considered, the
overwhelming majority not fulfilling the conditions.
The above table names and fields have been anonymized, if someone from the
SQLite team want to have a look at the actual data,
I can provide a database download (it's about 1.7 GB)
Thanks!
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users