On Thu, Oct 31, 2013 at 2:45 AM, Olaf Schmidt <n...@vbrichclient.com> wrote:

> Am 29.10.2013 13:19, schrieb Dominique Devienne:
>
>> [...]
>> First off, when you use functions like this, you basically rule
>> out index use, even if some_column is indexed. That's not good.
>> << WHERE col IN list>> OTOH, might use an index.
>>
>
> Might - yes, and *if* an index is used for the In-Checks, then
> you're perhaps "wasting it" - or it could be the wrong index
> which is chosen by the query-optimizer.
>
> The better index (in case you use Tmp-Tables) is not the index
> on col of the "real table", but the index on the Tmp-Table-Col.
>

I'm not convinced by this. The "real table" can be quite large, several
100's to 100,000's rows (up to 1+ million rows) and col can be the primary
key, or a non-unique "parent" key where many parent keys have about 10 rows
each, and a few have in the 1000's, while the in-list could very small
(down to just 1 element) or quite large (several thousands).

With a function based approach, you are *always* full-scanning the whole
"real" table, no matter the cardinality of the InList operand, and even
with a very fast InList function, this is not going to beat getting 10 PK
rows, or 10 "parent" key rows (e.g. 100 rows to 10,000 rows) via indexes,
especially since these are virtual tables with Boost.MultiIndex unique or
non-unique indexes (i.e. 5x to 10x faster than SQLite's paged B-tree
indexes). It might well beat it if the InList operand cardinality is high,
as in your 40K and 60K testing in a 100K rows table, because an InList
that's 40% or 60% of the whole table is close enough to a full scan that
using a native code set or map test similarly outperforms SQLite's generic
paged B-tree indexes like our Boost.MultiIndex-based indexes.

Of course that's speculation on my part, versus your timed experimentation,
so could well be that I'm wrong. And I'll need to look into this eventually.

Plus I haven't looked at the stat tables the new query optimizer is
increasingly using to find the best plan, to put information in there for
the cardinality of our vtables and our "selection" tmp-tables, so SQLite
has enough info to do its planning. Heck when I'm mixing vtable index costs
and real (tmp) table index costs, I have no clue the costs am I returning
are compatible. That's an area that's not well covered by the doc IMHO,
which I haven't explored enough. So as of now it's possible SQLite would
never select a plan that privileges a PK or non-unique index access on the
"real" table.

In any case, thank you for your persistence and challenging my assumptions.
Your experiments are very interesting, and I'll try to report back in this
thread any of my own findings in light of the information we've provided.
Thanks a bunch Olaf.

Cheers, --DD
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to