[Repost: used a wrong e-mail at first, apologies if this comes through
twice]
On 2018/01/08 12:39 PM, x wrote:
Thanks Cezary and Scott. I’m now a bit clearer as to what’s happening. I
imagined the RowID as being a separate index which is the root of my confusion.
It would explain this
To elaborate a little - We often get people here asking "But why it
table-scans in stead of using my nice Index?".
This stems from an often-held misconception that Indexes are God-sent
magic to improve everything. The fact is that Indexes are costly
mechanisms which allows fast lookup which, only AFTER a certain critical
size and for specific circumstances, become more efficient than a scan.
The Query Planner has to do a lot of work to figure out what those
"critical size and specific circumstances" is for any specific query,
and it does get real fuzzy.
I think I've heard Richard or Dan explain it as follows (if memory
serves, someone please point out if I'm mistaken):
You can think of an SQLite table as essentially a btree covering Index
by itself with the Key being the Row_ID (or more recently, the PK for
WITHOUT ROWID tables). This is why the rowid (or any column serving as
an alias to it, or the PK for WITHOUT ROWID tables) cannot have NULL
values, but any other primary key could (in SQLite).
Being an Index by itself means that a Table-Scan is perhaps not as
inefficient as one might think and indeed using any other index means a
round-trip reading and hitting values in THAT index, then returning and
looking up the hit result in the rowid table index, and then reading the
pages(s) from it and extracting the data - where during a table scan,
all this round tripping is skipped.
So unless any prospective candidate Index for any query offers a truly
magnificent cost advantage, a table scan would probably be more
efficient, and so be chosen. This is why running ANALYZE on large
tables is needed, because it allows the QP to better deduce whether a
prospective Index might in fact offer such a magnificent cost reduction
or not. Another way is hinting at the QP (Search "likelihood" in the docs).
This is why a non-rowid-alias Primary Key on a rowid table is also less
efficient to scan than the table itself (often very non-intuitive) - or
- why a covering index sometimes gets avoided in a JOIN when it seems to
contain all needed data to fulfill the join obligation.
Also, often a great index is not used simply because the query planner
does not know enough about it and its prospective cost to obtain a good
estimate of its utility, and sometimes what feels intuitively to us as a
great Index just isn't really. The QP is not infallible, but it is quite
smart.
Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users