William,

William Hachfeld wrote:

Hi,

Have a question for everyone regarding index usage in SQLite... Say that I have
the following database schema:

   CREATE TABLE Example (
        id INTEGER PRIMARY KEY,
        grp INTEGER,
        begin INTEGER,
        end INTEGER
   );

and I want to perform the following query:

   SELECT id FROM Example WHERE grp=g AND x < end AND y >= begin;

on a large number of rows (say around one million) for some group 'g' and an
interval '[x, y)'. And, of course, with the assumption that (end > begin) for
all rows. Will my query performance be substantially improved by creating an
index such as:

   CREATE INDEX MultiColumnIndex ON Example (grp, begin, end)

or will the operators "<" and ">=" prohibit SQLite from using the index?


I have almost the same table in my linguistic database, Emdros (http://emdros.org). What I have found that works best for me is to put an index on what you call "begin" (not a double index), then do

SELECT id
FROM Example
WHERE grp = g
   AND begin BETWEEN {x} AND {y-1}
   AND end BETWEEN {x} AND {y-1}.

For some strange reason, this is about 5% faster than what you were proposing. It could be because SQLite does not know that begin <= end, and so can't make optimizations about when to stop looking.


Also, I'm aware that SQLite supports multi-column indicies, but not the use of
multiple indicies per query. Is it possible to get around the later restriction
by expressing my above query using a sub-select:

   SELECT id FROM (SELECT * FROM Example WHERE grp=g)
        WHERE x < end AND y >= begin;

and then creating the following indicies instead:

   CREATE INDEX GroupIndex ON Example (group)
   CREATE INDEX IntervalIndex ON Example (begin, end)

And if so, can any generalizations be made regarding the performance of using
the two indicies versus the first, single, index? How about disk usage?


I cannot comment on this, except that I've run EXPLAIN on my versions of the above queries, and found that SQLite wouldn't consult the "end" part of the (begin,end) index. Instead, it would consult the "end" part of the table column, and then only use the "begin" part of the index. At least that's how I understood the EXPLAIN output, but I may be wrong.

The upshot of the above is that you can save diskspace by not doing the double index, and only indexing "begin", since for these queries, the "end" part is redundant (i.e., not used) in the index.

Cheers,

Ulrik

--
Ulrik Petersen, MA, B.Sc.
Emdros -- the text database engine for analyzed or annotated text
http://emdros.org/




Reply via email to