I’ve got an urgent-to-me issue wherein a customer says that our library has
slowed down by several orders of magnitude in iOS 10.3 compared to iOS 9. I ran
a test case they provided and found that some queries that should be fast are
taking a very long time (~500ms) and generating huge numbers of reads. Using
EXPLAIN QUERY PLAN I found that the optimizer has generated a horrifically bad
plan, essentially an inside-out join, that results in linear scans of two large
tables. (Presumably this is due to some change in the optimizer between the
older and newer versions of SQLite in iOS.)
Now I need to figure out how to fix this ASAP. Our library periodically runs
ANALYZE, so sqlite might not even start out using the bad query plan; it might
happen only with large databases or with some specific data sets.
The query, with some extraneous details removed, looks like:
SELECT docs.doc_id, revs.sequence, docs.docid, rev.revid, rev.json, […]
FROM revs, docs
WHERE revs.sequence>? AND revs.current!=0 AND revs.doc_id = docs.doc_id
ORDER BY revs.doc_id, revs.deleted, revs.revid DESC
The “revs” table has “sequence” as its integer primary key, and a foreign key
“doc_id” referencing the “docs” table. (Plus many other columns.)
The “docs” table has “doc_id” as its primary key and “docid” as a text column.
Basically this query is meant to find the latest revs, including their “docid”
strings joined from the “docs” table. It should just be using the “revs”
table’s primary index starting from the “?” parameter, then looking up the
corresponding doc for each revision. Instead it does this:
selectid = 0
order = 0
from = 1
detail = SCAN TABLE docs USING COVERING INDEX docs_docid
selectid = 0
order = 1
from = 0
detail = SEARCH TABLE revs USING INDEX revs_current (doc_id=?)
selectid = 0
order = 0
from = 0
detail = USE TEMP B-TREE FOR ORDER BY
It’s scanning through all 50,000+ rows in “docs”, and then for each document
looking up all the rows in “revs” that reference it (“revs” has 175,000 rows),
and comparing their sequences to the parameter.
(I’ve pasted the relevant parts of the db schema at the bottom of this email if
you want to refer to them.)
My first priority is to find some way to restate this query such that it will
always use the revs.sequence primary key index.
A lower priority is to know whether there’s a valid reason the query planner
came up with this, or if it’s a bug in the optimizer or the ANALYZE command? In
the planner’s defense, it probably doesn’t know that this query is almost
always run with the starting sequence (“?”) relatively close to the maximum
sequence. If low numbers were frequently given for “?”, it’s possible that its
current plan would be more efficient…
Thanks,
—Jens
CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL,
expiry_timestamp INTEGER);
CREATE INDEX docs_docid ON docs(docid);
CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER
NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL COLLATE
REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL, current
BOOLEAN, deleted BOOLEAN DEFAULT 0, json BLOB, no_attachments
BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid));
CREATE INDEX revs_parent ON revs(parent);
CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, deleted);
CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc);
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users