Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

Richard Hipp Thu, 18 Sep 2014 19:25:54 -0700

On Thu, Sep 18, 2014 at 8:54 AM, Merike <gas...@smail.ee> wrote:

> Hi everyone,
>
> Since upgrading to Kubuntu 14.04 I've had an issue with Quassel irc
> client being slow on startup when it retrieves backlog from database.
> I've tracked it down to different sqlite version being installed.
> Previously I had 3.7.17 and now have 3.8.2. I've tried various versions
> from https://launchpad.net/ubuntu/trusty/+source/sqlite3/+builds and can
> only pinpoint it to between 3.7.17 and 3.8.0.2 because there doesn't
> appear to be intermediate builds.
>


Thanks again for the regression report.  This problem is now fixed on the
SQLite trunk.  See
http://www.sqlite.org/src/info/72727b68cd0796?dc=22 for the check-in that
fixes the problem.

If all you want to know is that the problem has been fixed, you may stop
reading now.  If you are curious about the cause of the problem was, you
may continue reading.

Here is a simplified test case that illustrates the problem:

CREATE TABLE t1(x INTEGER PRIMARY KEY, a, b);
CREATE INDEX t1a ON t1(a);
CREATE INDEX t1ab ON t1(a,b);
explain query plan
SELECT * FROM t1 WHERE a=?1 ORDER BY x;

The query could use either index, t1a or t1ab, to lookup the appropriate
rows based on "WHERE a=?1".  If index t1a is used for the lookup, then the
rows will come out ordered by x, since every index contains the primary key
after the indexed columns.  It is as if the t1a index was really over two
columns, a and x, and the t1ab index was over three columns, a, b, and x.
So when t1a is used for lookup, SQLite searches for the first entry in the
index where a=?1 then it starts reading successive entries, as long as the
a=?1 condition holds, to find the primary key and hence to look up the
table content.  But notice how doing that extracts the rows in ascending
"x" order - exactly as requested by the ORDER BY clause.  SQLite recognizes
this and suppresses the sort operation since the rows are going to come out
in the correct order naturally.

But if t1ab is used to lookup the rows, there is that pesky 'b" column in
between the "a" which is constrained by "WHERE a=?" and the "x" column.
There might be two or more different "b" values for the same "a", and hence
the "x" values will not necessarily be in the right order.  So the output
must be sorted by a separate post-processing step.  That is much slower.

Normally, the query planner looks at all possible indexes and tries each
one to see which one will generate a plan with the least amount of work.
In that case, t1a would be selected over t1ab since t1a can omit the sort
operation at the end, which is usually one of the most expensive parts of a
query.  Unfortunately, the query planner (incorrectly) rejected the t1a
index in favor of t1ab very early in the query planning process.  There is
a prefilter that asks whether or not a index might give output in the ORDER
BY order naturally.  Such indexes are given special protection from early
rejection so that they can be thoroughly analyzed.  But in this case,
because the ORDER BY was on the INTEGER PRIMARY KEY rather than on a key
column, that prefilter failed to recognize that t1a might naturally render
the output in the desired order.  Hence, t1a ended up being rejected in
favor of t1ab, which is a covering index and hence involves less lookup
work.

The one-line change shown above fixes the prefilter so that it recognizes
that t1a might be useful in implementing the ORDER BY clause, thus giving
t1a protection from being overridden by t1ab.  Both t1a and t1ab indexes
have to compete in the final selection process which t1a clearly wins
because the satisfies the ORDER BY clause without having to do a separate
sort.




>
> I don't know what exact queries Quassel runs but most likely (after
> looking at some Quassel source on github) it's something like the
> following for a single channel:
> SELECT messageid, time, type, flags, sender, message
> FROM backlog
> JOIN sender ON backlog.senderid = sender.senderid
> WHERE bufferid = 102
> ORDER BY messageid DESC
> LIMIT 100;
>
> When I run this query on 3.7.17 it returns nearly instantly. When I do
> it on 3.8.0.2 or 3.8.6 which I also tried then it takes about 4 seconds
> on my not so fast spinning drive laptop. I've posted example database
> (minimized from original) at
> https://docs.google.com/file/d/0Bzx3gCDqfzVdcDNhdzlfVlh4ZTA/. Could
> someone either confirm or prove false that this query has become slower
> with newer versions?
>
> Merike
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

Reply via email to