On 14/03/2013 3:09 PM, Richard Hipp wrote:
Nitpick:  A "bug" means it gets the wrong answer, which is not the case
here.  What you are reporting here is not a bug but an optimization
opportunity.
Oops... you're right. Sorry about that.

On Thu, Mar 14, 2013 at 2:07 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:

The offending query (slightly modified version of Q7:

select count(*)
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey
       and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
       and c_nationkey = n2.n_nationkey and (
           (n1.n_name = 'ALGERIA' and n2.n_name = 'EGYPT')
           or (n1.n_name = 'EGYPT' and n2.n_name = 'ALGERIA')
       ) and l_shipdate between  '1995-01-01' and  '1996-12-31';

The optimizer chooses this plan:

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|2|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|3|3|SEARCH TABLE customer USING INDEX cnki (C_NATIONKEY=?) (~600 rows)
0|4|2|SEARCH TABLE orders USING INDEX ocki (O_CUSTKEY=?) (~15 rows)
0|5|1|SEARCH TABLE lineitem USING INDEX lpki (L_ORDERKEY=?) (~2 rows)

Dropping index nni and disabling automatic indexing improves things a bit
(3.3 s, 50% speedup):

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
0|5|5|SEARCH TABLE nation AS n2 USING INDEX npki (N_NATIONKEY=?) (~1 rows)

SQLite version 3.7.16 chooses the second plan regardless.  So that much has
already been addressed.
Great! I'll give it a try.


Presumably it's slow because predicates don't really hit until after all
the joins finish.

Sort of.  SQLite does evaluate predicates as soon as it can.  It doesn't
wait until after the inner join finishes to evaluate the predicates.  As
soon as all information needed for a predicate is available, it is
evaluated.

The problem is that SQLite does not do a lot of algebraic manipulation of
predicates to try to factor out terms that can be evaluated early.  So the
predicate:

       (n1.n_name='ALGERIA' and n2.n_name='EGYPT')
       OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')

is treated as a unit and cannot be evaluated until both n1 and n2 are
available.
That's what I figured; part of the "lite" in sqlite.

  If SQLite were to be enhanced to deal with this case, what it
would need to do is factor this into three separate conjuncts, as follows:

       (n1.n_name='ALGERIA' AND n2.n_name='EGYPT')
       OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')
    AND
       (n1.n_name='ALGERIA OR n1.n_name='EGYPT')
    AND
       (n2.n_name='EGYPT' OR n2.n_name='ALGERIA')

The second two are entirely redundant in the sense that if the first is
true then the second two are also true.  (SQLite has a method of marking
them so and making sure they are not evaluated if the first is evaluated.
Similar auxiliary conjuncts are used to help optimize LIKE, GLOB, and
BETWEEN operators)  But the second two conjuncts also depend on just a
single table, so they have the option of being evaluated early whereas the
first must wait until both tables have been evaluated.

If you augment the WHERE clause of your query by adding "AND
(n1.n_name='ALGERIA' OR n1.n_name='EGYPT')" you get the observed speedup.
I guess that feature got added after 3.7.13; it has no impact on 3.7.13, with or without the n_name index present, but a home-brew 3.7.16 beta build handles it fine. Actually, I get 0.21 s response time for even the original, unmodified query using 3.7.16.

None of this explains what purpose all those index probes on nation.n_name were doing if it wasn't applying predicates, but I guess it doesn't matter since the problem has been fixed in later versions.

Thanks,
Ryan



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to