Richard,

Thanks for acknowledging this, you are absolutely right, that's why I stated
that no DB does perfect optimisations and that computing the alternative
-OR- based closures are probably much harder to tackle. Also E. Pasma
pointed out the -OR- queries as I wrote them are not really semantically
equivalent unless the 2 joins are disjunct.

However, the case of the imperative WHERE NOT NULL implying INNER JOIN is
just a matter of replacing a predicate with a stronger one, so in all
fairness I imagined it a far lesser overhead than, say, the query flattener.
And I imagine it's a much more common situation, too, especially when users
are adding additional filters via WHERE clauses to a base query, so it might
benefit a lot of users, too. I know it would us, by not having to rewrite
these queries when porting; we are working on x86 servers, and a stick of
memory or a hard drive cost less than a programmer's day for us :) 

For the extra memory, I know for computing relational closures the spatial
complexity can get big, but only when the structure of the query is written
warrants it in the first place, so it shouldn't manifest heavily on a query
that doesn't have this structure.

This is just my best view on this, obviously it's a political decision to be
made so it's no make-it-or-break-it thing, like mentioned before, we are
porting some pretty big system and when I notice differences with SQLite, I
jolt them down, in the hope it might benefit you or the millions of users,
if not by changing SQLite, then simply by pointing out the workaround to
other users, such as moving the WHERE condition out to the ON clause, it's
not necessarily a trivial thing to consider for everyone. 

Ryan, 

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me. It nullifies the purpose of queries as well as all efforts you
yourselves have put into a lot of things, query flattening to mention just
one. The "same result" is not accidental, the equivalent queries will
produce the same result no matter which data populates the tables. That is
the only deffinition I know of semantic equivalence. SQL is declarative and
thus everything that describes the same thing is the same thing.

E.Pasma,

Thanks for taking the time to make the TC. This is always a huge putdown for
me, because finally the execution plan depends on the data indexes are
populated with (via ANALYZE) and are tables are huge so it's always a
putdown for me to create a minimal TC.

Indeed I noticed just now the 2 queries are not equivalent that way :)
Thanks for pointing that out! I will work on an equivalent -and- optimized
rewrite :)

For the query plans though, here is where the index stats come in: here a
"SCAN a" makes sense, but in our case the number of records in a is on the
order of 10000x records to b and c, and also the cardinality of b.d and c.d
is on the order of 1000; so a "SEARCH b, SEARCH c" works out. 

At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and
with adding another 1000 records on a, b and c and running the query:

EXPLAIN QUERY PLAN
SELECT * 
FROM 
  a 
JOIN 
  b ON b=ab AND b.d IN (1,2,3) 
JOIN 
  c ON c=ac AND c.d IN (4,5) 

selectid |order |from |detail                                             |
---------|------|-----|---------------------------------------------------|
0        |0     |0    |SCAN TABLE a                                       |
0        |1     |1    |SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) |
0        |0     |0    |EXECUTE LIST SUBQUERY 1                            |
0        |2     |2    |SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) |

it still plans a "SCAN a" first. So I guess I'll have to backtrack from the
real data to generate a TC.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to