I think the difficulty here is that the optimizer is oriented toward the
"low level" and mainly concerned with choosing indexes, processing order
etc  (see https://www.sqlite.org/optoverview.html ) and has sort of a
narrow view of the task.

Approaching it from the other end and breaking down the SQL statement into
a set of constraints over a set of relations linked by those constraints
you can use ideas from constraint programming, such as your optimisation
which is constraint propagation: adding redundant constraints that are
logically implied by the existing constraints; the redundant constraints
don't change the meaning of the query but serve to "prune" the query space
thus reducing the work.

This sort of approach could be implemented as a SQL to SQL transformation
and would complement the existing query planner. Maybe there's a tool out
there already which processes standard SQL in this way?

On Wed, May 27, 2015 at 3:31 AM, Matthias-Christian Ott <ott at mirix.org>
wrote:

> I have the following (simplified and generalized) schema and query:
>
> CREATE TABLE nodes (
>   id INTEGER NOT NULL PRIMARY KEY
> );
>
> CREATE TABLE edges (
>   a INTEGER NOT NULL REFERENCES nodes (id),
>   b INTEGER NOT NULL REFERENCES nodes (id),
>   PRIMARY KEY (a, b)
> );
>
> WITH reachable (a, b) AS (
>   SELECT a, b FROM edges
>   UNION ALL
>   SELECT reachable.a AS a, edges.b AS b
>   FROM reachable
>   INNER JOIN edges ON reachable.b = edges.a
> )
> SELECT nodes.id, reachable.b
> FROM nodes
> INNER JOIN reachable ON a = id
> WHERE id = 1;
>
> The query plan for the query looks like this:
>
> 2|0|0|SCAN TABLE edges
> 3|0|0|SCAN TABLE reachable
> 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
> 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
> 0|1|1|SCAN SUBQUERY 1
>
> The query first computes the reachable relation for all nodes and then
> filters out the reachable nodes of node 1 which is clearly inefficient.
>
> If I'm not mistaken the query could be optimized to:
>
> WITH reachable (a, b) AS (
>   SELECT a, b FROM edges WHERE a = 1
>   UNION ALL
>   SELECT reachable.a AS a, edges.b AS b
>   FROM reachable
>   INNER JOIN edges ON reachable.b = edges.a
> )
> SELECT nodes.id, reachable.b
> FROM nodes
> INNER JOIN reachable ON a = id
> WHERE id = 1;
>
> The query plan for the optimized query looks like this:
>
> 2|0|0|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 3|0|0|SCAN TABLE reachable
> 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
> 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
> 0|1|1|SCAN SUBQUERY 1
>
> It looks like the optimization can be generalized because you can pull
> the selection into the initial select query of the CTE. Is there a
> chance that SQLite could perform this optimization in one of the next
> releases?
>
> I could tolerate with repeating the parameter but I want to create a
> view out of it, so it's not really an option. Are there any alternatives
> if the optimization will not be automated?
>
> Moreover, "COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)" looks
> wrong, shouldn't it read "COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE
> (UNION)"?
>
> - Matthias-Christian
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

Reply via email to