I think the difficulty here is that the optimizer is oriented toward the "low level" and mainly concerned with choosing indexes, processing order etc (see https://www.sqlite.org/optoverview.html ) and has sort of a narrow view of the task.
Approaching it from the other end and breaking down the SQL statement into a set of constraints over a set of relations linked by those constraints you can use ideas from constraint programming, such as your optimisation which is constraint propagation: adding redundant constraints that are logically implied by the existing constraints; the redundant constraints don't change the meaning of the query but serve to "prune" the query space thus reducing the work. This sort of approach could be implemented as a SQL to SQL transformation and would complement the existing query planner. Maybe there's a tool out there already which processes standard SQL in this way? On Wed, May 27, 2015 at 3:31 AM, Matthias-Christian Ott <ott at mirix.org> wrote: > I have the following (simplified and generalized) schema and query: > > CREATE TABLE nodes ( > id INTEGER NOT NULL PRIMARY KEY > ); > > CREATE TABLE edges ( > a INTEGER NOT NULL REFERENCES nodes (id), > b INTEGER NOT NULL REFERENCES nodes (id), > PRIMARY KEY (a, b) > ); > > WITH reachable (a, b) AS ( > SELECT a, b FROM edges > UNION ALL > SELECT reachable.a AS a, edges.b AS b > FROM reachable > INNER JOIN edges ON reachable.b = edges.a > ) > SELECT nodes.id, reachable.b > FROM nodes > INNER JOIN reachable ON a = id > WHERE id = 1; > > The query plan for the query looks like this: > > 2|0|0|SCAN TABLE edges > 3|0|0|SCAN TABLE reachable > 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 > (a=?) > 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION) > 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?) > 0|1|1|SCAN SUBQUERY 1 > > The query first computes the reachable relation for all nodes and then > filters out the reachable nodes of node 1 which is clearly inefficient. > > If I'm not mistaken the query could be optimized to: > > WITH reachable (a, b) AS ( > SELECT a, b FROM edges WHERE a = 1 > UNION ALL > SELECT reachable.a AS a, edges.b AS b > FROM reachable > INNER JOIN edges ON reachable.b = edges.a > ) > SELECT nodes.id, reachable.b > FROM nodes > INNER JOIN reachable ON a = id > WHERE id = 1; > > The query plan for the optimized query looks like this: > > 2|0|0|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 > (a=?) > 3|0|0|SCAN TABLE reachable > 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 > (a=?) > 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION) > 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?) > 0|1|1|SCAN SUBQUERY 1 > > It looks like the optimization can be generalized because you can pull > the selection into the initial select query of the CTE. Is there a > chance that SQLite could perform this optimization in one of the next > releases? > > I could tolerate with repeating the parameter but I want to create a > view out of it, so it's not really an option. Are there any alternatives > if the optimization will not be automated? > > Moreover, "COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)" looks > wrong, shouldn't it read "COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE > (UNION)"? > > - Matthias-Christian > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >