I have the following (simplified and generalized) schema and query:

CREATE TABLE nodes (
  id INTEGER NOT NULL PRIMARY KEY
);

CREATE TABLE edges (
  a INTEGER NOT NULL REFERENCES nodes (id),
  b INTEGER NOT NULL REFERENCES nodes (id),
  PRIMARY KEY (a, b)
);

WITH reachable (a, b) AS (
  SELECT a, b FROM edges
  UNION ALL
  SELECT reachable.a AS a, edges.b AS b
  FROM reachable
  INNER JOIN edges ON reachable.b = edges.a
)
SELECT nodes.id, reachable.b
FROM nodes
INNER JOIN reachable ON a = id
WHERE id = 1;

The query plan for the query looks like this:

2|0|0|SCAN TABLE edges
3|0|0|SCAN TABLE reachable
3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
0|1|1|SCAN SUBQUERY 1

The query first computes the reachable relation for all nodes and then
filters out the reachable nodes of node 1 which is clearly inefficient.

If I'm not mistaken the query could be optimized to:

WITH reachable (a, b) AS (
  SELECT a, b FROM edges WHERE a = 1
  UNION ALL
  SELECT reachable.a AS a, edges.b AS b
  FROM reachable
  INNER JOIN edges ON reachable.b = edges.a
)
SELECT nodes.id, reachable.b
FROM nodes
INNER JOIN reachable ON a = id
WHERE id = 1;

The query plan for the optimized query looks like this:

2|0|0|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
3|0|0|SCAN TABLE reachable
3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
0|1|1|SCAN SUBQUERY 1

It looks like the optimization can be generalized because you can pull
the selection into the initial select query of the CTE. Is there a
chance that SQLite could perform this optimization in one of the next
releases?

I could tolerate with repeating the parameter but I want to create a
view out of it, so it's not really an option. Are there any alternatives
if the optimization will not be automated?

Moreover, "COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)" looks
wrong, shouldn't it read "COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE
(UNION)"?

- Matthias-Christian

Reply via email to