Hi!
Sometimes it is desirable to limit the size of the queue¹ in a recursive
CTE when the
domain allows for it. That puts an upper bound on processing time, even
when the given table is huge.
This can be achieved by adding a LIMIT clause at every step (setup and
recursive).
But there is my problem: The setup step allows subqueries, but the
recursive step doesn't.
Is there a general solution available?
As a concrete example, consider the following table for a commenting system:
CREATE TABLE comment (
comment_id INTEGER PRIMARY KEY,
parent_comment_id INTEGER REFERENCES comment (comment_id),
created_at INTEGER NOT NULL -- timestamp, bigger means newer
);
CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_at
DESC);
The comments should be arranged by hierarchy and creation date, newest
first.
Only up to 100 comments should be shown. Example:
Comment 2 (2019-05-03)
- Comment 3 (2019-05-04)
- Comment 4 (2019-05-05)
-- Comment 5 (2019-05-05)
Comment 1 (2019-05-02)
The following query should accomplish this, but the optimization isn't
allowed (see comments inline):
WITH RECURSIVE
sorted_comment(comment_id, parent_comment_id, created_at, depth) AS (
-- Start with all comments at the root level.
-- Optimization: Only consider the latest 100 comments, since
that's the maximum amount we could pick from this level.
-- This guarantees good performance even when there are
millions of comments at the root level.
SELECT *
FROM (
SELECT comment_id, parent_comment_id, created_at, 0 AS depth
FROM comment
WHERE parent_comment_id IS NULL
ORDER BY created_at DESC
LIMIT 100
)
UNION ALL
-- Find all direct descendants of the current comment from the
queue.
-- Same optimization: There might be many comments at this
level, but we could oly ever consider up to the latest 100.
-- (Actually, we only need to consider (100 -
COUNT(sorted_comment)), but let's ignore that for now.)
-- NOTE: This doesn't actually work, Error: recursive reference
in a subquery: sorted_comment
SELECT *
FROM (
SELECT c.comment_id, c.parent_comment_id, c.created_at,
sc.depth + 1 AS depth
FROM comment AS c
JOIN sorted_comment AS sc
ON c.parent_comment_id = sc.comment_id
ORDER BY created_at DESC
LIMIT 100
)
-- Do a depth-first search, picking the latest comment from the
queue.
ORDER BY depth DESC, created_at DESC
LIMIT 100
)
SELECT comment_id, parent_comment_id, created_at, depth
FROM sorted_comment;
I would be very interested in a general solution that still allows for
the adjacency list design,
but I'm open to denormalization. :)
Kind regards,
Thomas
¹ As defined in the documentation here: https://sqlite.org/lang_with.html
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users