[sqlite] Limiting the result set size at each recursive step in a CTE

Thomas Zimmermann Tue, 07 May 2019 22:24:48 -0700

Hi!

Sometimes it is desirable to limit the size of the queue¹ in a recursiveCTE when thedomain allows for it. That puts an upper bound on processing time, evenwhen the given table is huge.This can be achieved by adding a LIMIT clause at every step (setup andrecursive).But there is my problem: The setup step allows subqueries, but therecursive step doesn't.

Is there a general solution available?


As a concrete example, consider the following table for a commenting system:

CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    parent_comment_id INTEGER REFERENCES comment (comment_id),
    created_at INTEGER NOT NULL -- timestamp, bigger means newer
);

CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_atDESC);

The comments should be arranged by hierarchy and creation date, newestfirst.

Only up to 100 comments should be shown. Example:

Comment 2 (2019-05-03)
- Comment 3 (2019-05-04)
- Comment 4 (2019-05-05)
-- Comment 5 (2019-05-05)
Comment 1 (2019-05-02)

The following query should accomplish this, but the optimization isn'tallowed (see comments inline):


WITH RECURSIVE
    sorted_comment(comment_id, parent_comment_id, created_at, depth) AS (
        -- Start with all comments at the root level.

-- Optimization: Only consider the latest 100 comments, sincethat's the maximum amount we could pick from this level. -- This guarantees good performance even when there aremillions of comments at the root level.

        SELECT *
        FROM (
            SELECT comment_id, parent_comment_id, created_at, 0 AS depth
            FROM comment
            WHERE parent_comment_id IS NULL
            ORDER BY created_at DESC
            LIMIT 100
        )

        UNION ALL

-- Find all direct descendants of the current comment from thequeue. -- Same optimization: There might be many comments at thislevel, but we could oly ever consider up to the latest 100. -- (Actually, we only need to consider (100 -COUNT(sorted_comment)), but let's ignore that for now.) -- NOTE: This doesn't actually work, Error: recursive referencein a subquery: sorted_comment

        SELECT *
        FROM (

SELECT c.comment_id, c.parent_comment_id, c.created_at,sc.depth + 1 AS depth

            FROM comment AS c
            JOIN sorted_comment AS sc
            ON c.parent_comment_id = sc.comment_id
            ORDER BY created_at DESC
            LIMIT 100
        )

-- Do a depth-first search, picking the latest comment from thequeue.

        ORDER BY depth DESC, created_at DESC
        LIMIT 100
    )
SELECT comment_id, parent_comment_id, created_at, depth
FROM sorted_comment;

I would be very interested in a general solution that still allows forthe adjacency list design,

but I'm open to denormalization. :)

Kind regards,
Thomas

¹ As defined in the documentation here: https://sqlite.org/lang_with.html
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Limiting the result set size at each recursive step in a CTE

Reply via email to