Re: [sqlite] How to optimise a somewhat-recursive query?

Jonathan Moules Wed, 21 Mar 2018 10:40:20 -0700

So, I'm back to being stuck on this.

I have inserted 500,000 random urls (no extra lookups - still just1000), and now the query (as per the below reply) is back to beingsomewhat slow (I'm using sqlite 3.15) at about 0.6s per request.

After a couple of hours of investigation, it's only slow when there iseither no ORDER BY, or if I use DESC (which is what I need). If I useORDER BY u.url ASC - it's near instantaneous.

I've tried every possible combination of indexes I can think up,including of course with url_id DESC. I've also removed the ORDER BY inthe view (and set it to DESC as well), but that made absolutely nodifference.

I'm a little confused as to why I'm seeing this behaviour - my limitedunderstanding of the query-planner and readinghttps://sqlite.org/queryplanner.html - suggests that at least when usingindexes, it'll simply scan an index backwards if that looks like it'llhelp. I appreciate this is a result-set not an index, but in this casecould not the query planner realise that reading through the resultsbackwards would be faster than whatever it's doing?

And for that matter, shouldn't no ORDER BY be at least the same speed asORDER BY u.url_id ASC?


Thoughts welcome; Thanks!
Jonathan


On 2018-03-19 00:24, Jonathan Moules wrote:

Thanks Simon and Quan.
I'm not sure it's the view itself per-se - It takes 0.000s (time toosmall to measure) for just the full View to be run on this dataset.
It turns out the problem is simpler than that and no data changes areneeded. I did consider Quan Yong Zhai's option and gave it a try, butcouldn't fathom out the necessary SQL to get what I wanted (it'sgetting late).
Instead prompted by the replies here, I've changed the query veryslightly to the below which solves the problem:
SELECT
        u.url, l.error_code
    FROM
        urls u
    LEFT JOIN
        lookups l
        USING(url_id)
    LEFT JOIN
        (select * from v_most_recent_lookup_per_url where url_id in (
            select url_id from urls where url = 'example.com'
            )) recent
    -- By definition url's can) recent
        -- This is the important bit
-- Here we use the most recent lookup url_id to link to thesource_seed_id, so we only find its children
        -- Or alternatively itself
        ON u.source_seed_id = recent.url_id
            OR u.url_id = recent.url_id
    WHERE
-- For JSON-spider at least, Generic's are guaranteed to begeneric pages.
        l.is_generic_flag = 1
        AND
-- Must be "or equal to" so we can get the lookup of the verybase url.
        l.retrieval_datetime >= recent.retrieval_datetime
        AND
        DATETIME(recent.retrieval_datetime) > DATETIME('now', '-5 days')
    ORDER BY
        u.url_id DESC
    LIMIT 1;
To save readers having to compare manually, the difference is this: Iturned the "recent" alias item from the View into a subquery (stillusing the view), and then moved the "where url = example.com" part into there.The query is now literally two orders of magnitude faster, from 0.2sto 0.004s. No new indexes or anything, just that change.
Hopefully this will scale to full datasets; if it doesn't I may haveto consider the other suggestions, but for now this is aminimum-effort solution.
I'm not actually sure what SQLite was doing in the previous query tomake it take so long. , so I imagine there was some hideous recursinggoing on or something.
Scope for optimisation?

Thanks again,
Jonathan



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] How to optimise a somewhat-recursive query?

Reply via email to