Re: [sqlite] How to optimise a somewhat-recursive query?

Jonathan Moules Sun, 18 Mar 2018 17:24:33 -0700

Thanks Simon and Quan.

I'm not sure it's the view itself per-se - It takes 0.000s (time toosmall to measure) for just the full View to be run on this dataset.

It turns out the problem is simpler than that and no data changes areneeded. I did consider Quan Yong Zhai's option and gave it a try, butcouldn't fathom out the necessary SQL to get what I wanted (it's gettinglate).

Instead prompted by the replies here, I've changed the query veryslightly to the below which solves the problem:


SELECT
        u.url, l.error_code
    FROM
        urls u
    LEFT JOIN
        lookups l
        USING(url_id)
    LEFT JOIN
        (select * from v_most_recent_lookup_per_url where url_id in (
            select url_id from urls where url = 'example.com'
            )) recent
    -- By definition url's can) recent
        -- This is the important bit

-- Here we use the most recent lookup url_id to link to thesource_seed_id, so we only find its children

        -- Or alternatively itself
        ON u.source_seed_id = recent.url_id
            OR u.url_id = recent.url_id
    WHERE

-- For JSON-spider at least, Generic's are guaranteed to begeneric pages.

        l.is_generic_flag = 1
        AND

-- Must be "or equal to" so we can get the lookup of the verybase url.

        l.retrieval_datetime >= recent.retrieval_datetime
        AND
        DATETIME(recent.retrieval_datetime) > DATETIME('now', '-5 days')
    ORDER BY
        u.url_id DESC
    LIMIT 1;

To save readers having to compare manually, the difference is this: Iturned the "recent" alias item from the View into a subquery (stillusing the view), and then moved the "where url = example.com" part in tothere.The query is now literally two orders of magnitude faster, from 0.2s to0.004s. No new indexes or anything, just that change.

Hopefully this will scale to full datasets; if it doesn't I may have toconsider the other suggestions, but for now this is a minimum-effortsolution.

I'm not actually sure what SQLite was doing in the previous query tomake it take so long. , so I imagine there was some hideous recursinggoing on or something.


Scope for optimisation?

Thanks again,
Jonathan

On 2018-03-18 23:37, Simon Slavin wrote:

On 18 Mar 2018, at 11:13pm, Jonathan Moules <[email protected]> 
wrote:

Given there's such a small amount of data at this point, I suspect the issue is 
more related to the recursion. I've tried creating these two indexes to 
facilicate that

Nice idea but I can see why it's not working.

You have an underlying problem: the format you're using to store your data makes it 
extremely difficult to extract the figures you want.  Quan Yong Zhai has the best idea 
I've seen: get rid of almost ¾ of the work you're doing by storing the last retrieval 
date in your "urls" table.

As an alternative, store the start your retrieval process by JOINing the two 
tables together.  Consider the result of this query

SELECT url_id, lookups.error_code
     FROM urls
     JOIN lookups ON lookups.url_id = urls.url_id AND lookup.retrieval_datetime 
= urls.retrieval_datetime
     WHERE DATETIME(urls.retrieval_datetime) > DATETIME('now', '-5 days')

and figure out what you would add to that to get your desired result.

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] How to optimise a somewhat-recursive query?

Reply via email to