On Wed, 21 Mar 2018 17:39:45 +0000
Jonathan Moules <jonathan-li...@lightpear.com> escribió:

> So, I'm back to being stuck on this.
> I have inserted 500,000 random urls (no extra lookups - still just 
> 1000), and now the query (as per the below reply) is back to being 
> somewhat slow (I'm using sqlite 3.15) at about 0.6s per request.

After the huge insert, did you run ANALYZE?
 
> After a couple of hours of investigation, it's only slow when there is 
> either no ORDER BY, or if I use DESC (which is what I need). If I use 
> ORDER BY u.url ASC - it's near instantaneous.
> 
> I've tried every possible combination of indexes I can think up, 
> including of course with url_id DESC. I've also removed the ORDER BY in 
> the view (and set it to DESC as well), but that made absolutely no 
> difference.

Have your indexes in the last row the primary key or rowid?
 
> I'm a little confused as to why I'm seeing this behaviour - my limited 
> understanding of the query-planner and reading 
> https://sqlite.org/queryplanner.html - suggests that at least when using 
> indexes, it'll simply scan an index backwards if that looks like it'll 
> help. I appreciate this is a result-set not an index, but in this case 
> could not the query planner realise that reading through the results 
> backwards would be faster than whatever it's doing?
> 
> And for that matter, shouldn't no ORDER BY be at least the same speed as 
> ORDER BY u.url_id ASC?

What does an integrity check returns?

Try a Reindex and Analyze, all cases should be similar fast (or slow).

> Thoughts welcome; Thanks!
> Jonathan
> 
> 
> On 2018-03-19 00:24, Jonathan Moules wrote:
> > Thanks Simon and Quan.
> > I'm not sure it's the view itself per-se - It takes 0.000s (time too 
> > small to measure) for just the full View to be run on this dataset.
> >
> > It turns out the problem is simpler than that and no data changes are 
> > needed. I did consider Quan Yong Zhai's option and gave it a try, but 
> > couldn't fathom out the necessary SQL to get what I wanted (it's 
> > getting late).
> >
> > Instead prompted by the replies here, I've changed the query very 
> > slightly to the below which solves the problem:
> >
> > SELECT
> >         u.url, l.error_code
> >     FROM
> >         urls u
> >     LEFT JOIN
> >         lookups l
> >         USING(url_id)
> >     LEFT JOIN
> >         (select * from v_most_recent_lookup_per_url where url_id in (
> >             select url_id from urls where url = 'example.com'
> >             )) recent
> >     -- By definition url's can) recent
> >         -- This is the important bit
> >         -- Here we use the most recent lookup url_id to link to the 
> > source_seed_id, so we only find its children
> >         -- Or alternatively itself
> >         ON u.source_seed_id = recent.url_id
> >             OR u.url_id = recent.url_id
> >     WHERE
> >         -- For JSON-spider at least, Generic's are guaranteed to be 
> > generic pages.
> >         l.is_generic_flag = 1
> >         AND
> >         -- Must be "or equal to" so we can get the lookup of the very 
> > base url.
> >         l.retrieval_datetime >= recent.retrieval_datetime
> >         AND
> >         DATETIME(recent.retrieval_datetime) > DATETIME('now', '-5 days')
> >     ORDER BY
> >         u.url_id DESC
> >     LIMIT 1;
> >
> >
> > To save readers having to compare manually, the difference is this: I 
> > turned the "recent" alias item from the View into a subquery (still 
> > using the view), and then moved the "where url = example.com" part in 
> > to there.
> > The query is now literally two orders of magnitude faster, from 0.2s 
> > to 0.004s. No new indexes or anything, just that change.
> >
> > Hopefully this will scale to full datasets; if it doesn't I may have 
> > to consider the other suggestions, but for now this is a 
> > minimum-effort solution.
> >
> > I'm not actually sure what SQLite was doing in the previous query to 
> > make it take so long. , so I imagine there was some hideous recursing 
> > going on or something.
> >
> > Scope for optimisation?
> >
> > Thanks again,
> > Jonathan
> 
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


-- 
Eduardo <eduardo.mor...@mobelservices.com>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to