On Wed, 21 Mar 2018 17:39:45 +0000 Jonathan Moules <jonathan-li...@lightpear.com> escribió:
> So, I'm back to being stuck on this. > I have inserted 500,000 random urls (no extra lookups - still just > 1000), and now the query (as per the below reply) is back to being > somewhat slow (I'm using sqlite 3.15) at about 0.6s per request. After the huge insert, did you run ANALYZE? > After a couple of hours of investigation, it's only slow when there is > either no ORDER BY, or if I use DESC (which is what I need). If I use > ORDER BY u.url ASC - it's near instantaneous. > > I've tried every possible combination of indexes I can think up, > including of course with url_id DESC. I've also removed the ORDER BY in > the view (and set it to DESC as well), but that made absolutely no > difference. Have your indexes in the last row the primary key or rowid? > I'm a little confused as to why I'm seeing this behaviour - my limited > understanding of the query-planner and reading > https://sqlite.org/queryplanner.html - suggests that at least when using > indexes, it'll simply scan an index backwards if that looks like it'll > help. I appreciate this is a result-set not an index, but in this case > could not the query planner realise that reading through the results > backwards would be faster than whatever it's doing? > > And for that matter, shouldn't no ORDER BY be at least the same speed as > ORDER BY u.url_id ASC? What does an integrity check returns? Try a Reindex and Analyze, all cases should be similar fast (or slow). > Thoughts welcome; Thanks! > Jonathan > > > On 2018-03-19 00:24, Jonathan Moules wrote: > > Thanks Simon and Quan. > > I'm not sure it's the view itself per-se - It takes 0.000s (time too > > small to measure) for just the full View to be run on this dataset. > > > > It turns out the problem is simpler than that and no data changes are > > needed. I did consider Quan Yong Zhai's option and gave it a try, but > > couldn't fathom out the necessary SQL to get what I wanted (it's > > getting late). > > > > Instead prompted by the replies here, I've changed the query very > > slightly to the below which solves the problem: > > > > SELECT > > u.url, l.error_code > > FROM > > urls u > > LEFT JOIN > > lookups l > > USING(url_id) > > LEFT JOIN > > (select * from v_most_recent_lookup_per_url where url_id in ( > > select url_id from urls where url = 'example.com' > > )) recent > > -- By definition url's can) recent > > -- This is the important bit > > -- Here we use the most recent lookup url_id to link to the > > source_seed_id, so we only find its children > > -- Or alternatively itself > > ON u.source_seed_id = recent.url_id > > OR u.url_id = recent.url_id > > WHERE > > -- For JSON-spider at least, Generic's are guaranteed to be > > generic pages. > > l.is_generic_flag = 1 > > AND > > -- Must be "or equal to" so we can get the lookup of the very > > base url. > > l.retrieval_datetime >= recent.retrieval_datetime > > AND > > DATETIME(recent.retrieval_datetime) > DATETIME('now', '-5 days') > > ORDER BY > > u.url_id DESC > > LIMIT 1; > > > > > > To save readers having to compare manually, the difference is this: I > > turned the "recent" alias item from the View into a subquery (still > > using the view), and then moved the "where url = example.com" part in > > to there. > > The query is now literally two orders of magnitude faster, from 0.2s > > to 0.004s. No new indexes or anything, just that change. > > > > Hopefully this will scale to full datasets; if it doesn't I may have > > to consider the other suggestions, but for now this is a > > minimum-effort solution. > > > > I'm not actually sure what SQLite was doing in the previous query to > > make it take so long. , so I imagine there was some hideous recursing > > going on or something. > > > > Scope for optimisation? > > > > Thanks again, > > Jonathan > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- Eduardo <eduardo.mor...@mobelservices.com> _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users