Addshore added a comment.
>> In the case that lead to this ticket, it was a remote client at Orange issuing a very high rate of these uncacheable queries

It's not just Orange, it would seem. I took a quick look at the webrequest data for the WDQS updater UA, and there are other locations also running the updater, probably to keep their own copy of the query service up to date.

Looking at March 1st 2019, 58% of the uncacheable requests to Special:EntityData for wikidata came from internal wdqs systems; the other 42% came from what seems to be another 9 or so external copies of the wdqs that are kept up to date with the live data using the updater.

On 1st March there were 21,461,124 cache misses for the WDQS updater UA. To put that in perspective, the total number of cache misses for wikidata on that day was 23,857,783 (passes were 54 million). Comparing this with the total number of edits on wikidata that day (~1.1 million), I see quite a lot of uncached requests that could be cached. I understand, however, that there are some issues with the wdqs updating for every single revision, due to the performance of writing to SPARQL.

Taking a look at our internal hosts on the 1st March, they each seemed to make between 826,878 and 926,848 requests to Special:EntityData, so each of them avoided 200k - 300k requests to get data, and thus probably also that many SPARQL writes.

Thinking purely from a varnish hit rate perspective, it would make sense to remove the "random cache busting" (I guess not actually random, as it is ts based: nocache=<timestamp>) from the request and switch to asking for the specific revision id that is required. This would likely go from 21 million misses per day to just 1 million? (The initial 1 million requests to populate the cache for each revision being requested?) After talking with Stas, this apparently makes updating within the updater harder, as it might result in more writes to SPARQL? (I'd let Stas talk more on that topic.)
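For anyone wanting to sanity check the proportions, here is a rough back-of-envelope sketch in Python. It only uses the figures quoted above for 1st March 2019; the variable names are mine, and the ~1.1 million edit count is the approximate value from this comment, not an exact one.

```python
# Figures quoted in this comment for 1st March 2019.
updater_misses = 21_461_124      # cache misses for the WDQS updater UA
total_misses = 23_857_783        # all cache misses for wikidata that day
edits = 1_100_000                # approximate number of edits (~1.1 million)

# Share of all wikidata cache misses caused by the updater UA.
updater_share = updater_misses / total_misses

# How many uncached updater requests were made per edit, on average.
misses_per_edit = updater_misses / edits

# If every revision only had to be fetched from the backend once,
# the floor on misses would be roughly one per edit.
avoidable_misses = updater_misses - edits

print(f"updater share of misses: {updater_share:.1%}")
print(f"updater misses per edit: {misses_per_edit:.1f}")
print(f"potentially avoidable misses: {avoidable_misses:,}")
```

So, under these numbers, the updater UA accounts for roughly nine in ten cache misses, at close to twenty uncached requests per edit.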
Adding nocache=X doesn't actually mean the request will not be cached. It is still cached, just unlikely to be requested again by other users (probably wasting quite some varnish space?). It looks like we probably get ~10k cache hits from the WDQS UA even with the cache buster, maybe when the servers happen to be requesting the same entities during the same second.

If we don't want to explicitly ask for a revision of the page, can we not use the latest revision id that we know exists for the entity, or some hash of it, to make for a nicer cache buster that could actually be shared among updaters, both internal and external? The updating pattern within the wdqs itself could then stay the same? I guess this depends on whether the wdqs updater knows what the latest revid is for the entity it is getting updates for.

Another thing to consider here is that, in theory, even when using the cache buster method, the data the wdqs updater currently gets when passing nocache=ts may not be up to date due to maxlag. Not sure if that has been considered in the updater process at all? It's not often that the maxlag is high, but in the last months it has occasionally gone up to 5s or 20s. (Not sure if the wdqs updater normally requests data for an entity that quickly after an edit has been made? But if it does, it could be getting out-of-date data even with the cache busting.) But perhaps the Last-Modified header is checked in the wdqs updater? If not, maybe it should be? (Grepping through the code I couldn't find it.)

On the Wikibase side of things, this is a relatively cheap request to make, as the revision lookup is done from the big shared cache of wikidata entity revisions and the request uses flavour=dump, so wikibase itself in most cases will not make any expensive sql queries etc. (but anything mediawiki does on start up will still happen).
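To illustrate the two ideas above, here is a minimal Python sketch (not the updater's actual code, which I have not reproduced here). It assumes the request URL looks like Special:EntityData/<id>.ttl with `flavor=dump`, and that a `revision` query parameter can stand in for the timestamp cache buster; the function names, the `.ttl` suffix and the exact parameter spellings are my assumptions for illustration. The staleness check likewise assumes the Last-Modified header reflects the timestamp of the revision that was served.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urlencode

# Assumed base URL for illustration.
BASE = "https://www.wikidata.org/wiki/Special:EntityData"


def timestamp_busted_url(entity_id, now_seconds):
    """Current style: a per-second cache buster, nocache=<timestamp>."""
    query = urlencode({"flavor": "dump", "nocache": now_seconds})
    return f"{BASE}/{entity_id}.ttl?{query}"


def revision_keyed_url(entity_id, revision_id):
    """Proposed style: key the URL on the revision id the updater wants."""
    query = urlencode({"flavor": "dump", "revision": revision_id})
    return f"{BASE}/{entity_id}.ttl?{query}"


def is_stale(last_modified_header, expected_edit_time):
    """True if the response is older than the edit we expected to see."""
    served = parsedate_to_datetime(last_modified_header)
    return served < expected_edit_time


# Two updaters asking one second apart get different URLs (two misses) ...
a = timestamp_busted_url("Q42", 1551441600)
b = timestamp_busted_url("Q42", 1551441601)
print(a != b)

# ... but the same URL (one miss, then hits) when keyed on the revision.
c = revision_keyed_url("Q42", 123456)
d = revision_keyed_url("Q42", 123456)
print(c == d)

# A response last modified before the edit we are processing is stale.
edit_time = datetime(2019, 3, 1, 12, 0, 5, tzinfo=timezone.utc)
print(is_stale("Fri, 01 Mar 2019 12:00:00 GMT", edit_time))
```

The point of keying on the revision is that every updater, internal or external, asking for the same revision of the same entity produces an identical URL, so only the first request has to reach the backend.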
TASK DETAIL https://phabricator.wikimedia.org/T217897