Addshore added a comment.

  >>   In the case that lead to this ticket, it was a remote client at Orange 
issuing a very high rate of these uncacheable queries
  
  It's not just Orange, it would seem.
  I took a quick look at the webrequest data for the WDQS updater UA, and there 
are other locations also running the updater, probably to keep their own copy 
of the query service up to date.
  Looking at March 1st 2019, 58% of the uncacheable requests to 
Special:EntityData for Wikidata came from internal WDQS systems; the other 42% 
came from what seem to be another 9 or so external copies of the WDQS that are 
kept up to date with the live data using the updater.
  
  On 1st March there were 21,461,124 cache misses for the WDQS updater UA.
  To put that in perspective, the total number of cache misses for Wikidata on 
that day was 23,857,783 (passes were 54 million).
  
  Comparing this with the total number of edits on Wikidata that day, ~1.1 
million, I see quite a lot of uncached requests that could be cached.
  I understand, however, that there are some issues with the WDQS updating for 
every single revision, due to the performance of writing to SPARQL.
  Taking a look at our internal hosts on 1st March, they each seem to have made 
between 826,878 and 926,848 requests to Special:EntityData, so compared with 
the ~1.1 million edits they each avoided 200k - 300k requests to get data, and 
thus probably also SPARQL writes.
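  
  As a quick sanity check on the numbers above, here is the arithmetic as a small Python sketch. It uses only the figures quoted in this comment; the edit count is the rounded "~1.1 million", so anything derived from it is approximate. The updater comes out at roughly 90% of all Wikidata cache misses, and somewhere between 19 and 20 misses per edit.

```python
# Figures quoted in this comment for 1 March 2019.
wdqs_updater_misses = 21_461_124
total_wikidata_misses = 23_857_783
edits = 1_100_000  # "~1.1 million", so derived values are approximate
internal_host_requests = (826_878, 926_848)  # lowest and highest internal host

# Share of all Wikidata cache misses caused by the WDQS updater UA.
share = wdqs_updater_misses / total_wikidata_misses

# How many uncached requests the updater UA makes per edit.
misses_per_edit = wdqs_updater_misses / edits

# Requests each internal host skipped, relative to one request per edit.
avoided = [edits - n for n in internal_host_requests]

print(f"updater share of misses: {share:.1%}")
print(f"updater misses per edit: {misses_per_edit:.1f}")
print(f"avoided per internal host: {min(avoided):,} to {max(avoided):,}")
```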
  
  Thinking purely from a Varnish hit rate perspective, it would make sense to 
remove the "random cache busting" (I guess not actually random, as it is 
timestamp based: nocache=<timestamp>) from the request and switch to asking for 
the specific revision ID that is required.
  This would likely take us from 21 million misses per day to just 1 million? 
(The initial 1 million requests to populate the cache for each revision being 
requested?)
  After talking with Stas, this apparently makes updating within the updater 
harder, as it might result in more writes to SPARQL? (I'll let Stas talk more 
on that topic.)
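  
  To illustrate why the cache key matters, here is a toy simulation (a sketch only, not how Varnish or the updater actually work). It assumes an unbounded cache keyed on the full URL, a made-up workload of 10 updaters each fetching the same 1000 revisions at slightly different seconds, and a URL layout that is my guess at what the requests look like.

```python
BASE = "https://www.wikidata.org/wiki/Special:EntityData/"  # assumed URL layout


def timestamp_url(entity, ts):
    # Current approach as described above: timestamp based cache buster.
    return f"{BASE}{entity}.ttl?flavor=dump&nocache={ts}"


def revision_url(entity, revid):
    # Proposed approach: ask for the specific revision instead.
    return f"{BASE}{entity}.ttl?flavor=dump&revision={revid}"


def count_misses(urls):
    """Count misses for a toy cache that never evicts and keys on the URL."""
    seen = set()
    misses = 0
    for url in urls:
        if url not in seen:
            seen.add(url)
            misses += 1
    return misses


UPDATERS = 10      # hypothetical number of updater instances
REVISIONS = 1000   # hypothetical number of revisions in the period

by_timestamp = []
by_revision = []
for revid in range(REVISIONS):
    entity = f"Q{revid % 100 + 1}"
    for updater in range(UPDATERS):
        # Each updater fetches the same revision at a different second.
        ts = 1551398400 + revid * UPDATERS + updater
        by_timestamp.append(timestamp_url(entity, ts))
        by_revision.append(revision_url(entity, revid))

print("misses with nocache=<timestamp>:", count_misses(by_timestamp))
print("misses with revision=<id>:", count_misses(by_revision))
```

  In this toy setup every timestamped request is a miss, while the revision based URLs miss only once per revision, which is the same shape as the 21 million versus 1 million guess above.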
  
  Adding nocache=X doesn't actually mean the request will not be cached; it is 
still cached, just unlikely to be requested by anyone else (probably wasting 
quite some Varnish space?).
  It looks like we probably get ~10k cache hits from the WDQS UA even with the 
cache buster, maybe when servers happen to request the same entities during 
the same second.
  If we don't want to explicitly ask for a revision of the page, can we not 
use the latest revision ID that we know exists for the entity, or some hash of 
it, to make a nicer cache buster that could actually be shared among updaters, 
both internal and external? The updating pattern within the WDQS itself could 
then stay the same? I guess this depends on whether the WDQS updater knows what 
the latest revid is for the entity it is getting updates for?
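  
  A minimal sketch of what such a shared cache buster could look like, assuming the updater does know the latest revid. The function names, the URL layout and the choice of a truncated SHA-1 are all made up for illustration; nothing here is taken from the updater code.

```python
import hashlib
from urllib.parse import urlencode


def shared_cache_buster(entity_id, latest_revid):
    """Derive a cache buster from the entity and its latest known revid.

    Any updater that knows the same revid produces the same value, so the
    cached response can be shared between them.
    """
    digest = hashlib.sha1(f"{entity_id}:{latest_revid}".encode()).hexdigest()
    return digest[:12]


def entity_data_url(entity_id, latest_revid):
    # Assumed URL layout; still a request for "the latest" data, only the
    # nocache value changes from a timestamp to something revid based.
    query = urlencode({
        "flavor": "dump",
        "nocache": shared_cache_buster(entity_id, latest_revid),
    })
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.ttl?{query}"


# Two updaters that know the same revid build the same URL ...
print(entity_data_url("Q42", 871234567) == entity_data_url("Q42", 871234567))
# ... and a new revision yields a new URL, so the old copy is not reused.
print(entity_data_url("Q42", 871234567) == entity_data_url("Q42", 871234568))
```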
  
  Another thing to consider: in theory, even when using the cache buster 
method, the data the WDQS updater currently gets when passing nocache=ts may 
not be up to date due to maxlag. Not sure if that has been considered in the 
updater process at all?
  The maxlag is not often high, but in the last months it has occasionally 
gone up to 5s or 20s. I'm not sure if the WDQS updater normally requests data 
for an entity that quickly after an edit has been made, but if it does, it 
could be getting out of date data even with the cache busting. But perhaps the 
Last-Modified header is checked in the WDQS updater? If not, maybe it should 
be? (Grepping through the code, I couldn't find it.)
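  
  For what such a check could look like, here is a rough sketch. It assumes that the Last-Modified header on the response reflects the timestamp of the revision that was served, and that the updater knows the timestamp of the edit it is trying to fetch; both are assumptions on my part, and the function name is made up.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_is_stale(last_modified_header, edit_timestamp):
    """Return True if the response is older than the edit we expect to see.

    last_modified_header: value of the HTTP Last-Modified header.
    edit_timestamp: timezone-aware datetime of the edit being processed.
    """
    last_modified = parsedate_to_datetime(last_modified_header)
    return last_modified < edit_timestamp


edit_time = datetime(2019, 3, 1, 12, 0, 5, tzinfo=timezone.utc)

# Served data is 5 seconds older than the edit: lagged, should be retried.
print(response_is_stale("Fri, 01 Mar 2019 12:00:00 GMT", edit_time))
# Served data is at least as new as the edit: fine to use.
print(response_is_stale("Fri, 01 Mar 2019 12:00:05 GMT", edit_time))
```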
  
  On the Wikibase side of things, this is a relatively cheap request to make, 
as the revision lookup is done from the big shared cache of Wikidata entity 
revisions and flavour=dump is used, so Wikibase itself will in most cases not 
make any expensive SQL queries etc. (but anything MediaWiki does on start up 
will still happen).

TASK DETAIL
  https://phabricator.wikimedia.org/T217897

To: Addshore
Cc: Addshore, Smalyshev, BBlack, Aklapper, Gehel, alaa_wmde, Legado_Shulgin, 
Nandana, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, 
merbst, LawExplorer, Zppix, _jensen, rosalieper, Jonas, Xmlizer, Wong128hk, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, 
Mbch331, Jay8g, fgiunchedi