gerritbot added a comment.
Change 385400 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Lower hotTTR in matchesRegularExpression() to raise hit rate
https://gerrit.wikimedia.org/r/385400
TASK DETAIL: https://phabricator.wikimedia.org/T173696
gerritbot added a comment.
Change 385400 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/WikibaseQualityConstraints@master] Lower hotTTR in matchesRegularExpression() to raise hit rate
https://gerrit.wikimedia.org/r/385400
aaron added a comment.
Probably hotTTR is way too high. It's really "expected time till refresh given 1 hit/sec". With 50/min, you'd get maybe 2 updates (new values) per regex. I'll put up a patch for that.
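To make the arithmetic behind that remark concrete, here is a rough sketch in Python (not the extension's PHP). It assumes a simplified model in which each cache hit triggers a refresh with probability 1/hotTTR, so that at exactly 1 hit/sec the expected time till refresh equals hotTTR. That model and the name `expected_refresh_interval` are assumptions for illustration only, not WANObjectCache's actual algorithm.

```python
def expected_refresh_interval(hot_ttr_seconds: float, hits_per_second: float) -> float:
    """Expected seconds between refreshes under the simplified model.

    Assumption: each hit refreshes the value with probability
    1 / hot_ttr_seconds, so on average hot_ttr_seconds hits are needed
    per refresh (never fewer than one hit).
    """
    if hot_ttr_seconds <= 0 or hits_per_second <= 0:
        raise ValueError("both arguments must be positive")
    expected_hits_per_refresh = max(1.0, hot_ttr_seconds)
    return expected_hits_per_refresh / hits_per_second


# Illustrative values only: 60 s is not the hotTTR from the patch.
at_one_hit_per_second = expected_refresh_interval(60, 1.0)
at_fifty_per_minute = expected_refresh_interval(60, 50 / 60)
```

Under this model, a key hit 50 times per minute is refreshed about 1.2 times less often than hotTTR suggests, which is why a lower hotTTR would mean more refreshes at that request rate.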
Lucas_Werkmeister_WMDE added a comment.
The time window is the one in the Grafana permalink. The two spikes are when I just fetched that URL with curl in a for i in {1..100} shell loop (synchronously), so it looks like those spikes are 100 requests over 2-3 minutes.
aaron added a comment.
In T173696#3696294, @Lucas_Werkmeister_WMDE wrote:
I did a bunch of requests against https://www.wikidata.org/w/api.php?action="">, which checks a format constraint for “title”. It’s always the same regex and only a handful of different values (17). But while I could see a
Lucas_Werkmeister_WMDE added a comment.
I did a bunch of requests against https://www.wikidata.org/w/api.php?action="">, which checks a format constraint for “title”. It’s always the same regex and only a handful of different values (17). But while I could see a sharp rise in requests in Grafana
aaron added a comment.
In T173696#3690945, @Lucas_Werkmeister_WMDE wrote:
Reopening. This task is supposed to be for caching results in general, which isn’t done yet at all, though we had a lot of discussion on caching regex checks specifically here, which in hindsight should’ve been in a
gerritbot added a comment.
Change 379222 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Use per-regex cache map to cache regex check results
https://gerrit.wikimedia.org/r/379222
gerritbot added a comment.
Change 373918 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Cache regex check results
https://gerrit.wikimedia.org/r/373918
gerritbot added a comment.
Change 379222 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Aaron Schulz):
[mediawiki/extensions/WikibaseQualityConstraints@master] Use per-regex cache map to cache regex check results
https://gerrit.wikimedia.org/r/379222
daniel added a comment.
@Krinkle wrote a while back:
On the other hand, considering it's for internal execution of a single regex, ~300ms is a lot. I wonder if something like a "simple" PHP or Python subprocess would work (something that runs preg_match or re.match, using tight firejail with
Lucas_Werkmeister_WMDE added a comment.
One thing that could also be tweaked in that code is to track the last timestamp an entry was touched and prune ones older than a certain number of days. For busy regexes, that stops long-tail cruft from accumulating and reduces memcached I/O in bytes.
If I
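The pruning idea quoted above can be sketched roughly as follows, in Python rather than the extension's PHP. The entry layout (`value` plus a `touched` timestamp) and the helper names `touch` and `prune_stale` are hypothetical, chosen only to illustrate the technique.

```python
import time


def touch(cache_map, key, value, now=None):
    """Store a value and record when the entry was last touched."""
    now = time.time() if now is None else now
    cache_map[key] = {"value": value, "touched": now}


def prune_stale(cache_map, max_age_seconds, now=None):
    """Return a copy of cache_map without entries untouched for too long."""
    now = time.time() if now is None else now
    return {
        key: entry
        for key, entry in cache_map.items()
        if now - entry["touched"] <= max_age_seconds
    }


# Example: keep only entries touched within the last 7 days.
seven_days = 7 * 24 * 60 * 60
results = {}
touch(results, "some-text-hash", True)
results = prune_stale(results, seven_days)
```

Pruning on write keeps the stored map small, at the cost of one pass over the map each time it is saved.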
aaron added a comment.
In T173696#3620700, @Lucas_Werkmeister_WMDE wrote:
Interesting idea! It feels a bit weird to implement logic like this on top of the cache (I thought that’s the cache’s job?), but you’re the expert :) it sounds like it makes a lot of sense, at least, since the set of
Lucas_Werkmeister_WMDE added a comment.
Interesting idea! It feels a bit weird to implement logic like this on top of the cache (I thought that’s the cache’s job?), but you’re the expert :) it sounds like it makes a lot of sense, at least, since the set of regexes is mostly static and the set of
aaron added a comment.
If you want to avoid flooding the cache with rarely used long-tail combinations, maybe something like this could be done:
$textHash = hash( 'sha256', $text );
$cacheMap = $this->cache->getWithSetCallback(
$this->cache->makeKey(
'WikibaseQualityConstraints', //
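The snippet above is cut off in the archive, so the following is only a guess at the general shape of a per-regex cache map, written as a standalone Python sketch rather than the extension's PHP. The class `RegexResultCache`, the size cap, and the use of `re.fullmatch` as a stand-in for the real regex check are all assumptions for illustration.

```python
import hashlib
import re
from collections import OrderedDict


class RegexResultCache:
    """One bounded map per regex, keyed by the SHA-256 hash of the text."""

    def __init__(self, max_entries_per_regex=100):
        self.max_entries = max_entries_per_regex
        self.maps = {}        # regex -> OrderedDict of text hash -> bool
        self.evaluations = 0  # how often the regex actually had to run

    def matches(self, text, regex):
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cache_map = self.maps.setdefault(regex, OrderedDict())
        if text_hash in cache_map:
            # Cache hit: mark the entry as recently used.
            cache_map.move_to_end(text_hash)
            return cache_map[text_hash]
        # Cache miss: evaluate the regex (stand-in for the real check).
        self.evaluations += 1
        result = re.fullmatch(regex, text) is not None
        cache_map[text_hash] = result
        # Drop least recently used entries beyond the cap.
        while len(cache_map) > self.max_entries:
            cache_map.popitem(last=False)
        return result
```

Grouping results under one key per regex means a rarely used text only costs one small entry inside an existing map instead of a cache key of its own, and the cap keeps each map from growing without bound.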
Jonas added a comment.
@Krinkle thanks for your input!
Maybe we should reopen T102752: [RFC] Workaround for checking the format constraint
Krinkle added a comment.
In T173696#3569491, @Ladsgroup wrote:
Pinging @aaron and @Krinkle as they are the experts on this.
I left a few minor points at https://gerrit.wikimedia.org/r/373918.
I don't know much about the use of regular expressions in the particular context of SPARQL and
Lucas_Werkmeister_WMDE added a comment.
I’ve started with the simplest thing we can cache: regex check results, which are valid forever.
gerritbot added a comment.
Change 373918 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Cache regex check results
https://gerrit.wikimedia.org/r/373918
daniel added a comment.
@Lucas_Werkmeister_WMDE good point!
Lucas_Werkmeister_WMDE added a comment.
Perhaps we want to prioritize caching compliant constraint check results over violations? I would think those are more likely to stay in place.
Jonas added a comment.
Do you have some links to the docs perhaps?
daniel added a comment.
Some thoughts/options for caching:
web caches are easy to set up, but hard or impossible to purge
the result of a SPARQL based regexp evaluation could be cached forever in the WDQS web cache
the result of a SPARQL based instanceof check can be cached "for a while" in