Besides waiting for the new updater, it may be useful to tell us what we as users can do too. It is unclear to me what the problem actually is. For instance, at one point I was worried that the many parallel requests that Scholia makes to the SPARQL endpoint were a problem. As far as I understand, they are not a problem at all. Another issue could be the way we use Magnus Manske's QuickStatements and approve bots for high-frequency editing. Perhaps a better overview of, and constraints on, large-scale editing could be discussed?
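For what it's worth, here is a minimal sketch (in Python, with a made-up query and worker count; this is not Scholia's actual code) of what a client that bounds its own concurrency against the endpoint could look like:

    # Minimal sketch of a client that bounds its concurrency against WDQS.
    # NOT Scholia's actual code; the query and limits are illustrative.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERIES = [
        "SELECT (COUNT(*) AS ?c) WHERE { ?work wdt:P50 wd:Q80 }",  # example
        # ... more queries ...
    ]

    def run_query(query):
        # SPARQL over HTTP GET; ask for JSON results.
        resp = requests.get(
            ENDPOINT,
            params={"query": query, "format": "json"},
            headers={"User-Agent": "lag-discussion-example/0.1"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()

    # At most 2 queries in flight at a time, instead of firing all at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(run_query, QUERIES))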

Yet another thought is the large discrepancy between the Virginia and Texas data centers that I could see on Grafana [1]. As far as I understand, the hardware (and software) are the same, so why is there this large difference? Rather than editing or Blazegraph, could the problem be some form of network issue?
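One way to probe this from the client side is to repeatedly ask the endpoint when its data was last modified and note which backend answered. A sketch, assuming the service reports the backing host in a response header such as x-served-by (if it does not, only the aggregate lag is visible):

    # Sketch: sample the update lag seen by whichever backend answers.
    # Assumes WDQS exposes the backing host in an "x-served-by" response
    # header; if it does not, only the aggregate lag can be observed.
    from datetime import datetime, timezone
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    # Standard freshness query: when was the data last updated?
    QUERY = "SELECT ?date WHERE { <http://www.wikidata.org> schema:dateModified ?date }"

    for _ in range(10):  # repeat; load balancing spreads us across servers
        resp = requests.get(
            ENDPOINT,
            params={"query": QUERY, "format": "json"},
            headers={"User-Agent": "lag-probe-example/0.1"},
            timeout=60,
        )
        resp.raise_for_status()
        value = resp.json()["results"]["bindings"][0]["date"]["value"]
        updated = datetime.fromisoformat(value.replace("Z", "+00:00"))
        lag = datetime.now(timezone.utc) - updated
        server = resp.headers.get("x-served-by", "unknown")
        print(f"{server}: lag {lag}")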


[1] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&from=now-7d&to=now

/Finn



On 14/11/2019 10:50, Guillaume Lederrey wrote:
Hello all!

As you've probably noticed, the update lag on the public WDQS endpoint [1] is not doing well [2], with lag climbing to > 12h for some servers. We are tracking this on phabricator [3], subscribe to that task if you want to stay informed.

To be perfectly honest, we don't have a good short term solution. The graph database that we are using at the moment (Blazegraph [4]) does not easily support sharding, so even throwing hardware at the problem isn't really an option.

We are working on a few medium term improvements:

* A dedicated updater service in Blazegraph, which should help increase the update throughput [5]. Fingers crossed, this should be ready for initial deployment and testing by next week (no promises, we're doing the best we can).
* Some improvement in the parallelism of the updater [6]; a rough sketch of the general idea follows after this list. This has just been identified; while it will probably also provide some improvement in throughput, we haven't actually started working on it and we don't have any numbers at this point.
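On the second point: the real updater is a Java service and T238045 may take a different approach entirely, but the general idea of applying independent per-entity change batches concurrently could look roughly like this Python sketch:

    # Illustrative only: the actual WDQS updater is Java, and T238045 may
    # take a different approach. This sketches the general idea of applying
    # independent per-entity change batches in parallel.
    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    def apply_batch(entity_id, revisions):
        # Placeholder: in the real updater this would rewrite the entity's
        # triples in the store based on its latest revision.
        print(f"updating {entity_id} to revision {max(revisions)}")

    def process_changes(changes, workers=4):
        # Changes to different entities do not conflict, so they can be
        # grouped by entity and applied concurrently.
        batches = defaultdict(list)
        for entity_id, revision in changes:
            batches[entity_id].append(revision)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for entity_id, revisions in batches.items():
                pool.submit(apply_batch, entity_id, revisions)

    process_changes([("Q42", 1001), ("Q64", 1002), ("Q42", 1003)])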

Longer term:

We are hiring a new team member to work on WDQS. It will take some time to get this person up to speed, but we should have more capacity to address the deeper issues of WDQS by January.

The 2 main points we want to address are:

* Finding a triple store that scales better than our current solution.
* Better understand the use cases on WDQS and see if we can provide a technical solution that is better suited. Our intuition is that some of the use cases that require synchronous (or quasi-synchronous) updates would be better implemented outside of a triple store; a toy sketch of that idea follows after this list. Honestly, we have no idea yet whether this makes sense or what those alternate solutions might be.
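As a toy example of serving a freshness check outside the triple store: a client that needs to know whether its latest edit is visible could compare the revision id from the Wikidata API with the schema:version that the query service stores, rather than expecting synchronous updates. A sketch (whether this covers the real use cases is exactly the open question):

    # Sketch: detect whether WDQS has caught up with a specific item by
    # comparing revision ids, instead of expecting synchronous updates.
    import requests

    ITEM = "Q42"
    UA = {"User-Agent": "freshness-check-example/0.1"}

    # Latest revision according to Wikidata itself.
    api = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": ITEM,
                "props": "info", "format": "json"},
        headers=UA, timeout=60,
    ).json()
    live_rev = api["entities"][ITEM]["lastrevid"]

    # Revision currently loaded in the triple store (schema:version).
    query = f"SELECT ?rev WHERE {{ wd:{ITEM} schema:version ?rev }}"
    res = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers=UA, timeout=60,
    ).json()
    stored_rev = int(res["results"]["bindings"][0]["rev"]["value"])

    print("up to date" if stored_rev >= live_rev else
          f"stale: store has {stored_rev}, live is {live_rev}")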

Thanks a lot for your patience during this tough time!

    Guillaume


[1] https://query.wikidata.org/
[2] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1571131796906&to=1573723796906&var-cluster_name=wdqs&panelId=8&fullscreen
[3] https://phabricator.wikimedia.org/T238229
[4] https://blazegraph.com/
[5] https://phabricator.wikimedia.org/T212826
[6] https://phabricator.wikimedia.org/T238045

--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

