Hello all!

As you've probably noticed, the update lag on the public WDQS endpoint [1]
is not doing well [2], with lag climbing above 12 hours on some servers. We
are tracking this on Phabricator [3]; subscribe to that task if you want to
stay informed.

To be perfectly honest, we don't have a good short-term solution. The graph
database that we are using at the moment (Blazegraph [4]) does not easily
support sharding, so even throwing hardware at the problem isn't really an
option.

We are working on a few medium-term improvements:

* A dedicated updater service in Blazegraph, which should help increase the
update throughput [5]. Fingers crossed, this should be ready for initial
deployment and testing by next week (no promises, we're doing the best we
can).
* Some improvement in the parallelism of the updater [6]. This has only just
been identified. While it will probably also improve throughput, we haven't
actually started working on it and we don't have any numbers at this point.

Longer term:

We are hiring a new team member to work on WDQS. It will take some time to
get this person up to speed, but we should have more capacity to address
the deeper issues of WDQS by January.

The two main points we want to address are:

* Finding a triple store that scales better than our current solution.
* Better understanding the use cases on WDQS, to see if we can provide a
technical solution that is better suited to them. Our intuition is that
some of the use cases that require synchronous (or quasi-synchronous)
updates would be better implemented outside of a triple store. Honestly, we
have no idea yet whether this makes sense or what those alternative
solutions might be.

Thanks a lot for your patience during this tough time!

   Guillaume


[1] https://query.wikidata.org/
[2] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1571131796906&to=1573723796906&var-cluster_name=wdqs&panelId=8&fullscreen
[3] https://phabricator.wikimedia.org/T238229
[4] https://blazegraph.com/
[5] https://phabricator.wikimedia.org/T212826
[6] https://phabricator.wikimedia.org/T238045

-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
