Thanks, Guillaume - this is very helpful, and it would be great to have similar information posted/collected on other kinds of limits and potential approaches to addressing them.
Some weeks ago, we started a project to keep track of such limits, and I have added pointers to your information there: https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata . If anyone is aware of similar discussions for any of the other limits, please edit that page to include pointers to those discussions. Thanks!

Daniel

On Thu, Jun 6, 2019 at 9:33 PM Guillaume Lederrey <gleder...@wikimedia.org> wrote:
>
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>
> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. We need to go deeper into the
> details and potentially make major changes to the current architecture.
> Some scaling considerations are discussed in [1]. This is going to take
> time.
>
> Realistically, addressing all of the above constraints is unlikely to
> ever happen. Some of the constraints are non-negotiable: if we can't
> keep up with Wikidata in terms of data size or number of edits, it does
> not make sense to address query performance. On some constraints, we
> will probably need to compromise.
>
> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but it can occasionally climb to hours.
> This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We are planning to hire
> an additional engineer to help us support the service in the long
> term. You can follow our current work in Phabricator [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!
>
> Thanks all for your patience!
>
> Guillaume
>
> [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
> [2] https://phabricator.wikimedia.org/project/view/1239/
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
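On the update-lag point above: the public WDQS endpoint exposes the timestamp of the last processed update via schema:dateModified, so the lag can be measured from the client side. A minimal sketch in Python (the SPARQL pattern is the commonly documented one for WDQS; the `update_lag_seconds` helper and its names are illustrative, not an official API):

```python
from datetime import datetime, timezone

# Commonly documented WDQS pattern for reading the last update timestamp
# (run against https://query.wikidata.org/sparql; not executed here):
LAG_QUERY = """
SELECT ?dateModified WHERE {
  <http://www.wikidata.org> schema:dateModified ?dateModified
}
"""

def update_lag_seconds(date_modified_iso, now=None):
    """Compute update lag in seconds from an ISO-8601 timestamp string
    such as the one the query above returns (illustrative helper)."""
    last_update = datetime.fromisoformat(date_modified_iso.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - last_update).total_seconds()

# Example: a dateModified of 19:25 UTC observed at 19:33 UTC is 8 minutes
# of lag - the "measured in minutes" best case described in the mail.
lag = update_lag_seconds(
    "2019-06-06T19:25:00Z",
    now=datetime(2019, 6, 6, 19, 33, tzinfo=timezone.utc),
)
print(lag / 60)  # → 8.0
```

Because the update pipeline is asynchronous by design, a check like this belongs in any client that assumes freshness: a result is only as current as the reported dateModified.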