Re: [Wikidata] Scaling Wikidata Query Service

Amirouche Boubekki Fri, 07 Jun 2019 05:56:07 -0700

On Thu, 6 Jun 2019 at 21:33, Guillaume Lederrey <gleder...@wikimedia.org>
wrote:


> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of the Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>

I would add that, in an ideal world, setting up Wikidata (i.e. the
interface that allows edits, the entity search service and WDQS) and the
Wikidata tools should be (more) accessible.


> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore.


> Reasonably, addressing all of the above constraints is unlikely to
> ever happen.


Never say never ;-)


> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
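To make the trade-off above concrete, here is a toy Python model of an
asynchronous updater. It is only a sketch under my own assumptions (the
`Edit` and `AsyncUpdater` names are made up, and this is not how the actual
WDQS updater is implemented): edits are queued on the write path, applied
later in arrival order without dropping any, and lag is measured as the age
of the oldest edit not yet applied.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Edit:
    # Hypothetical edit event: the entity that changed and when (seconds).
    entity_id: str
    timestamp: float


class AsyncUpdater:
    """Toy asynchronous updater: edits are queued when they happen and
    applied later, in arrival order, without skipping any."""

    def __init__(self):
        self.pending = deque()
        self.applied = []

    def enqueue(self, edit):
        # The edit path only records the change; it never waits on the store.
        self.pending.append(edit)

    def apply_batch(self, max_edits):
        # Apply at most max_edits queued edits, oldest first.
        for _ in range(min(max_edits, len(self.pending))):
            self.applied.append(self.pending.popleft().entity_id)

    def lag(self, now):
        # Lag = age of the oldest edit not yet applied; 0 when caught up.
        if not self.pending:
            return 0.0
        return now - self.pending[0].timestamp


updater = AsyncUpdater()
for i, ts in enumerate([0.0, 10.0, 20.0, 30.0]):
    updater.enqueue(Edit(f"Q{i}", ts))
updater.apply_batch(2)          # the store falls behind the edit stream
print(updater.lag(now=60.0))    # 40.0: the edit made at t=20 is still pending
```

Nothing in this model bounds the lag: it only shrinks when batches are
applied faster than edits arrive, which is why such a process has to be
considered asynchronous.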


> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We
> are planning to hire an additional engineer to help us support the
> service in the long term. You can follow our current work in phabricator
> [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!
>

Good luck!


> Thanks all for your patience!
>
>    Guillaume
>
> [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy


Here is my point of view regarding some of the discussion happening on the
talk page:

> Giving up on SPARQL.

There is an ongoing effort to draft version 1.2 of SPARQL
<https://github.com/w3c/sparql-12>. Now is the right time to give feedback.

Also, look at https://github.com/w3c/EasierRDF/

> JanusGraph <http://janusgraph.org/> (successor of Titan, now part of
> DataStax) - Written in Java, using scalable data-storage (Cassandra/HBase)
> and indexing engines (Elasticsearch/Solr), queryable

That would make Wikidata much less accessible. Even though JanusGraph has an
Oracle Berkeley DB backend, the full-text search and geospatial indices
live in yet another process.

> I can't think of any other way than transforming the wikidata RDF
> representation to a more suitable one for graph-properties engines

FWIW, OpenCog's AtomSpace has a Neo4j backend, but they do not use it.

Also, property-graph engines make it slow to represent things like:

("wikidata", "used-by", "opencog")
("wikidata", "used-by", "google")

That is, one has to create a hyper-edge to be able to query those
facts.
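To make the hyper-edge point concrete, here is a toy Python sketch using
plain sets, dicts and lists. It is not the API of Blazegraph, JanusGraph or
Neo4j, and all names (`match`, `reified_match`, `fact1`, ...) are made up for
illustration. In the triple view each fact is matched directly by pattern;
in this assumed property-graph encoding the fact itself is reified as a node,
so every lookup needs an extra join over the edges to rebuild it.

```python
# Triple view: each fact is a (subject, predicate, object) tuple and any
# position can be left open in a query.
TRIPLES = {
    ("wikidata", "used-by", "opencog"),
    ("wikidata", "used-by", "google"),
}


def match(s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical property-graph view with reified facts: each fact becomes
# its own "hyper-edge" node, connected to its endpoints by plain edges.
FACT_NODES = {"fact1": "used-by", "fact2": "used-by"}
EDGES = [
    ("fact1", "subject", "wikidata"), ("fact1", "object", "opencog"),
    ("fact2", "subject", "wikidata"), ("fact2", "object", "google"),
]


def reified_match(predicate):
    """Rebuild (subject, predicate, object) facts by joining each fact
    node with its outgoing 'subject' and 'object' edges."""
    out = []
    for fact, label in FACT_NODES.items():
        if label != predicate:
            continue
        ends = {role: target for src, role, target in EDGES if src == fact}
        out.append((ends["subject"], label, ends["object"]))
    return sorted(out)


print(match(p="used-by"))
print(reified_match("used-by"))
```

Both calls return the same two facts, but the reified version has to scan
and join the edge list for every fact node it visits.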


> [2] https://phabricator.wikimedia.org/project/view/1239/



Best regards,


Amirouche ~ amz3

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
