On Thu, 6 June 2019 at 21:33, Guillaume Lederrey <gleder...@wikimedia.org> wrote:
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability

I will add that, in an ideal world, setting up Wikidata (i.e. the
interface that allows edits, the entity search service, and WDQS) and
the Wikidata tools should be (more) accessible.

> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. Reasonably, addressing all of the
> above constraints is unlikely to ever happen.

never say never ;-)

> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We
> are planning to hire an additional engineer to help us support the
> service in the long term. You can follow our current work in phabricator
> [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!

Good luck!

> Thanks all for your patience!
>
>    Guillaume
>
> [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy

Here is my point of view on some of the discussion happening on the
talk page:

> Giving up on SPARQL.

There is an ongoing effort to draft version 1.2 of SPARQL
<https://github.com/w3c/sparql-12>. Now is the right time to give
feedback. Also, look at https://github.com/w3c/EasierRDF/

> JanusGraph <http://janusgraph.org/> (successor of Titan, now part
> DataStax) - Written in java, using scalable data-storage
> (cassandra/hbase) and indexing engines (ElasticSearch/SolR), queryable

That would make Wikidata much less accessible. Even though JanusGraph
has an Oracle Berkeley DB backend, the full-text search and geospatial
indices live in yet another process.

> I can't think of any other way than transforming the wikidata RDF
> representation to a more suitable one for graph-properties engines

FWIW, OpenCog's AtomSpace has a neo4j backend, but they do not use it.
Also, property-graph engines make it slow to represent things like:

    ("wikidata", "used-by", "opencog")
    ("wikidata", "used-by", "google")

That is, one has to create a hyper-edge to be able to query those
facts.

> [2] https://phabricator.wikimedia.org/project/view/1239/

Best regards,

Amirouche ~ amz3
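
P.S. To make the two facts above concrete, here is a minimal Python
sketch that stores them as plain triples and queries them by pattern.
The `triples` set and the `match` helper are made up for illustration;
this is an assumption-laden toy, not how WDQS, Blazegraph, or AtomSpace
actually store or query data.

```python
# Hypothetical in-memory triple store; all names here are illustrative.
triples = {
    ("wikidata", "used-by", "opencog"),
    ("wikidata", "used-by", "google"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Who uses wikidata? Leave the object position unbound.
users = [o for _, _, o in match(triples, "wikidata", "used-by")]
print(users)
```

In a triple model, both facts are first-class rows that any position
can be matched against, which is the kind of query the example above
is about.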
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata