On 07.02.20 14:32, Guillaume Lederrey wrote:
Keeping all of Wikidata in a single graph is most probably not going to work long term. We have not found examples of public SPARQL endpoints with > 10 B triples and there is probably a good reason for that. We will probably need to split the graphs at some point. We don't know how yet (that's why we loaded the dumps into Hadoop, that might give us some more insight). We might expose a subgraph with only truthy statements. Or have language specific graphs, with only language specific labels. Or something completely different.

I have not looked in detail at query runtimes or at how Blazegraph's indexing works internally, but I noticed that queries involving SPARQL property paths (and especially joins of those) often take a long time to run. At the same time, I recently discovered that if we only store which entity is connected to which other entity (without the actual statement details, such as property, qualifiers or ranks), those connections take up only about 2 GB compressed with Zstandard (I represented each connection as a pair of 32-bit integers: <source entity> <destination entity>). Of course that discards a lot of important information, but it made me wonder whether queries could be evaluated more efficiently, given the relatively strict schema that the RDF representation of Wikidata adheres to, since it is generated from a more structured form (Statements). As an example, Blazegraph doesn't know the relationship between wdt:Pxxx and p:Pxxx, or even things like p:Pxxx/ps:Pxxx.
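To illustrate what I mean by that packed representation, here is a rough sketch (the entity IDs are made up, and I'm assuming numpy plus the zstandard Python package; it is not meant as anything more than an example):

import numpy as np
import zstandard as zstd

# Hypothetical adjacency list: each row is (source entity ID, destination entity ID),
# i.e. just the numeric part of the Q-IDs, with all statement details dropped.
edges = np.array(
    [
        [42, 5],        # Q42 -> Q5
        [42, 6581097],  # Q42 -> Q6581097
        [5, 830077],    # Q5  -> Q830077
    ],
    dtype=np.uint32,    # one 32-bit int per entity, 8 bytes per connection
)

# Serialize the raw pairs and compress them with Zstandard.
raw = edges.tobytes()
compressed = zstd.ZstdCompressor(level=19).compress(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")

# Decompressing gives back the same pairs.
restored = np.frombuffer(
    zstd.ZstdDecompressor().decompress(compressed), dtype=np.uint32
).reshape(-1, 2)
assert (restored == edges).all()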

Another, somewhat related idea: perhaps it's possible to keep the SPARQL interface as the frontend, but use a more efficient, split representation of the graph in the backend? I'm not sure how different that would be from the indexing that Blazegraph already does, though.

Regards,

Benno

PS: Apologies to Guillaume if you receive this mail twice; I clicked the wrong button when replying.



