Hi Markus,

Thank you very much; your code will be extremely helpful for solving my current 
need. And though I am not a Java programmer, I may even be able to adapt it to 
similar queries.

On the other hand, it is still a few steps away from the promise of Linked Data 
and SPARQL endpoints. I greatly value the Wikidata endpoint for serving the 
current data: if I add some bit of information in the user interface, I can 
query for it immediately afterwards, and I can do so in a uniform way via 
standard SPARQL queries. I can imagine how hard that was to achieve.

And I completely agree that it is impossible to build a SPARQL endpoint which 
reliably serves arbitrarily complex queries for multiple users in finite time. 
(This is the reason why all our public endpoints at http://zbw.eu/beta/sparql/ 
are labeled beta.) You can easily get to a point where some ill-behaved query 
is run over and over again by a careless program, and you have to be quite 
restrictive to keep your service up.

So an "unstable" endpoint with wider limits, as you suggested in your later 
mail, could be a great solution for this. In both instances, it would be nice 
if the policy and the actual limits could be documented, so users would know 
what to expect (and how to act appropriate as good citizens).

Thanks again for the code, and for taking up the discussion.

Cheers, Joachim

-----Original Message-----
From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On Behalf Of 
Markus Krötzsch
Sent: Thursday, 11 February 2016 15:05
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] SPARQL CONSTRUCT results truncated

Hi Joachim,

Here is a short program that solves your problem:

https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/DataExtractionProcessor.java

It is in Java, so you need that (and Maven) to run it, but that's the only 
technical challenge ;-). You can run the program in various ways, as described 
in the README:

https://github.com/Wikidata/Wikidata-Toolkit-Examples

The program I wrote puts everything into a CSV file, but you can of course also 
write RDF triples instead, or any other format you wish. The code should be 
easy to modify.
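
For illustration only, here is a stripped-down sketch of what such a processor 
could look like. This is not the linked example code: it assumes Wikidata 
Toolkit's EntityDocumentProcessor interface, its findStatementStringValue 
helper and the DumpProcessingController class, and the class name, output file 
name and CSV layout are placeholders of my own.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.datamodel.interfaces.SiteLink;
import org.wikidata.wdtk.datamodel.interfaces.StringValue;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;

/**
 * Sketch: writes one CSV row per item that has a GND identifier (P227),
 * together with its English Wikipedia page title (if any).
 */
public class GndExtractionSketch implements EntityDocumentProcessor {

    private final PrintWriter out;

    public GndExtractionSketch(PrintWriter out) {
        this.out = out;
    }

    @Override
    public void processItemDocument(ItemDocument itemDocument) {
        // Main value of the GND identifier property, or null if the item has none:
        StringValue gnd = itemDocument.findStatementStringValue("P227");
        if (gnd == null) {
            return; // no GND mapping; skip this item
        }
        // English Wikipedia sitelink, if present:
        SiteLink enwiki = itemDocument.getSiteLinks().get("enwiki");
        String title = (enwiki == null) ? "" : enwiki.getPageTitle();

        out.println(itemDocument.getEntityId().getId() + ","
                + gnd.getString() + ",\"" + title + "\"");
    }

    @Override
    public void processPropertyDocument(PropertyDocument propertyDocument) {
        // Property documents are not needed for this extraction.
    }

    public static void main(String[] args) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter("gnd-mappings.csv"))) {
            GndExtractionSketch processor = new GndExtractionSketch(out);
            // Downloads (on first run) and streams the most recent JSON dump:
            DumpProcessingController controller =
                    new DumpProcessingController("wikidatawiki");
            controller.registerEntityDocumentProcessor(processor, null, true);
            controller.processMostRecentJsonDump();
        }
    }
}

As with the linked example, the first call to main() would download the latest 
JSON dump and then stream every item through processItemDocument, so no SPARQL 
endpoint is involved at all.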

On a first run, the tool will download the current Wikidata dump, which takes a 
while (it's about 6G), but after this you can find and serialise all results in 
less than half an hour (for a processing rate of around 10K items/second). A 
regular laptop is enough to run it.

Cheers,

Markus


On 11.02.2016 01:34, Stas Malyshev wrote:
> Hi!
>
>> I try to extract all mappings from Wikidata to the GND authority 
>> file, along with the corresponding Wikipedia pages, expecting roughly 
>> 500,000 to 1m triples as a result.
>
> As a starting note, I don't think extracting 1M triples is the best 
> way to use the query service. If you need to do processing that 
> returns such big result sets - in the millions - maybe processing the dump 
> - e.g. with Wikidata Toolkit at 
> https://github.com/Wikidata/Wikidata-Toolkit - would be a better idea?
>
>> However, with various calls, I get far fewer triples (about 2,000 to 
>> 10,000). The output seems to be truncated in the middle of a statement, e.g.
>
> It may be some kind of timeout because of the quantity of the data 
> being sent. How long does such a request take?
>


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata