Re: [Wikidata-l] Wikidata RDF export available

2013-08-13 Thread Kingsley Idehen
On 8/12/13 12:56 PM, Nicolas Torzec wrote: With respect to the RDF export I'd advocate for: 1) an RDF format with one fact per line. 2) the use of a mature/proven RDF generation framework. Yes, keep it simple, use Turtle. The additional benefit of Turtle is that it addresses a wide data consu

Re: [Wikidata-l] Wikidata RDF export available

2013-08-12 Thread Markus Krötzsch
On 12/08/13 17:56, Nicolas Torzec wrote: With respect to the RDF export I'd advocate for: 1) an RDF format with one fact per line. 2) the use of a mature/proven RDF generation framework. Optimizing too early based on a limited and/or biased view of the potential use cases may not be a good idea

Re: [Wikidata-l] Wikidata RDF export available

2013-08-12 Thread Nicolas Torzec
With respect to the RDF export I'd advocate for: 1) an RDF format with one fact per line. 2) the use of a mature/proven RDF generation framework. Optimizing too early based on a limited and/or biased view of the potential use cases may not be a good idea in the long run. I'd rather keep it simple
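
To illustrate point 1: when every triple sits on its own line (as in N-Triples), a large dump can be filtered with a plain line-by-line scan and no RDF parser. The following is a minimal sketch under that assumption; the function name, the example.org IRIs, and the sample data are made up for illustration and are not taken from the wda scripts.

```python
import io


def count_triples_with_predicate(lines, predicate_iri):
    """Count lines whose second term is the given predicate IRI.

    Assumes each non-empty, non-comment line holds exactly one triple
    in N-Triples style: <subject> <predicate> object .
    """
    needle = "<" + predicate_iri + ">"
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace in N-Triples,
        # so splitting twice isolates the predicate term.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == needle:
            count += 1
    return count


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical sample data standing in for a dump file.
sample = io.StringIO(
    '<http://example.org/Q1> <' + RDFS_LABEL + '> "universe"@en .\n'
    '<http://example.org/Q1> <http://example.org/prop> <http://example.org/Q2> .\n'
    '# a comment line\n'
    '<http://example.org/Q2> <' + RDFS_LABEL + '> "Earth"@en .\n'
)

label_count = count_triples_with_predicate(sample, RDFS_LABEL)
```

The same loop works on an open file handle, so memory use stays constant regardless of dump size. A Turtle file that groups several predicates under one subject could not be scanned this way without a real parser.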

Re: [Wikidata-l] Wikidata RDF export available

2013-08-12 Thread Markus Krötzsch
On 11/08/13 22:29, Tom Morris wrote: On Sat, Aug 10, 2013 at 2:30 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote: Anyway, if you restrict yourself to tools that are installed by default on your system, then it will be difficult to do many interesting things with a 4

Re: [Wikidata-l] Wikidata RDF export available

2013-08-11 Thread Tom Morris
On Sat, Aug 10, 2013 at 2:30 PM, Markus Krötzsch < mar...@semantic-mediawiki.org> wrote: > Anyway, if you restrict yourself to tools that are installed by default on > your system, then it will be difficult to do many interesting things with a > 4.5G RDF file ;-) Seriously, the RDF dump is really

Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Markus Krötzsch
Hi Tom, On 10/08/13 15:55, Tom Morris wrote: Given your "educating" people about software engineering principles, this may fall on deaf ears, but I too have a strong preference for the format with an independent line per triple. No worries. The eventual RDF export of Wikidata will most certain

Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Tom Morris
Given your "educating" people about software engineering principles, this may fall on deaf ears, but I too have a strong preference for the format with an independent line per triple. On Sat, Aug 10, 2013 at 8:35 AM, Markus Krötzsch < markus.kroetz...@cs.ox.ac.uk> wrote: > > On 10/08/13 12:18, Seb

Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Markus Krötzsch
Dear Sebastian, On 10/08/13 12:18, Sebastian Hellmann wrote: Hi Markus! Thank you very much. Regarding your last email: Of course, I am aware of your arguments in your last email, that the dump is not "official". Nevertheless, I am expecting you and others to code (or supervise) similar RDF dum

Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Sebastian Hellmann
Hi Markus! Thank you very much. Regarding your last email: Of course, I am aware of your arguments in your last email, that the dump is not "official". Nevertheless, I am expecting you and others to code (or supervise) similar RDF dumping projects in the future. Here are two really important

Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Markus Krötzsch
Good morning. I just found a bug that was caused by a bug in the Wikidata dumps (a value that should be a URI was not). This led to a few dozen lines with illegal qnames of the form "w: ". The updated script fixes this. Cheers, Markus On 09/08/13 18:15, Markus Krötzsch wrote: Hi Sebastian,

Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Markus Krötzsch
Hi Sebastian, On 09/08/13 15:44, Sebastian Hellmann wrote: Hi Markus, we just had a look at your python code and created a dump. We are still getting a syntax error for the turtle dump. You mean "just" as in "at around 15:30 today" ;-)? The code is under heavy development, so changes are quit

Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Paul A. Houle
mann Sent: Friday, August 9, 2013 10:44 AM To: Discussion list for the Wikidata project. Cc: Dimitris Kontokostas ; Jona Christopher Sahnwaldt Subject: Re: [Wikidata-l] Wikidata RDF export available Hi Markus, we just had a look at your python code and created a dump. We are still getting a synta

Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Sebastian Hellmann
Hi Markus, we just had a look at your python code and created a dump. We are still getting a syntax error for the turtle dump. I saw that you did not use a mature framework for serializing the turtle. Let me explain the problem: Over the last 4 years, I have seen about two dozen people (und

Re: [Wikidata-l] Wikidata RDF export available

2013-08-04 Thread Federico Leva (Nemo)
Markus Krötzsch, 04/08/2013 17:35: Are you sure? The file you linked has mappings from site ids to language codes, not from language codes to language codes. Do you mean to say: "If you take only the entries of the form 'XXXwiki' in the list, and extract a language code from the XXX, then you get
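
The reading proposed in this message (keep only the 'XXXwiki' entries and take a language code from the XXX part) could be sketched as follows. This is only an illustration of that interpretation: the function name is hypothetical, and the underscore-to-hyphen conversion is an assumption about how site ids relate to language codes, not something confirmed in the thread.

```python
def language_code_from_site_id(site_id):
    """Return a candidate language code for an 'XXXwiki' site id, else None."""
    suffix = "wiki"
    # Reject ids that do not end in 'wiki' or that have an empty XXX part.
    if not site_id.endswith(suffix) or len(site_id) == len(suffix):
        return None
    # Assumption: site ids use underscores where language codes use hyphens.
    return site_id[: -len(suffix)].replace("_", "-")


# Hypothetical sample of site ids; a real list would come from the linked file.
site_ids = ["enwiki", "be_x_oldwiki", "enwikivoyage", "commonswiki"]
candidates = {s: language_code_from_site_id(s) for s in site_ids}
```

Note that a non-language site such as commonswiki still yields a candidate ("commons"), so the result would need to be filtered against a list of known language codes before use.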

Re: [Wikidata-l] Wikidata RDF export available

2013-08-04 Thread Markus Krötzsch
On 04/08/13 13:17, Federico Leva (Nemo) wrote: Markus Krötzsch, 04/08/2013 12:32: * Wikidata uses "be-x-old" as a code, but MediaWiki messages for this language seem to use "be-tarask" as a language code. So there must be a mapping somewhere. Where? Where I linked it. Are you sure? The file

Re: [Wikidata-l] Wikidata RDF export available

2013-08-04 Thread Federico Leva (Nemo)
Markus Krötzsch, 04/08/2013 12:32: * Wikidata uses "be-x-old" as a code, but MediaWiki messages for this language seem to use "be-tarask" as a language code. So there must be a mapping somewhere. Where? Where I linked it. * MediaWiki's http://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCode

Re: [Wikidata-l] Wikidata RDF export available

2013-08-04 Thread Markus Krötzsch
Let me top-post a question to the Wikidata dev team: Where can we find documentation on what the Wikidata internal language codes actually mean? In particular, how do you map the language selector to the internal codes? I noticed some puzzling details: * Wikidata uses "be-x-old" as a code, bu

Re: [Wikidata-l] Wikidata RDF export available

2013-08-03 Thread Federico Leva (Nemo)
Markus Krötzsch, 03/08/2013 15:48: (3) Limited language support. The script uses Wikidata's internal language codes for string literals in RDF. In some cases, this might not be correct. It would be great if somebody could create a mapping from Wikidata language codes to BCP47 language codes (let
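
A mapping of the kind requested here could be as simple as an override table with pass-through for all other codes. The sketch below is hypothetical: the names are invented, and the single table entry is taken from the be-x-old / be-tarask observation made elsewhere in this thread, treated as an assumption rather than a verified mapping. The literal escaping covers only backslash, double quote, and line breaks.

```python
# Override table: Wikidata-internal code -> language tag to emit.
# The one entry below is an assumption based on this thread's discussion;
# a complete table would have to be compiled and verified separately.
WIKIDATA_TO_BCP47 = {
    "be-x-old": "be-tarask",
}


def to_bcp47(code):
    """Map an internal code to its override, or pass it through unchanged."""
    return WIKIDATA_TO_BCP47.get(code, code)


def turtle_literal(text, code):
    """Format a language-tagged Turtle string literal for the given code."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"%s"@%s' % (escaped, to_bcp47(code))


example = turtle_literal("Минск", "be-x-old")
```

Keeping the overrides in one table means the export code itself does not change when a new code is added or corrected.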

Re: [Wikidata-l] Wikidata RDF export available

2013-08-03 Thread Markus Krötzsch
Update: the first bugs in the export have already been discovered -- and fixed in the script on github. The files I uploaded will be updated on Monday when I have a better upload again (the links file should be fine, the statements file requires a rather tolerant Turtle string literal parser, a

[Wikidata-l] Wikidata RDF export available

2013-08-03 Thread Markus Krötzsch
Hi, I am happy to report that an initial, yet fully functional RDF export for Wikidata is now available. The exports can be created using the wda-export-data.py script of the wda toolkit [1]. This script downloads recent Wikidata database dumps and processes them to create RDF/Turtle files. V