On 4 May 2013 17:12, Daniel Kinzler <daniel.kinz...@wikimedia.de> wrote:
> On 04.05.2013 12:05, Jona Christopher Sahnwaldt wrote:
>> On 26 April 2013 17:15, Daniel Kinzler <daniel.kinz...@wikimedia.de> wrote:
>>> *internal* JSON representation, which is different from what the API
>>> returns, and may change at any time without notice.
>>
>> Somewhat off-topic: I didn't know you have different JSON
>> representations. I'm curious and I'd be happy about a few quick
>> answers...
>>
>> - How many are there? Just the two, internal and external?
>
> Yes, these two.
>
>> - Which JSON representations do the API and the XML dump provide? Will
>> they do so in the future?
>
> The XML dump provides the internal representation (since it's a dump of
> the raw page content). The API uses the external representation.
>
> This is pretty much dictated by the nature of the dumps and the API, so
> it will stay that way. However, we plan to add more types of dumps,
> including:
>
> * a plain JSON dump (using the external representation)
> * an RDF/XML dump
>
> It's not certain yet when, or even if, we'll provide these, but we are
> considering it.
>
>> - Are the API and XML dump representations stable? Or should we expect
>> some changes?
>
> The internal representation is unstable and subject to change without
> notice. In fact, it may even change to something other than JSON. I
> don't think it's even documented anywhere outside the source code.
>
> The external representation is pretty stable, though not final yet. We
> will definitely make additions to it, and some (hopefully minor)
> structural changes may be necessary. We'll try to stay largely backwards
> compatible, but can't promise full stability yet.
>
> Also, the external representation uses the API framework for generating
> the actual JSON, and may be subject to changes imposed by that framework.
>
> Unfortunately, this means that there are currently no dumps with a
> reliable representation of our data. You need to a) use the API, b) use
> the unstable internal JSON, or c) wait for "real" data dumps.
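[To make the dump-vs-API distinction above concrete, here is a minimal sketch of reading the internal JSON out of a pages XML dump. The sample dump snippet, its key names ("label", "links"), and the lack of XML namespaces are simplifying assumptions for illustration; real MediaWiki dumps follow the export schema and are far larger, which is why the sketch uses iterparse and clears pages as it goes.]

```python
# Sketch: pull per-item internal JSON out of a pages XML dump stream.
# Sample layout and key names are assumptions for illustration only.
import json
import xml.etree.ElementTree as ET
from io import StringIO

SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Q42</title>
    <revision>
      <text>{"label": {"en": "Douglas Adams"}, "links": {"enwiki": "Douglas Adams"}}</text>
    </revision>
  </page>
</mediawiki>"""

def iter_entities(xml_file):
    """Yield (title, parsed JSON) for each page whose text looks like JSON."""
    for _, elem in ET.iterparse(xml_file):
        if elem.tag.endswith("page"):
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            if text and text.lstrip().startswith("{"):
                yield title, json.loads(text)
            elem.clear()  # free memory; real dumps are tens of GB

entities = dict(iter_entities(StringIO(SAMPLE_DUMP)))
```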
Thanks for the clarification. Not the best news, but not terribly bad either.

We will produce a DBpedia release pretty soon, and I don't think we can wait for the "real" dumps. The inter-language links are an important part of DBpedia, so we have to extract data from almost all Wikidata items. I don't think it's sensible to make ~10 million API calls to download the external JSON format, so we will have to use the XML dumps and thus the internal format.

But I think the instability is not a big deal: we parse the JSON into an AST anyway. It just means that we will have to use a more abstract AST, which I was planning to do anyway. As long as the semantics of the internal format remain more or less the same - it will contain the labels, the language links, the properties, etc. - it's no big deal if the syntax changes, even if it's not JSON anymore.

Christopher

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
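[A minimal sketch of the "more abstract AST" idea Christopher describes: read each serialization into one small, format-neutral record, so that a syntax change only touches the reader for that format. The record class and the flat "label"/"links" keys assumed for the internal layout are illustrative inventions; the wrapped label/sitelink objects in the external reader follow the API's external representation as of 2013.]

```python
# Sketch: one neutral record, one reader per serialization format.
# Field names and the internal-layout keys are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class WikidataItem:
    id: str
    labels: dict = field(default_factory=dict)     # language code -> label
    sitelinks: dict = field(default_factory=dict)  # wiki id -> page title

def from_internal(item_id, doc):
    # Assumed internal dump layout (unstable): flat "label" and "links" maps.
    return WikidataItem(item_id,
                        labels=dict(doc.get("label", {})),
                        sitelinks=dict(doc.get("links", {})))

def from_external(doc):
    # External API layout: values wrapped in per-entry objects.
    return WikidataItem(doc["id"],
                        labels={k: v["value"]
                                for k, v in doc.get("labels", {}).items()},
                        sitelinks={k: v["title"]
                                   for k, v in doc.get("sitelinks", {}).items()})
```

With this split, a change to the internal syntax (even away from JSON) means rewriting only `from_internal`; everything downstream keeps consuming `WikidataItem`.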