On 4 May 2013 17:12, Daniel Kinzler <daniel.kinz...@wikimedia.de> wrote:
> On 04.05.2013 12:05, Jona Christopher Sahnwaldt wrote:
>> On 26 April 2013 17:15, Daniel Kinzler <daniel.kinz...@wikimedia.de> wrote:
>>> *internal* JSON representation, which is different from what the API returns,
>>> and may change at any time without notice.
>>
>> Somewhat off-topic: I didn't know you had different JSON
>> representations. I'm curious, and I'd appreciate a few quick
>> answers...
>>
>> - How many are there? Just the two, internal and external?
>
> Yes, these two.
>
>> - Which JSON representations do the API and the XML dump provide? Will
>> they continue to do so in the future?
>
> The XML dump provides the internal representation (since it's a dump of the raw
> page content). The API uses the external representation.
>
> This is pretty much dictated by the nature of the dumps and the API, so it will
> stay that way. However, we plan to add more types of dumps, including:
>
> * a plain JSON dump (using the external representation)
> * an RDF/XML dump
>
> It's not certain yet when or even if we'll provide these, but we are
> considering it.
>
>> - Are the API and XML dump representations stable? Or should we expect
>> some changes?
>
> The internal representation is unstable and subject to changes without notice.
> In fact, it may even change to something other than JSON. I don't think it's
> even documented anywhere outside the source code.
>
> The external representation is pretty stable, though not final yet. We will
> definitely make additions to it, and some (hopefully minor) structural changes
> may be necessary. We'll try to stay largely backwards compatible, but can't
> promise full stability yet.
>
> Also, the external representation uses the API framework for generating the
> actual JSON, and may be subject to changes imposed by that framework.
>
>
> Unfortunately, this means that there are currently no dumps with a reliable
> representation of our data. You need to either a) use the API, b) use the
> unstable internal JSON, or c) wait for "real" data dumps.
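
For reference, option (a) boils down to the wbgetentities API module;
roughly this per item (a sketch in Python, assuming format=json and the
standard www.wikidata.org endpoint):

import json
import urllib.request

def fetch_entity(qid):
    # Fetch the *external* JSON representation of one item via the API.
    url = ("https://www.wikidata.org/w/api.php"
           "?action=wbgetentities&format=json&ids=" + qid)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["entities"][qid]

entity = fetch_entity("Q42")
print(entity["labels"]["en"]["value"])

That is fine for a handful of items, but not for the whole set, see below.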

Thanks for the clarification. Not the best news, but not terribly bad either.

We will produce a DBpedia release pretty soon, so I don't think we can
wait for the "real" dumps. The inter-language links are an important
part of DBpedia, so we have to extract data from almost all Wikidata
items. Making ~10 million API calls to download the external JSON
format isn't sensible, so we will have to use the XML dumps and thus
the internal format. But I don't think the lack of stability is a big
deal: we parse the JSON into an AST anyway. It just means that we will
have to use a more abstract AST, which I was planning to do anyway. As
long as the semantics of the internal format remain more or less the
same - it will still contain the labels, the language links, the
properties, etc. - it doesn't matter much if the syntax changes, even
if it's not JSON anymore.
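
Concretely, here is roughly what I have in mind (a sketch only, in Python
for brevity). The key names for the internal format below ("label"/"links"
versus the external "labels"/"sitelinks") are guesses on my part, which is
exactly why they go through a small lookup helper instead of being
hard-coded all over the extractor:

import bz2
import json
import xml.etree.ElementTree as ET

def first_key(d, *candidates):
    # Return the value for the first key that is present, so a rename in
    # the internal format only has to be fixed in one place.
    for k in candidates:
        if k in d:
            return d[k]
    return {}

def iter_items(dump_path):
    # Stream the pages-articles XML dump and yield the parsed JSON blob
    # of every page whose text looks like a Wikidata entity.
    with bz2.open(dump_path) as f:
        for _, elem in ET.iterparse(f):
            if (elem.tag.endswith("}text") and elem.text
                    and elem.text.lstrip().startswith("{")):
                try:
                    yield json.loads(elem.text)
                except ValueError:
                    pass
            elem.clear()  # keep memory use roughly bounded

for item in iter_items("wikidatawiki-pages-articles.xml.bz2"):
    labels = first_key(item, "label", "labels")
    sitelinks = first_key(item, "links", "sitelinks")
    # ... hand labels / sitelinks to the extraction framework ...

Not production code, obviously, but it shows why a renamed key in the
internal format, or even a switch away from JSON, only touches the parsing
layer and not the rest of the extraction.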

Christopher

>
> -- daniel
