Hi gnosygnu!

The JSON in the XML dumps is the raw contents of the storage backend. It can't
be changed retroactively, and re-encoding everything on the fly would be too
expensive. Also, the JSON embedded in the XML files is not officially supported
as a stable interface of Wikibase. The JSON format in the XML files can change
without notice, and you may encounter different representations even within the
same dump.

I recommend using the JSON dumps; they contain our data in canonical form. To
avoid downloading redundant information, you can use one of the
wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't
contain the actual page content, just metadata.
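
For illustration, here is a minimal Python sketch of streaming a JSON dump without loading it all into memory. It assumes the dump is one JSON array written one entity per line ("[" first, "]" last, a trailing "," after each entity). The function names are hypothetical and not part of any Wikidata tool. The last helper shows where the snak datatype lives in the canonical form.

```python
import bz2
import json
from typing import Dict, Iterable, Iterator


def parse_dump_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse lines of a JSON dump into entity dicts.

    Assumption: one entity per line inside a single JSON array, so each
    entity line can be decoded on its own after dropping the trailing ",".
    """
    for line in lines:
        line = line.strip()
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no entity
        yield json.loads(line.rstrip(","))


def iter_entities(path: str) -> Iterator[dict]:
    """Stream entities from a compressed dump such as wikidata-*-all.json.bz2."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        yield from parse_dump_lines(f)


def mainsnak_datatypes(entity: dict) -> Dict[str, str]:
    """Map property IDs to the "datatype" recorded on the entity's main snaks.

    Snaks without a "datatype" field (as in the raw JSON from the XML
    dumps) are skipped.
    """
    result: Dict[str, str] = {}
    for prop_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            datatype = statement["mainsnak"].get("datatype")
            if datatype is not None:
                result[prop_id] = datatype
    return result
```

Typical use would be `for entity in iter_entities("wikidata-20161114-all.json.bz2"): ...`, reading "datatype" directly from each snak.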

Caveat: there is currently no dump that contains the JSON of old revisions of
entities in canonical form. You can only get them individually from
Special:EntityData, e.g.
<https://www.wikidata.org/wiki/Special:EntityData/Q23.json?oldid=30279>
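
As a sketch of fetching such old revisions one at a time, the Python below builds the Special:EntityData URL in the form of the example above and downloads it. The helper names and the User-Agent string are hypothetical. The response shape {"entities": {id: {...}}} is an assumption, so check it against a real response.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the Special:EntityData example above.
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"


def entity_revision_url(entity_id: str, oldid: int) -> str:
    """Build the Special:EntityData URL for one revision of an entity."""
    query = urllib.parse.urlencode({"oldid": oldid})
    return f"{ENTITY_DATA_BASE}{entity_id}.json?{query}"


def fetch_entity_revision(entity_id: str, oldid: int) -> dict:
    """Fetch one old revision in canonical JSON form (one request per revision).

    Assumption: the response looks like {"entities": {entity_id: {...}}}.
    The User-Agent below is a hypothetical placeholder; set your own.
    """
    request = urllib.request.Request(
        entity_revision_url(entity_id, oldid),
        headers={"User-Agent": "entitydata-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["entities"][entity_id]
```

Because every revision needs its own HTTP request, this only scales to a modest number of revisions.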

HTH
-- daniel

On 26.11.2016 at 02:13, gnosygnu wrote:
> Hi everyone. I have a question about the Wikidata xml dump, but I'm
> posting this question here, because it looks more related to Wikidata.
> 
> In short, it seems that the "pages-articles.xml" does not include the
> datatype property for snaks. For example, the xml dump does not list a
> datatype for Q38 (Italy) and P41 (flag image). In contrast, the json
> dump does list a datatype of "commonsMedia".
> 
> Can this datatype property be included in future xml dumps? The
> alternative would be to download two large and redundant dumps (xml
> and json) in order to reconstruct a Wikidata instance.
> 
> More information is provided below the break. Let me know if you need
> anything else.
> 
> Thanks.
> 
> ----
> 
> Here's an excerpt from the xml data dump for Q38 (Italy) and P41 (flag
> image). Notice that there is no "datatype" property:
>   // https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2
>   "mainsnak": {
>     "snaktype": "value",
>     "property": "P41",
>     "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863",
>     "datavalue": {
>       "value": "Flag of Italy.svg",
>       "type": "string"
>     }
>   },
> 
> Meanwhile, the API and the JSON dump list a datatype property of
> "commonsMedia":
>   // https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q38
>   // https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2
>   "P41": [{
>     "mainsnak": {
>       "snaktype": "value",
>       "property": "P41",
>       "datavalue": {
>         "value": "Flag of Italy.svg",
>         "type": "string"
>       },
>       "datatype": "commonsMedia"
>     },
> 
> As far as I can tell, the Turtle (ttl) dump does not list a datatype
> property either, but this may be because I don't understand its
> format.
>   wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D .
>   wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement,
>       wikibase:BestRank ;
>     wikibase:rank wikibase:NormalRank ;
>     ps:P41 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg> ;
>     pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ;
>     pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc .
> 
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
