On Thu, Feb 26, 2015 at 2:52 PM, Markus Kroetzsch
<markus.kroetz...@tu-dresden.de> wrote:
> Hi,
>
> It's that time of the year again when I am sending a reminder that we still
> have broken JSON in the dump files ;-). As usual, the problem is that empty
> maps {} are serialized wrongly as empty lists []. I am not sure if there is
> any open bug that tracks this, so I am sending an email. There was one, but
> it was closed [1].
>
> As you know (I sent an email about this a while ago), there are some
> remaining problems of this kind in the JSON dump, and also in the live
> exported JSON, e.g.,
>
> https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
> (uses [] as a value for snaks: this item has a reference with an empty list
> of snaks, which is an error by itself)
>
> However, the situation is considerably worse in the XML dumps, which have
> seen less use since the JSON dumps became available but, as it turns out,
> are still preferred by some users. Surprisingly (to me), the JSON content
> in the XML dumps is still not the same as in the JSON dumps. A large share
> of the records in the XML dump are broken because of the map-vs-list issue.
>
> For example, the latest dump of current revisions [2] has countless
> instances of the problem. The first is in the item Q3261 (empty list for
> claims), but you can easily find more by grepping for things like
>
> &quot;claims&quot;:[]
>
> It seems that all empty maps are serialized wrongly in this dump (aliases,
> descriptions, claims, ...). In contrast, the site's export simply omits the
> key of empty maps entirely, see
>
> https://www.wikidata.org/wiki/Special:EntityData/Q3261.json
>
> The JSON in the JSON dumps matches the site's export in this respect.
>
> Cheers,
>
> Markus
>
>
> [1] https://github.com/wmde/WikibaseDataModelSerialization/issues/77
> [2]
> http://dumps.wikimedia.org/wikidatawiki/20150207/wikidatawiki-20150207-pages-meta-current.xml.bz2
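
For anyone consuming the affected records before a fix lands, the
map-vs-list problem described above can be worked around on the reader
side. What follows is only a minimal sketch in Python, not anything
taken from Wikibase itself. The function name fix_empty_maps and the
MAP_KEYS set are hypothetical: the message names aliases, descriptions,
claims and snaks as affected, while "labels" and "sitelinks" are added
here as an assumption. The sketch also assumes the JSON text has already
been XML-unescaped when it comes from the XML dump.

```python
import json

# Keys whose values are expected to be maps. The original message names
# aliases, descriptions, claims and snaks; "labels" and "sitelinks" are
# an assumption added for illustration.
MAP_KEYS = {"labels", "descriptions", "aliases", "claims", "sitelinks", "snaks"}


def fix_empty_maps(node):
    """Return a copy of node in which [] under a map-valued key becomes {}."""
    if isinstance(node, dict):
        fixed = {}
        for key, value in node.items():
            if key in MAP_KEYS and value == []:
                # Empty map that was serialized as an empty list.
                fixed[key] = {}
            else:
                fixed[key] = fix_empty_maps(value)
        return fixed
    if isinstance(node, list):
        return [fix_empty_maps(item) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node


if __name__ == "__main__":
    broken = json.loads('{"id": "Q3261", "claims": []}')
    print(json.dumps(fix_empty_maps(broken)))
    # prints {"id": "Q3261", "claims": {}}
```

The function returns a new structure rather than modifying its input,
so the original record stays available for comparison. Non-empty lists
are left alone, which keeps list-valued entries such as the statements
under a property ID intact.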

Sorry Markus. This was still on my agenda, but I've been putting it
off for too long. I'll bring it up in our planning meeting next
Wednesday. If you could open a ticket for it on Phabricator, that'd be
awesome.
As for general issues with dumps not being generated and so on:
Unfortunately the whole Wikimedia dumps infrastructure has a bus
factor of 1, and this has become an issue over the last few months.
Improvements for the whole Wikimedia dumps infrastructure are being
tracked at https://phabricator.wikimedia.org/T88991 and the
Wikidata-specific improvements are tracked at
https://phabricator.wikimedia.org/T88728. If you have issues that are
not there yet, please do file them.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
