Please see https://gerrit.wikimedia.org/r/#/c/229099/ and
https://gerrit.wikimedia.org/r/#/c/229100/ for the change to master and the
currently deployed branch.
This will be merged and backported today, and a new dump will be created.

I'm also going to follow this up by writing some more integration tests for
our JSON dumps to spot this kind of thing!
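
A minimal sketch of what such a check could look like, not the actual test
code. It assumes the dump has one entity per line inside a surrounding JSON
array, and that the keys listed in MAP_KEYS are the ones that must hold maps.
The function names and the key list are hypothetical.

```python
import json

# Assumption: top-level entity keys that must be JSON objects (maps).
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def find_list_valued_maps(entity, map_keys=MAP_KEYS):
    """Return the map-valued keys that were serialized as a JSON array."""
    return [key for key in map_keys if isinstance(entity.get(key), list)]


def check_dump_lines(lines):
    """Yield (entity_id, bad_keys) for each entity with [] where {} belongs.

    Assumes one entity per line, wrapped in "[" and "]" lines, with a
    trailing comma after every entity except possibly the last.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        bad_keys = find_list_valued_maps(entity)
        if bad_keys:
            yield entity.get("id"), bad_keys
```

To run it against a real dump, the lines could be streamed with
gzip.open(path, "rt", encoding="utf-8") and passed to check_dump_lines; any
yielded entity means the dump is affected.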

On 4 August 2015 at 11:26, Lydia Pintscher <lydia.pintsc...@wikimedia.de>
wrote:

> On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
> <mar...@semantic-mediawiki.org> wrote:
> > Hi,
> >
> > The recent Wikidata JSON dumps again contain huge amounts of broken JSON
> > where empty maps are serialized as [] instead of using {}. Just grep for
> >
> > "claims":[]
> > or
> > "aliases":[]
> > or
> > any other key that requires a map
> >
> > to find many examples. The scope of the problem is massive. Basically all
> > entity documents that include some empty map are broken, which is almost
> > every entity document in
> > http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
> > there are around 15.7 million entities with [] for aliases.
> >
> > This is critically breaking the consumption of Wikidata content for all
> > model-based JSON parsers, including Wikidata Toolkit.
> >
> > The bug used to occur only in XML dumps, but now also affects the JSON
> > dumps in the same way. In previous JSON dumps, the problem was avoided by
> > omitting empty maps altogether (no keys, no values), which is better
> > because it allows implementations to fall back to the obvious default.
> > This is still done in the Web API, e.g.,
> > https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json
> >
> > It would be nice to test the export code before deploying it.
>
> Sorry for that. Adam and Marius are working on a fix right now.
> They'll report back in a bit.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Berlin-Charlottenburg
> district court under number 23855 Nz. Recognized as a charitable
> organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Addshore
