Please see https://gerrit.wikimedia.org/r/#/c/229099/ and https://gerrit.wikimedia.org/r/#/c/229100/ for the changes to master and to the currently deployed branch. These will be merged and backported today, and a new dump will then be created.
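
For anyone who has to consume the affected dump in the meantime, here is a minimal sketch of a check and workaround for the problem reported below (empty maps serialized as [] instead of {}). It is an illustration only, not the fix in the Gerrit changes above. The function names are hypothetical, the key list beyond "claims" and "aliases" is an assumption, and it assumes the dump is a JSON array with one entity per line.

```python
import json

# Keys expected to hold maps. "claims" and "aliases" come from the report;
# the other entries are assumptions about the entity format.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def parse_dump_line(line):
    """Parse one dump line, assuming one entity per line inside a JSON array."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None  # array delimiters and blank lines carry no entity
    return json.loads(line)


def find_broken_map_keys(entity):
    """Return the map-valued keys that were serialized as a list."""
    return [key for key in MAP_KEYS if isinstance(entity.get(key), list)]


def normalize_empty_maps(entity):
    """Return a shallow copy with [] under map-valued keys replaced by {}."""
    fixed = dict(entity)
    for key in MAP_KEYS:
        if fixed.get(key) == []:
            fixed[key] = {}
    return fixed
```

A consumer could run normalize_empty_maps on each parsed entity before handing it to a model-based parser, and a dump test could assert that find_broken_map_keys returns an empty list for every entity.
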
I'm also going to follow this up by writing some more integration tests for our JSON dumps to spot this kind of thing!

On 4 August 2015 at 11:26, Lydia Pintscher <lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
> <mar...@semantic-mediawiki.org> wrote:
> > Hi,
> >
> > The recent Wikidata JSON dumps again contain huge amounts of broken JSON
> > where empty maps are serialized as [] instead of using {}. Just grep for
> >
> > "claims":[]
> > or
> > "aliases":[]
> > or
> > any other key that requires a map
> >
> > to find many examples. The scope of the problem is massive. Basically all
> > entity documents that include some empty map are broken, which is almost
> > every entity document in
> > http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
> > there are around 15.7 million entities with [] for aliases.
> >
> > This is critically breaking the consumption of Wikidata content for all
> > model-based JSON parsers, including Wikidata Toolkit.
> >
> > The bug used to occur only in XML dumps, but now also affects the JSON dumps
> > in the same way. In previous JSON dumps, the problem was avoided by omitting
> > empty maps altogether (no keys, no values), which is better because it
> > allows implementations to fall back to the obvious default. This is still
> > done in the Web API, e.g.,
> > https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json
> >
> > It would be nice to test the export code before deploying it.
>
> Sorry for that. Adam and Marius are working on a fix right now.
> They'll report back in a bit.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
> organization by the Finanzamt für Körperschaften I Berlin, tax number
> 27/681/51985.
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

--
Addshore