Re: [Wikidata] Critical JSON bugs back in production; all dumps broken now

2015-08-04 Thread Markus Krötzsch

On 04.08.2015 12:33, Addshore wrote:

Please see https://gerrit.wikimedia.org/r/#/c/229099/ and
https://gerrit.wikimedia.org/r/#/c/229100/ for the change to master and
the currently deployed branch.
This will be merged and backported today, and a new dump will be created.

I'm also going to follow this up by writing some more integration tests
for our JSON dumps to spot this kind of thing!


That's great news! Many thanks.

Markus



On 4 August 2015 at 11:26, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:

On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
mar...@semantic-mediawiki.org wrote:
 Hi,

 The recent Wikidata JSON dumps again contain huge amounts of broken JSON
 where empty maps are serialized as [] instead of using {}. Just grep for

 claims:[]
 or
 aliases:[]
 or
 any other key that requires a map

 to find many examples. The scope of the problem is massive. Basically all
 entity documents that include some empty map are broken, which is almost
 every entity document in
http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
 there are around 15.7 million entities with [] for aliases.
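 To reproduce a count like this without relying on grep, the dump can be
 streamed entity by entity. The following Python sketch is illustrative only
 and assumes (a) the dump is a single JSON array with one entity per line,
 each line ending in a comma and the array brackets on their own lines, and
 (b) the top-level keys labels, descriptions, aliases, claims and sitelinks
 must hold JSON objects; nested maps such as qualifiers are not checked. The
 function names are hypothetical and not part of Wikidata Toolkit or any
 Wikimedia tooling.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List

# Assumption: top-level entity keys whose values must be JSON objects (maps).
# Nested maps (e.g. "qualifiers" inside statements) are not checked here.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def broken_keys(entity: dict) -> List[str]:
    """Return the map-typed keys that hold an empty JSON array instead of {}."""
    # {} == [] is False in Python, so correct empty objects are not flagged.
    return [key for key in MAP_KEYS if entity.get(key) == []]


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity per line from a dump laid out as a single JSON array.

    Assumed layout: '[' on the first line, one entity per line followed by
    a comma (except the last), and ']' on the final line.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def count_broken(lines: Iterable[str]) -> Dict[str, int]:
    """Count, per key, how many entities carry [] where a map is expected."""
    counts = {key: 0 for key in MAP_KEYS}
    for entity in iter_entities(lines):
        for key in broken_keys(entity):
            counts[key] += 1
    return counts


def count_broken_in_dump(path: str) -> Dict[str, int]:
    """Stream a gzipped dump from a local path without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_broken(f)
```

 Only empty arrays are flagged, since that is the failure mode described
 here; a non-empty array under a map key would need a separate type check.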

 This is critically breaking the consumption of Wikidata content for all
 model-based JSON parsers, including Wikidata Toolkit.

 The bug used to occur only in XML dumps, but now also affects the JSON dumps
 in the same way. In previous JSON dumps, the problem was avoided by omitting
 empty maps altogether (no keys, no values), which is better because it
 allows implementations to fall back to the obvious default. This is still
 done in the Web API, e.g.,
https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json
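 On the consuming side, that fallback to the obvious default can be made
 explicit. A minimal Python sketch, assuming the top-level map keys labels,
 descriptions, aliases, claims and sitelinks, and using a hypothetical
 normalize_entity helper that is not part of any existing library: an absent
 key and an empty array are both read as an empty map, while any other
 non-object value is rejected so that genuine errors are not silently hidden.

```python
from typing import Any, Dict

# Assumption: top-level keys of a Wikibase entity whose values are maps.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def normalize_entity(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy in which every map-typed key holds a dict.

    - missing key or null (empty map omitted): fall back to the default {}
    - empty array []: read as an empty map {}
    - anything else that is not a dict: raise, so real errors still surface
    """
    fixed = dict(entity)
    for key in MAP_KEYS:
        value = fixed.get(key)
        if value is None or value == []:
            fixed[key] = {}
        elif not isinstance(value, dict):
            raise TypeError(
                f"{key!r} must be a JSON object, got {type(value).__name__}"
            )
    return fixed
```

 Normalizing before handing the data to a model-based parser keeps the parser
 strict while tolerating both the older omission style and the [] output.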

 It would be nice to test the export code before deploying it.

Sorry for that. Adam and Marius are working on a fix right now.
They'll report back in a bit.


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a non-profit organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Addshore








Re: [Wikidata] Critical JSON bugs back in production; all dumps broken now

2015-08-04 Thread Lydia Pintscher
On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
mar...@semantic-mediawiki.org wrote:
 Hi,

 The recent Wikidata JSON dumps again contain huge amounts of broken JSON
 where empty maps are serialized as [] instead of using {}. Just grep for

 claims:[]
 or
 aliases:[]
 or
 any other key that requires a map

 to find many examples. The scope of the problem is massive. Basically all
 entity documents that include some empty map are broken, which is almost
 every entity document in
 http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
 there are around 15.7 million entities with [] for aliases.

 This is critically breaking the consumption of Wikidata content for all
 model-based JSON parsers, including Wikidata Toolkit.

 The bug used to occur only in XML dumps, but now also affects the JSON dumps
 in the same way. In previous JSON dumps, the problem was avoided by omitting
 empty maps altogether (no keys, no values), which is better because it
 allows implementations to fall back to the obvious default. This is still
 done in the Web API, e.g.,
 https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json

 It would be nice to test the export code before deploying it.

Sorry for that. Adam and Marius are working on a fix right now.
They'll report back in a bit.


Cheers
Lydia

