On Sun, Apr 20, 2014 at 6:14 AM, Stuart A. Yeates <syea...@gmail.com> wrote:
>
> On 20/04/2014 11:05 AM, "Gerard Meijssen" <gerard.meijs...@gmail.com> wrote:
>>
>> What I do know is that at Wikidata we harvest information from all
>> Wikipedias. It does include en,wp but it is not exclusively so. It does
>> include the Russian, the Chinese, the Arabic ... all Wikipedias. As you
>> know, the first operational task for Wikidata is to replace the old inter
>> language links. A next objective is to include all the information that is
>> currently held in info boxes.
>
> What process does wikidata have when different wikis have different policies
> about what should appear by default in infoboxes. In particular when a
> policy calls for discretion or human judgement?

It doesn't have good processes or good policies, and it does have a lot of bots automatically importing data from every wiki. That causes the problem you are concerned about, Stuart.

Here is a sample query of transgender/transsexual people in Wikidata:
http://tools.wmflabs.org/wikidata-todo/autolist.html?props=P21&cat_name=Transgender_and_transsexual_men&cat_lang=en&cat_project=wikipedia&cat_depth=12&

They should either make no claims about sex and gender, or have a 'sex/gender' claim (property 21, or P21) that includes 'transgender' (e.g. Q1052281 = transgender woman), or English Wikipedia is wrong...
https://www.wikidata.org/wiki/Q1052281

It is trivial to find examples where the only P21 claim is female (Q6581072) or male (Q6581097).

e.g. Buck Angel

BeneBot* adds 'male animal'
https://www.wikidata.org/w/index.php?title=Q958281&diff=9027854&oldid=9027850

legobot changes it to '[human] male'
https://www.wikidata.org/w/index.php?title=Q958281&diff=14944633&oldid=9027854

A non-bot (now blocked on Korean Wikipedia) tried to change this to 'hefemale', and was reverted by Sk!d 12 hours later.
https://www.wikidata.org/w/index.php?title=Q958281&diff=49577100&oldid=42751691
https://www.wikidata.org/w/index.php?title=Q958281&diff=49664043&oldid=49577100

Obviously 'hefemale' is not the best term, but it should have been corrected to the more appropriate and more precise 'trans man'.

Here are other bots (SamoaBot & VIAFbot) importing the wrong value from various datasets.
https://www.wikidata.org/w/index.php?title=Q5144952&diff=50988273&oldid=29777905
https://www.wikidata.org/w/index.php?title=Q4709895&diff=52762663&oldid=48588196

Here is a human contributor doing it using Widar (a semi-automated tool).
https://www.wikidata.org/w/index.php?title=Q6118283&diff=115936683&oldid=114200218

These errors are typically more than a year old and remain uncorrected.
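
For anyone who wants to repeat the check on their own list of items, the condition described above (the only P21 claims are plain female or male) can be sketched in a few lines of Python. This is only an illustration: the function names and the input format (a dict mapping an item ID to a list of P21 value IDs, built by hand or by whatever export you prefer) are my assumptions, not part of any Wikidata tool. Only the three Q-numbers come from the examples above.

```python
# Q-numbers quoted in this message.
FEMALE = "Q6581072"
MALE = "Q6581097"
TRANS_WOMAN = "Q1052281"

# Values that, on their own, say nothing about transgender status.
BINARY_ONLY = {FEMALE, MALE}


def needs_review(p21_values):
    """Return True if the item has P21 claims and all of them are plain female/male.

    Items with no P21 claim at all are not flagged, since making no claim
    about sex and gender is one of the acceptable outcomes.
    """
    values = set(p21_values)
    return bool(values) and values <= BINARY_ONLY


def flag_items(items):
    """Return the sorted item IDs whose P21 claims need a human look.

    `items` is a hypothetical, hand-built mapping of item ID -> list of
    P21 value IDs; it is not the output format of any real tool.
    """
    return sorted(item for item, values in items.items() if needs_review(values))


if __name__ == "__main__":
    # Placeholder item IDs, not real Wikidata items.
    sample = {
        "item-a": [MALE],                 # only a binary claim: flagged
        "item-b": [],                     # no claim at all: not flagged
        "item-c": [FEMALE, TRANS_WOMAN],  # includes a transgender value: not flagged
    }
    print(flag_items(sample))
```

The flagged list is only a starting point for human review; whether a given claim is actually wrong still needs the kind of judgement Stuart is asking about.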

--
John Vandenberg