Re: [Wikidata-l] Statistics
Hoi,

This is my analysis of the situation, with several strategies to remedy it. I am really interested in your reaction. Yes, fallback is in there, but there has to be something to fall back to. That is currently missing.

Thanks,
Gerard

http://ultimategerardm.blogspot.nl/2013/10/wikdata-needs-378000-labels.html

On 18 October 2013 22:27, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
> On Fri, Oct 18, 2013 at 7:26 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:
>> Hoi,
>> I do not know if you have seen the statistics compiled by Magnus [1]. They are up to date and useful. I blogged about it [2]. As far as I am concerned, the biggest challenge we face is the lack of labels. Given that 280+ languages are represented in Wikidata, it clearly demonstrates that Wikidata is useless as it is for most languages. Please tell me that I am wrong and explain why.
>
> This is correct to a certain degree. However, we have language fallbacks on the roadmap, which will significantly help improve the situation. Liangent has put a lot of effort into this over the summer during Google Summer of Code.
>
> The other thing is that there are clearly a number of items which are used more than others. My theory is that they are also the ones that are more complete. If there is no label in a small language for a very obscure item, then this is less bad than when there is none for a much-used item. Not all items are created equal. We should keep that in mind when interpreting statistics.
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Obentrautstr. 72
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Statistics
<toolspam>
The Terminator [1] can show you the most linked-to (~important) items with no label ("term", hence the name) in major languages.
</toolspam>

[1] http://tools.wmflabs.org/wikidata-terminator/index.php

On Fri, Oct 18, 2013 at 9:27 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
> This is correct to a certain degree. However, we have language fallbacks on the roadmap, which will significantly help improve the situation. Liangent has put a lot of effort into this over the summer during Google Summer of Code.
>
> The other thing is that there are clearly a number of items which are used more than others. My theory is that they are also the ones that are more complete. If there is no label in a small language for a very obscure item, then this is less bad than when there is none for a much-used item. Not all items are created equal. We should keep that in mind when interpreting statistics.
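To make the Terminator idea concrete, here is a minimal sketch of that kind of ranking: keep the items that lack a label in one language, then sort them by how often they are linked to. This is not the tool's actual code. The `Item` type, the `most_linked_without_label` function and the sample data are all invented for illustration.

```python
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    """Hypothetical, simplified stand-in for a Wikidata item."""
    item_id: str
    incoming_links: int      # how many other items link to this one
    labels: Dict[str, str]   # language code -> label


def most_linked_without_label(items: List[Item], lang: str, limit: int = 10) -> List[Item]:
    """Return the most linked-to items that have no label in `lang`."""
    missing = [item for item in items if not item.labels.get(lang)]
    ranked = sorted(missing, key=lambda item: item.incoming_links, reverse=True)
    return ranked[:limit]


# Invented sample data, for illustration only.
SAMPLE_ITEMS = [
    Item("item-a", 120, {"en": "alpha", "de": "Alpha"}),
    Item("item-b", 950, {"en": "beta"}),
    Item("item-c", 40, {"en": "gamma"}),
    Item("item-d", 300, {"de": "Delta"}),
]

if __name__ == "__main__":
    for item in most_linked_without_label(SAMPLE_ITEMS, "de"):
        print(item.item_id, item.incoming_links)
```

Sorting by incoming links is only a stand-in for "importance"; any other usage measure could be swapped in as the sort key.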
Re: [Wikidata-l] Statistics
i took one example and am lost already: pegasus, listed on top with 5 labels without description:
http://tools.wmflabs.org/wikidata-terminator/index.php?lang=de&term=Pegasus&doit=1

then i take one with a description, "Sternbild knapp nördlich des Himmelsäquators" ("constellation just north of the celestial equator"):
https://www.wikidata.org/wiki/Q8864

and i do not see this description, nor can i figure out where it came from. where did i make the error?

rupert

On Sat, Oct 19, 2013 at 2:08 PM, Magnus Manske magnusman...@googlemail.com wrote:
> <toolspam>
> The Terminator [1] can show you the most linked-to (~important) items with no label ("term", hence the name) in major languages.
> </toolspam>
>
> [1] http://tools.wmflabs.org/wikidata-terminator/index.php
Re: [Wikidata-l] Statistics
You used German (de) on the Terminator page. Have you switched your Wikidata language to de accordingly?

On Sat, Oct 19, 2013 at 1:35 PM, rupert THURNER rupert.thur...@gmail.com wrote:
> i took one example and am lost already: pegasus, listed on top with 5 labels without description:
> http://tools.wmflabs.org/wikidata-terminator/index.php?lang=de&term=Pegasus&doit=1
>
> then i take one with a description, "Sternbild knapp nördlich des Himmelsäquators" ("constellation just north of the celestial equator"):
> https://www.wikidata.org/wiki/Q8864
>
> and i do not see this description, nor can i figure out where it came from. where did i make the error?
>
> rupert
Re: [Wikidata-l] Statistics
maybe the most important question first: is it the goal that human editors extend / correct this data in wikidata, or is there a feed?

if it is really humans who should enter data: thanks for the hint magnus, i can see it now, hallelujah. i would never in my life have had the idea to change the GUI language in preferences to de in order to change the content language. and it takes 8 clicks on a smaller screen - enough that i would not do it more than once every 5 years :)

rupert

On Sat, Oct 19, 2013 at 7:38 PM, Magnus Manske magnusman...@googlemail.com wrote:
> You used German (de) on the Terminator page. Have you switched your Wikidata language to de accordingly?
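One way around the preference clicks, as far as I know, is the standard MediaWiki `uselang` URL parameter, which sets the interface language for a single page view. The Wikibase `wbgetentities` API module can also return labels and descriptions for chosen languages. The sketch below only builds the two URLs and makes no network request. The helper names `item_page_url` and `entity_terms_api_url` are made up for this example.

```python
from urllib.parse import urlencode

WIKIDATA_BASE = "https://www.wikidata.org"


def item_page_url(item_id: str, lang: str) -> str:
    """URL of an item page shown in `lang` for this one page view (uselang)."""
    return f"{WIKIDATA_BASE}/wiki/{item_id}?" + urlencode({"uselang": lang})


def entity_terms_api_url(item_id: str, lang: str) -> str:
    """API URL asking wbgetentities for the labels and descriptions in `lang`."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return f"{WIKIDATA_BASE}/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    # Q8864 is the item from the message above.
    print(item_page_url("Q8864", "de"))
    print(entity_terms_api_url("Q8864", "de"))
```

Opening the first URL in a browser should show the German label and description without touching the preferences; the second is meant for scripts that want the same terms as JSON.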
Re: [Wikidata-l] Statistics
On Fri, Oct 18, 2013 at 7:26 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:
> Hoi,
> I do not know if you have seen the statistics compiled by Magnus [1]. They are up to date and useful. I blogged about it [2]. As far as I am concerned, the biggest challenge we face is the lack of labels. Given that 280+ languages are represented in Wikidata, it clearly demonstrates that Wikidata is useless as it is for most languages. Please tell me that I am wrong and explain why.

This is correct to a certain degree. However, we have language fallbacks on the roadmap, which will significantly help improve the situation. Liangent has put a lot of effort into this over the summer during Google Summer of Code.

The other thing is that there are clearly a number of items which are used more than others. My theory is that they are also the ones that are more complete. If there is no label in a small language for a very obscure item, then this is less bad than when there is none for a much-used item. Not all items are created equal. We should keep that in mind when interpreting statistics.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
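For readers unfamiliar with the term, a language fallback can be pictured as trying a chain of languages in order and showing the first label that exists. The sketch below is an assumption-laden illustration, not the implementation on the roadmap. The chains in `FALLBACK_CHAINS`, the default of falling back to English and the function name `label_with_fallback` are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical fallback chains, for illustration only.
FALLBACK_CHAINS: Dict[str, Sequence[str]] = {
    "de-at": ("de", "en"),
    "nds": ("de", "en"),
}


def label_with_fallback(
    labels: Dict[str, str],
    lang: str,
    chains: Dict[str, Sequence[str]] = FALLBACK_CHAINS,
    default_chain: Sequence[str] = ("en",),
) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first language in the chain that has a label.

    The requested language is tried first, then its fallbacks. Returns None
    when no language in the chain has a label.
    """
    for candidate in (lang, *chains.get(lang, default_chain)):
        text = labels.get(candidate)
        if text:
            return candidate, text
    return None


if __name__ == "__main__":
    print(label_with_fallback({"de": "Pegasus"}, "de-at"))
    print(label_with_fallback({}, "de-at"))
```

The second call returns `None`: a fallback chain only helps when some language in it actually has a label, so the most-used items still need labels in at least the languages that others fall back to.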