Nice tool! A question to help me understand the statistics better: if a claim has two sources, one from Wikipedia and one from elsewhere, how does that show up in the statistics?
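To make the question concrete, here is a minimal sketch of per-claim counting, where each claim is counted once by the kind of sources it has rather than by its number of references. I don't know how the tool actually computes its numbers, so this is only an illustration: the category names, the reference labels ("wikipedia", "other_wikimedia", "external") and the functions `classify_claim` and `summarize` are all made up.

```python
from collections import Counter

# Hypothetical labels for where a reference points; not the tool's actual schema.
WIKIMEDIA_KINDS = {"wikipedia", "other_wikimedia"}


def classify_claim(reference_kinds):
    """Classify one claim by the kinds of references attached to it."""
    if not reference_kinds:
        return "unsourced"
    if all(kind in WIKIMEDIA_KINDS for kind in reference_kinds):
        return "wikimedia_only"
    # At least one reference points outside Wikimedia projects.
    return "externally_sourced"


def summarize(claims):
    """Count claims per category; a claim counts once, however many references it has."""
    return Counter(classify_claim(refs) for refs in claims)


# Ten claims with one external reference each ...
evenly_sourced = [["external"] for _ in range(10)]
# ... versus ten claims where a single claim carries all ten references.
lopsided = [["external"] * 10] + [[] for _ in range(9)]

print(summarize(evenly_sourced))
print(summarize(lopsided))
# A claim with a Wikipedia reference plus another source.
print(classify_claim(["wikipedia", "external"]))
```

Counted this way, both example sets hold ten references in total, but only the first has every claim sourced, and a Wikipedia reference next to another source does not drag the claim into the "wiki only" bucket.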
The reason I'm asking is that I normally care whether a claim is sourced at all (not how many sources it has) and whether it is sourced only by Wikipedias or by anything else. For example:

1) a statement with 10 claims, each sourced, is "better" than one with 10 claims where a single claim has 10 sources;
2) a statement with a wiki source plus another source is "better" than one with just a wiki source, and just as "good" as one without the wiki source.

Also, does "wiki ref/source" mean Wikipedia only, or any Wikimedia project? Whilst (last I checked) the other projects accounted for only 70,000 refs compared to the 21 million from Wikipedia, they might be significant for certain domains and are just as "bad".

Cheers,
André

On 9 Dec 2015 10:37, "Gerard Meijssen" <gerard.meijs...@gmail.com> wrote:

> Hoi,
> What would be nice is to have an option to understand progress from one
> dump to the next like you can with the Statistics by Magnus. Magnus also
> has data on sources but this is more global.
> Thanks,
> GerardM
>
> On 8 December 2015 at 21:41, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
>
>> Hi Amir,
>>
>> Very nice, thanks! I like the general approach of having a stand-alone
>> tool for analysing the data, and maybe pointing you to issues. Like a
>> dashboard for Wikidata editors.
>>
>> What backend technology are you using to produce these results? Is this
>> live data or dumped data? One could also get those numbers from the SPARQL
>> endpoint, but performance might be problematic (since you compute averages
>> over all items; a custom approach would of course be much faster but then
>> you have the data update problem).
>>
>> An obvious feature request would be to display entity ids as links to the
>> appropriate page, and maybe with their labels (in a language of your
>> choice).
>>
>> But overall very nice.
>>
>> Regards,
>>
>> Markus
>>
>>
>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>
>>> Hey,
>>> There has been several discussion regarding quality of information in
>>> Wikidata. I wanted to work on quality of wikidata but we don't have any
>>> source of good information to see where we are ahead and where we are
>>> behind. So I thought the best thing I can do is to make something to
>>> show people how exactly sourced our data is with details. So here we
>>> have http://tools.wmflabs.org/wd-analyst/index.php
>>>
>>> You can give only a property (let's say P31) and it gives you the four
>>> most used values + analyze of sources and quality in overall (check this
>>> out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>)
>>> and then you can see about ~33% of them are sources which 29.1% of
>>> them are based on Wikipedia.
>>> You can give a property and multiple values you want. Let's say you want
>>> to compare P27:Q183 (Country of citizenship: Germany) and P27:Q30 (US)
>>> Check this out
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. And
>>> you can see US biographies are more abundant (300K over 200K) but German
>>> biographies are more descriptive (3.8 description per item over 3.2
>>> description over item)
>>>
>>> One important note: Compare P31:Q5 (a trivial statement) 46% of them are
>>> not sourced at all and 49% of them are based on Wikipedia *but* get
>>> this statistics for population properties (P1082
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>) It's not a
>>> trivial statement and we need to be careful about them. It turns out
>>> there are slightly more than one reference per statement and only 4% of
>>> them are based on Wikipedia. So we can relax and enjoy these
>>> highly-sourced data.
>>>
>>> Requests:
>>>
>>> * Please tell me whether do you want this tool at all
>>> * Please suggest more ways to analyze and catch unsourced materials
>>>
>>> Future plan (if you agree to keep using this tool):
>>>
>>> * Support more datatypes (e.g. date of birth based on year,
>>>   coordinates)
>>> * Sitelink-based and reference-based analysis (to check how much of
>>>   articles of, let's say, Chinese Wikipedia are unsourced)
>>> * Free-style analysis: There is a database for this tool that can be
>>>   used for way more applications. You can get the most unsourced
>>>   statements of P31 and then you can go to fix them. I'm trying to
>>>   build a playground for this kind of tasks)
>>>
>>> I hope you like this and rock on!
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>> Best
>>>
>>> _______________________________________________
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata