Hey,
I made some significant changes based on the feedback:

* Per Nemo_bis's suggestion, I added reference-based analysis. Here's an example: <http://tools.wmflabs.org/wd-analyst/ref.php?p=P143&q=Q328|Q11920&pp=P31>
* I added a limit parameter, which lets you get more results if you want (for both reference-based and property-based analysis). For example: http://tools.wmflabs.org/wd-analyst/index.php?p=P31&q=&limit=50 (the maximum acceptable value is 50)
* Per André's suggestion, I added a column to the database and to the results that gives the percentage of unsourced statements. Obviously it doesn't apply to reference-based analysis. For example, https://tools.wmflabs.org/wd-analyst/index.php?p=P1082&q= shows that only 2% of population statements are unsourced.
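For anyone who wants to script these queries, here is a minimal Python sketch that assembles URLs of the same shape as the examples above. The parameter names (p, q, pp, limit), the two page names and the maximum of 50 are taken from those examples; the helper name build_url and the lower bound of 1 on limit are my own assumptions, and pp is simply passed through as it appears in the ref.php example.

```python
from urllib.parse import urlencode

BASE = "http://tools.wmflabs.org/wd-analyst"
MAX_LIMIT = 50  # maximum acceptable value stated in this message


def build_url(prop, values=(), limit=None, page="index.php", pp=None):
    """Build a wd-analyst query URL (hypothetical helper, not part of the tool).

    prop   -- property ID, e.g. "P31"
    values -- item IDs to compare, joined with "|" as in the example URLs
    limit  -- optional number of results; the lower bound of 1 is an assumption
    page   -- "index.php" (property-based) or "ref.php" (reference-based)
    pp     -- optional extra parameter, copied from the ref.php example
    """
    if limit is not None and not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {"p": prop, "q": "|".join(values)}
    if pp is not None:
        params["pp"] = pp
    if limit is not None:
        params["limit"] = limit
    # Keep "|" unescaped so the result looks like the URLs in this thread.
    return f"{BASE}/{page}?" + urlencode(params, safe="|")


print(build_url("P31", limit=50))
print(build_url("P143", ["Q328", "Q11920"], page="ref.php", pp="P31"))
```

The two calls at the end reproduce the limit example and the reference-based example from the list above.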
As for Gerard's suggestion: it's definitely a good idea, but the problem is that it's technically hard, because every week it makes the database twice as big. We can store only a limited number of weeks (e.g. the last three weeks) or apply this to a limited number of property-value pairs. I'm trying to find out which one is better.

Best

On Thu, Dec 10, 2015 at 12:13 AM André Costa <andre.co...@wikimedia.se> wrote:

> Nice tool!
>
> To understand the statistics better.
> If a claim has two sources, one wikipedia and one other, how does that
> show up in the statistics?
>
> The reason I'm wondering is because I would normally care if a claim is
> sourced or not (but not by how many sources) and whether it is sourced by
> only Wikipedias or anything else.
>
> E.g.
> 1) a statment with 10 claims each sourced is "better" than one with 10
> claims where one claim has 10 sources.
> 2) a statement with a wiki source + another source is "better" than on
> with just a wiki source and just as "good" as one without the wiki source.
>
> Also is wiki ref/source Wikipedia only or any Wikimedia project? Whilst
> (last I checked) the others were only 70,000 refs compared to the 21
> million from Wikipedia they might be significant for certain domains and
> are just as "bad".
>
> Cheers,
> André
> On 9 Dec 2015 10:37, "Gerard Meijssen" <gerard.meijs...@gmail.com> wrote:
>
>> Hoi,
>> What would be nice is to have an option to understand progress from one
>> dump to the next like you can with the Statistics by Magnus. Magnus also
>> has data on sources but this is more global.
>> Thanks,
>> GerardM
>>
>> On 8 December 2015 at 21:41, Markus Krötzsch <
>> mar...@semantic-mediawiki.org> wrote:
>>
>>> Hi Amir,
>>>
>>> Very nice, thanks! I like the general approach of having a stand-alone
>>> tool for analysing the data, and maybe pointing you to issues. Like a
>>> dashboard for Wikidata editors.
>>>
>>> What backend technology are you using to produce these results? Is this
>>> live data or dumped data?
>>> One could also get those numbers from the SPARQL
>>> endpoint, but performance might be problematic (since you compute averages
>>> over all items; a custom approach would of course be much faster but then
>>> you have the data update problem).
>>>
>>> An obvious feature request would be to display entity ids as links to
>>> the appropriate page, and maybe with their labels (in a language of your
>>> choice).
>>>
>>> But overall very nice.
>>>
>>> Regards,
>>>
>>> Markus
>>>
>>>
>>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>>
>>>> Hey,
>>>> There has been several discussion regarding quality of information in
>>>> Wikidata. I wanted to work on quality of wikidata but we don't have any
>>>> source of good information to see where we are ahead and where we are
>>>> behind. So I thought the best thing I can do is to make something to
>>>> show people how exactly sourced our data is with details. So here we
>>>> have *http://tools.wmflabs.org/wd-analyst/index.php*
>>>>
>>>> You can give only a property (let's say P31) and it gives you the four
>>>> most used values + analyze of sources and quality in overall (check this
>>>> out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>)
>>>> and then you can see about ~33% of them are sources which 29.1% of
>>>> them are based on Wikipedia.
>>>> You can give a property and multiple values you want. Let's say you want
>>>> to compare P27:Q183 (Country of citizenship: Germany) and P27:Q30 (US)
>>>> Check this out
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. And
>>>> you can see US biographies are more abundant (300K over 200K) but German
>>>> biographies are more descriptive (3.8 description per item over 3.2
>>>> description over item)
>>>>
>>>> One important note: Compare P31:Q5 (a trivial statement) 46% of them are
>>>> not sourced at all and 49% of them are based on Wikipedia *but* get
>>>> this statistics for population properties (P1082
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>) It's not a
>>>> trivial statement and we need to be careful about them. It turns out
>>>> there are slightly more than one reference per statement and only 4% of
>>>> them are based on Wikipedia. So we can relax and enjoy these
>>>> highly-sourced data.
>>>>
>>>> Requests:
>>>>
>>>>   * Please tell me whether do you want this tool at all
>>>>   * Please suggest more ways to analyze and catch unsourced materials
>>>>
>>>> Future plan (if you agree to keep using this tool):
>>>>
>>>>   * Support more datatypes (e.g. date of birth based on year,
>>>>     coordinates)
>>>>   * Sitelink-based and reference-based analysis (to check how much of
>>>>     articles of, let's say, Chinese Wikipedia are unsourced)
>>>>
>>>>   * Free-style analysis: There is a database for this tool that can be
>>>>     used for way more applications. You can get the most unsourced
>>>>     statements of P31 and then you can go to fix them. I'm trying to
>>>>     build a playground for this kind of tasks)
>>>>
>>>> I hope you like this and rock on!
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>>> Best
>>>>
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata