Hey Markus,

On Wed, Dec 9, 2015 at 12:12 AM Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
> Hi Amir,
>
> Very nice, thanks! I like the general approach of having a stand-alone
> tool for analysing the data, and maybe pointing you to issues. Like a
> dashboard for Wikidata editors.
>
> What backend technology are you using to produce these results? Is this
> live data or dumped data? One could also get those numbers from the
> SPARQL endpoint, but performance might be problematic (since you compute
> averages over all items; a custom approach would of course be much
> faster but then you have the data update problem).
>

I build a database based on weekly JSON dumps. We would have some delay
in the data, but computationally it's fast. Using the Wikidata database
directly would make performance so poor that it becomes a good attack
point.

> An obvious feature request would be to display entity ids as links to
> the appropriate page, and maybe with their labels (in a language of your
> choice).
>

Done. :)

> But overall very nice.
>
> Regards,
>
> Markus
>
>
> On 08.12.2015 18:48, Amir Ladsgroup wrote:
> > Hey,
> > There have been several discussions regarding the quality of
> > information in Wikidata. I wanted to work on the quality of Wikidata,
> > but we don't have any good source of information to see where we are
> > ahead and where we are behind. So I thought the best thing I can do is
> > to make something to show people how well sourced our data is, in
> > detail. So here we have
> > *http://tools.wmflabs.org/wd-analyst/index.php*
> >
> > You can give only a property (let's say P31) and it gives you the four
> > most used values plus an analysis of sources and overall quality (check
> > this out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
> > and then you can see that about 33% of them are sourced, of which 29.1%
> > are based on Wikipedia.
> > You can give a property and multiple values you want. Let's say you
> > want to compare P27:Q183 (Country of citizenship: Germany) and P27:Q30
> > (US). Check this out
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. And
> > you can see US biographies are more abundant (300K versus 200K) but
> > German biographies are more descriptive (3.8 descriptions per item
> > versus 3.2 descriptions per item).
> >
> > One important note: Compare P31:Q5 (a trivial statement): 46% of them
> > are not sourced at all and 49% of them are based on Wikipedia. *But*
> > get these statistics for population properties (P1082
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
> > trivial statement and we need to be careful about them. It turns out
> > there is slightly more than one reference per statement and only 4% of
> > them are based on Wikipedia. So we can relax and enjoy this
> > highly-sourced data.
> >
> > Requests:
> >
> >   * Please tell me whether you want this tool at all
> >   * Please suggest more ways to analyze and catch unsourced materials
> >
> > Future plan (if you agree to keep using this tool):
> >
> >   * Support more datatypes (e.g. date of birth based on year,
> >     coordinates)
> >   * Sitelink-based and reference-based analysis (to check how many of
> >     the articles of, let's say, Chinese Wikipedia are unsourced)
> >
> >   * Free-style analysis: There is a database for this tool that can be
> >     used for way more applications. You can get the most unsourced
> >     statements of P31 and then you can go and fix them. I'm trying to
> >     build a playground for this kind of task.
> >
> > I hope you like this and rock on!
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
> >
> > Best
> >
> >
> > _______________________________________________
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
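A rough sketch of how sourcing numbers like the ones discussed above could
be computed from the entities in a JSON dump. This is not the tool's actual
code. The function names `iter_dump_entities` and `sourcing_stats` are made
up for illustration, and treating a reference that carries P143 ("imported
from Wikimedia project") as "based on Wikipedia" is an assumption about how
such references are marked. The `claims` / `references` / `snaks` layout
follows the Wikidata JSON entity format.

```python
import json
from collections import Counter

# Assumption: a reference with a P143 snak ("imported from Wikimedia
# project") is counted as "based on Wikipedia".
IMPORTED_FROM = "P143"


def iter_dump_entities(lines):
    """Yield entity dicts from the lines of a Wikidata JSON dump.

    The dump is one big JSON array with one entity per line, so each
    line can be parsed on its own after dropping the trailing comma.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def sourcing_stats(entities, prop):
    """Count statements of `prop` by how they are referenced."""
    counts = Counter()
    for entity in entities:
        for statement in entity.get("claims", {}).get(prop, []):
            references = statement.get("references", [])
            counts["statements"] += 1
            counts["references"] += len(references)
            if not references:
                counts["unsourced"] += 1
            elif all(IMPORTED_FROM in ref.get("snaks", {}) for ref in references):
                counts["wikipedia_only"] += 1
            else:
                counts["sourced"] += 1
    return counts
```

With a local dump file this could be called as
`sourcing_stats(iter_dump_entities(open(path, encoding="utf-8")), "P31")`,
where `path` is wherever the weekly dump was unpacked. Dividing the counters
by `counts["statements"]` gives percentages of the kind quoted in the thread.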