Hey Markus,

On Wed, Dec 9, 2015 at 12:12 AM Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Hi Amir,
>
> Very nice, thanks! I like the general approach of having a stand-alone
> tool for analysing the data, and maybe pointing you to issues. Like a
> dashboard for Wikidata editors.
>
> What backend technology are you using to produce these results? Is this
> live data or dumped data? One could also get those numbers from the
> SPARQL endpoint, but performance might be problematic (since you compute
> averages over all items; a custom approach would of course be much
> faster but then you have the data update problem).
>
I build a database from the weekly JSON dumps. That means some delay in
the data, but computationally it's fast. Querying the Wikidata database
directly makes performance so poor that it becomes a good attack point.


> An obvious feature request would be to display entity ids as links to
> the appropriate page, and maybe with their labels (in a language of your
> choice).
>
Done. :)

> But overall very nice.
>
> Regards,
>
> Markus
>
>
> On 08.12.2015 18:48, Amir Ladsgroup wrote:
> > Hey,
> > There have been several discussions regarding the quality of information in
> > Wikidata. I wanted to work on the quality of Wikidata, but we don't have any
> > good source of information to see where we are ahead and where we are
> > behind. So I thought the best thing I can do is to make something that
> > shows people, in detail, how well sourced our data is. So here we
> > have *http://tools.wmflabs.org/wd-analyst/index.php*
> >
> > You can give just a property (let's say P31) and it gives you the four
> > most used values plus an overall analysis of sources and quality (check this
> > out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
> > and then you can see that about 33% of them are sourced, of which 29.1%
> > are based on Wikipedia.
> > You can also give a property and multiple values. Let's say you want
> > to compare P27:Q183 (country of citizenship: Germany) and P27:Q30 (US).
> > Check this out
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. You
> > can see US biographies are more abundant (300K versus 200K) but German
> > biographies are more descriptive (3.8 descriptions per item versus 3.2
> > descriptions per item).
> >
> > One important note: take P31:Q5 (a trivial statement): 46% of them are
> > not sourced at all and 49% of them are based on Wikipedia. *But* compare
> > that with the statistics for population properties (P1082
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). Population is
> > not a trivial statement and we need to be careful about it. It turns out
> > there is slightly more than one reference per statement and only 4% of
> > them are based on Wikipedia. So we can relax and enjoy this
> > highly sourced data.
> >
> > Requests:
> >
> >   * Please tell me whether you want this tool at all
> >   * Please suggest more ways to analyze and catch unsourced material
> >
> > Future plan (if you agree to keep using this tool):
> >
> >   * Support more datatypes (e.g. date of birth based on year,
> >     coordinates)
> >   * Sitelink-based and reference-based analysis (to check how many of
> >     the articles of, let's say, Chinese Wikipedia are unsourced)
> >
> >   * Free-style analysis: There is a database behind this tool that can be
> >     used for many more applications. You can get the most unsourced
> >     statements of P31 and then go and fix them. I'm trying to
> >     build a playground for this kind of task.
> >
> > I hope you like this and rock on!
> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
> > Best
> >
> >
> > _______________________________________________
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >