I also published the source code (it's based on Python and PHP). PRs are
welcome:
https://github.com/Ladsgroup/wd-analyst
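
In case it helps people who want to contribute, here is a rough, illustrative
Python sketch of the kind of pass the tool makes over the weekly JSON dump
(this is not the actual repo code; the dump path, the property id, and the
exact counting rules are simplified for the example):

    # Illustrative sketch only: for one property, count how many statements
    # have any reference at all, and how many are backed solely by
    # "imported from Wikimedia project" (P143) references.
    import gzip
    import json
    import sys

    def analyse(dump_path, prop):
        total = sourced = wiki_only = 0
        with gzip.open(dump_path, 'rt', encoding='utf-8') as dump:
            for line in dump:
                line = line.strip().rstrip(',')
                if line in ('[', ']', ''):
                    continue  # skip the surrounding JSON array brackets
                entity = json.loads(line)
                for claim in entity.get('claims', {}).get(prop, []):
                    total += 1
                    refs = claim.get('references', [])
                    if refs:
                        sourced += 1
                        # every reference is just "imported from Wikipedia"
                        if all('P143' in ref.get('snaks', {}) for ref in refs):
                            wiki_only += 1
        print('%s: %d statements, %d sourced, %d sourced only via P143'
              % (prop, total, sourced, wiki_only))

    if __name__ == '__main__':
        # e.g. python analyse.py wikidata-latest-all.json.gz P31
        analyse(sys.argv[1], sys.argv[2])

The real tool writes these counts into its own database once per dump, so the
web interface only reads precomputed numbers.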

On Wed, Dec 9, 2015 at 7:20 AM Amir Ladsgroup <ladsgr...@gmail.com> wrote:

> Hey Markus,
>
> On Wed, Dec 9, 2015 at 12:12 AM Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> Hi Amir,
>>
>> Very nice, thanks! I like the general approach of having a stand-alone
>> tool for analysing the data, and maybe pointing you to issues. Like a
>> dashboard for Wikidata editors.
>>
>> What backend technology are you using to produce these results? Is this
>> live data or dumped data? One could also get those numbers from the
>> SPARQL endpoint, but performance might be problematic (since you compute
>> averages over all items; a custom approach would of course be much
>> faster but then you have the data update problem).
>>
> I built a database based on the weekly JSON dumps, so there is some delay
> in the data, but computationally it's fast. Querying the Wikidata database
> directly would make performance so poor that the tool could become a good
> attack point.
>
>
>> An obvious feature request would be to display entity ids as links to
>> the appropriate page, and maybe with their labels (in a language of your
>> choice).
>>
> Done. :)
>
>> But overall very nice.
>>
>> Regards,
>>
>> Markus
>>
>>
>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>> > Hey,
>> > There have been several discussions regarding the quality of information
>> > in Wikidata. I wanted to work on Wikidata quality, but we don't have any
>> > good source of information to see where we are ahead and where we are
>> > behind. So I thought the best thing I could do is build something that
>> > shows people, in detail, how well sourced our data is. So here we have
>> > *http://tools.wmflabs.org/wd-analyst/index.php*
>> >
>> > You can give only a property (let's say P31) and it gives you the four
>> > most used values plus an analysis of sources and overall quality (check
>> > this out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>), and
>> > then you can see that about 33% of them are sourced and 29.1% of them
>> > are based on Wikipedia.
>> > You can also give a property and multiple values. Let's say you want to
>> > compare P27:Q183 (country of citizenship: Germany) and P27:Q30 (US).
>> > Check this out
>> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. You
>> > can see that US biographies are more abundant (300K versus 200K), but
>> > German biographies are more descriptive (3.8 descriptions per item
>> > versus 3.2).
>> >
>> > One important note: for P31:Q5 (a trivial statement), 46% of statements
>> > are not sourced at all and 49% are based on Wikipedia, *but* look at the
>> > statistics for the population property (P1082
>> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). That is not a
>> > trivial statement and we need to be careful about it. It turns out there
>> > is slightly more than one reference per statement and only 4% of them
>> > are based on Wikipedia. So we can relax and enjoy this highly-sourced
>> > data.
>> >
>> > Requests:
>> >
>> >   * Please tell me whether you want this tool at all
>> >   * Please suggest more ways to analyze and catch unsourced materials
>> >
>> > Future plan (if you agree to keep using this tool):
>> >
>> >   * Support more datatypes (e.g. date of birth based on year,
>> >     coordinates)
>> >   * Sitelink-based and reference-based analysis (to check how many of
>> >     the articles of, let's say, the Chinese Wikipedia are unsourced)
>> >
>> >   * Free-style analysis: there is a database behind this tool that can
>> >     be used for many more applications. For example, you can get the
>> >     most unsourced statements of P31 and then go fix them. I'm trying to
>> >     build a playground for this kind of task.
>> >
>> > I hope you like this and rock on!
>> > <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>> > Best
>> >
>> >
>
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
