Re: [Wikidata] Wikidata Analyst, a tool to comprehensively analyze quality of Wikidata

André Costa Wed, 09 Dec 2015 12:45:09 -0800

Nice tool!

To understand the statistics better: if a claim has two sources, one from
Wikipedia and one other, how does that show up in the statistics?


I'm asking because I would normally care whether a claim is sourced at all
(not how many sources it has) and whether it is sourced only by Wikipedias
or by anything else.

E.g.
1) a statement with 10 claims, each sourced, is "better" than one with 10
claims where one claim has 10 sources.
2) a statement with a wiki source + another source is "better" than one with
just a wiki source and just as "good" as one without the wiki source.

Also, is a wiki ref/source Wikipedia only or any Wikimedia project? Whilst
(last I checked) the others accounted for only 70,000 refs compared to the
21 million from Wikipedia, they might be significant for certain domains
and are just as "bad".
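
To make the question concrete, here is a rough sketch of the per-claim
classification I have in mind. It is not how the tool works (I don't know
that). It assumes a simplified claim structure loosely modelled on the
Wikidata JSON (a "references" list whose entries carry "snaks" keyed by
property ID) and assumes that P143 ("imported from") is what marks a
wiki-based reference. The function names and the placeholder value
"Q_SOME_SOURCE" are made up for illustration.

```python
from collections import Counter

# Assumption: a reference counts as "wiki" when it uses nothing but P143
# ("imported from"). Extend this set if other properties should count too.
WIKI_REFERENCE_PROPERTIES = {"P143"}


def is_wiki_reference(reference):
    """Return True if the reference uses only wiki-import properties."""
    props = set(reference.get("snaks", {}))
    return bool(props) and props <= WIKI_REFERENCE_PROPERTIES


def classify_claim(claim):
    """Classify one claim by whether it is sourced, not by how many sources."""
    references = claim.get("references", [])
    if not references:
        return "unsourced"
    if all(is_wiki_reference(ref) for ref in references):
        return "wiki_only"
    # At least one reference is something other than a wiki import.
    return "non_wiki"


def summarize(claims):
    """Count claims per category; each claim is counted exactly once."""
    return Counter(classify_claim(claim) for claim in claims)


if __name__ == "__main__":
    # Hypothetical sample data; Q328 is English Wikipedia,
    # "Q_SOME_SOURCE" is a placeholder, not a real item.
    claims = [
        {"references": []},
        {"references": [{"snaks": {"P143": ["Q328"]}}]},
        {"references": [{"snaks": {"P143": ["Q328"]}},
                        {"snaks": {"P248": ["Q_SOME_SOURCE"]}}]},
    ]
    print(summarize(claims))
```

With this counting, a claim with a wiki source plus another source lands in
the same bucket as a claim with only the other source, and ten references on
one claim count no more than one.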

Cheers,
André
On 9 Dec 2015 10:37, "Gerard Meijssen" <gerard.meijs...@gmail.com> wrote:

> Hoi,
> What would be nice is to have an option to understand progress from one
> dump to the next like you can with the Statistics by Magnus. Magnus also
> has data on sources but this is more global.
> Thanks,
>      GerardM
>
> On 8 December 2015 at 21:41, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> Hi Amir,
>>
>> Very nice, thanks! I like the general approach of having a stand-alone
>> tool for analysing the data, and maybe pointing you to issues. Like a
>> dashboard for Wikidata editors.
>>
>> What backend technology are you using to produce these results? Is this
>> live data or dumped data? One could also get those numbers from the SPARQL
>> endpoint, but performance might be problematic (since you compute averages
>> over all items; a custom approach would of course be much faster but then
>> you have the data update problem).
>>
>> An obvious feature request would be to display entity ids as links to the
>> appropriate page, and maybe with their labels (in a language of your
>> choice).
>>
>> But overall very nice.
>>
>> Regards,
>>
>> Markus
>>
>>
>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>
>>> Hey,
>>> There have been several discussions regarding the quality of information in
>>> Wikidata. I wanted to work on the quality of Wikidata but we don't have any
>>> good source of information to see where we are ahead and where we are
>>> behind. So I thought the best thing I can do is to make something to
>>> show people exactly how well sourced our data is, in detail. So here we
>>> have *http://tools.wmflabs.org/wd-analyst/index.php*
>>>
>>> You can give only a property (let's say P31) and it gives you the four
>>> most used values + an analysis of sources and overall quality (check this
>>> out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>)
>>> and then you can see that about 33% of them are sourced, 29.1% of
>>> them based on Wikipedia.
>>> You can also give a property and multiple values. Let's say you want
>>> to compare P27:Q183 (country of citizenship: Germany) and P27:Q30 (US).
>>> Check this out
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. And
>>> you can see US biographies are more abundant (300K versus 200K) but German
>>> biographies are more descriptive (3.8 descriptions per item versus 3.2
>>> descriptions per item).
>>>
>>> One important note: for P31:Q5 (a trivial statement), 46% of them are
>>> not sourced at all and 49% of them are based on Wikipedia, *but* look
>>> at the statistics for population (P1082
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
>>> trivial statement and we need to be careful about them. It turns out
>>> there is slightly more than one reference per statement and only 4% of
>>> them are based on Wikipedia. So we can relax and enjoy these
>>> highly-sourced data.
>>>
>>> Requests:
>>>
>>>   * Please tell me whether you want this tool at all
>>>   * Please suggest more ways to analyze and catch unsourced materials
>>>
>>> Future plan (if you agree to keep using this tool):
>>>
>>>   * Support more datatypes (e.g. date of birth based on year,
>>> coordinates)
>>>   * Sitelink-based and reference-based analysis (to check how many of
>>>     the articles of, let's say, Chinese Wikipedia are unsourced)
>>>
>>>   * Free-style analysis: There is a database for this tool that can be
>>>     used for many more applications. You can get the most unsourced
>>>     statements of P31 and then go and fix them. I'm trying to
>>>     build a playground for these kinds of tasks.
>>>
>>> I hope you like this and rock on!
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>> Best
>>>
>>>
>>> _______________________________________________
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>

