Hey,
I made some significant changes based on the feedback:

* Per Nemo_bis's suggestion, I added reference-based analysis. Here's an
example:
<http://tools.wmflabs.org/wd-analyst/ref.php?p=P143&q=Q328|Q11920&pp=P31>
* I added a limit parameter, which lets you get more results if you want
(for both reference-based and property-based analysis). For example:
http://tools.wmflabs.org/wd-analyst/index.php?p=P31&q=&limit=50 (the maximum
accepted value is 50)
* Per André's suggestion, I added a column to the database and to the results
that gives you the percentage of unsourced statements. Obviously it
doesn't apply to reference-based analysis. For example,
https://tools.wmflabs.org/wd-analyst/index.php?p=P1082&q= shows that only 2%
of population statements are unsourced
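
For anyone who wants to script these queries, here is a minimal sketch of how the example URLs above could be assembled. The helper name build_url and the lower bound of 1 on limit are my assumptions; only the parameter names (p, q, pp, limit), the two page names, and the maximum of 50 come from the examples above. The pp value is passed through unchanged.

```python
from urllib.parse import urlencode

BASE = "http://tools.wmflabs.org/wd-analyst/"
MAX_LIMIT = 50  # maximum accepted value, as stated above


def build_url(page, prop, values=(), limit=None, pp=None):
    """Build a query URL for index.php or ref.php (hypothetical helper)."""
    # The lower bound of 1 is an assumption; only the upper bound is documented.
    if limit is not None and not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    # Multiple values are joined with "|", as in q=Q328|Q11920.
    params = {"p": prop, "q": "|".join(values)}
    if pp is not None:
        params["pp"] = pp
    if limit is not None:
        params["limit"] = limit
    # safe="|" keeps the separator readable instead of encoding it as %7C.
    return BASE + page + "?" + urlencode(params, safe="|")


print(build_url("index.php", "P31", limit=50))
print(build_url("ref.php", "P143", ["Q328", "Q11920"], pp="P31"))
```

The two calls reproduce the property-based and reference-based example links from the list above.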

As for Gerard's suggestion: it's definitely a good idea, but the problem is
that it's technically hard, because every week it makes the database twice as
big. We could store only a limited number of weeks (e.g. the last three) or
apply this to a limited number of property-value pairs. I'm trying to find
out which one is better.
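
To make the first option concrete, here is a rough sketch of keeping only the last three weekly snapshots and comparing them. This is not how the tool's database works; the class SnapshotHistory, its methods, and the shape of the data (a mapping from property id to percentage of unsourced statements) are all assumptions for illustration.

```python
from collections import deque

KEEP_WEEKS = 3  # "the last three weeks", as suggested above


class SnapshotHistory:
    """Keep only the most recent weekly snapshots (hypothetical structure)."""

    def __init__(self, keep=KEEP_WEEKS):
        # A bounded deque drops the oldest snapshot once it is full.
        self._snapshots = deque(maxlen=keep)

    def add(self, week, unsourced_pct):
        # unsourced_pct: property id -> percentage of unsourced statements
        self._snapshots.append((week, dict(unsourced_pct)))

    def weeks(self):
        """Labels of the snapshots currently kept, oldest first."""
        return [week for week, _ in self._snapshots]

    def change(self, prop):
        """Newest minus oldest kept value for prop, or None if not comparable."""
        values = [data[prop] for _, data in self._snapshots if prop in data]
        if len(values) < 2:
            return None
        return values[-1] - values[0]
```

With this layout the storage stays bounded at three snapshots, at the cost of losing any progress history older than that.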

Best


On Thu, Dec 10, 2015 at 12:13 AM André Costa <andre.co...@wikimedia.se>
wrote:

> Nice tool!
>
> To understand the statistics better:
> if a claim has two sources, one Wikipedia and one other, how does that
> show up in the statistics?
>
> The reason I'm wondering is because I would normally care if a claim is
> sourced or not (but not by how many sources) and whether it is sourced by
> only Wikipedias or anything else.
>
> E.g.
> 1) a statement with 10 claims each sourced is "better" than one with 10
> claims where one claim has 10 sources.
> 2) a statement with a wiki source + another source is "better" than one
> with just a wiki source and just as "good" as one without the wiki source.
>
> Also is wiki ref/source Wikipedia only or any Wikimedia project? Whilst
> (last I checked) the others were only 70,000 refs compared to the 21
> million from Wikipedia they might be significant for certain domains and
> are just as "bad".
>
> Cheers,
> André
> On 9 Dec 2015 10:37, "Gerard Meijssen" <gerard.meijs...@gmail.com> wrote:
>
>> Hoi,
>> What would be nice is to have an option to understand progress from one
>> dump to the next like you can with the Statistics by Magnus. Magnus also
>> has data on sources but this is more global.
>> Thanks,
>>      GerardM
>>
>> On 8 December 2015 at 21:41, Markus Krötzsch <
>> mar...@semantic-mediawiki.org> wrote:
>>
>>> Hi Amir,
>>>
>>> Very nice, thanks! I like the general approach of having a stand-alone
>>> tool for analysing the data, and maybe pointing you to issues. Like a
>>> dashboard for Wikidata editors.
>>>
>>> What backend technology are you using to produce these results? Is this
>>> live data or dumped data? One could also get those numbers from the SPARQL
>>> endpoint, but performance might be problematic (since you compute averages
>>> over all items; a custom approach would of course be much faster but then
>>> you have the data update problem).
>>>
>>> An obvious feature request would be to display entity ids as links to
>>> the appropriate page, and maybe with their labels (in a language of your
>>> choice).
>>>
>>> But overall very nice.
>>>
>>> Regards,
>>>
>>> Markus
>>>
>>>
>>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>>
>>>> Hey,
>>>> There have been several discussions regarding the quality of information
>>>> in Wikidata. I wanted to work on the quality of Wikidata, but we don't
>>>> have any good source of information to see where we are ahead and where
>>>> we are behind. So I thought the best thing I can do is to make something
>>>> to show people, in detail, exactly how well sourced our data is. So here
>>>> we have *http://tools.wmflabs.org/wd-analyst/index.php*
>>>>
>>>> You can give only a property (let's say P31) and it gives you the four
>>>> most used values + an analysis of sources and quality overall (check this
>>>> out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
>>>> and then you can see that about 33% of them are sourced and 29.1% of
>>>> them are based on Wikipedia.
>>>> You can give a property and multiple values you want. Let's say you want
>>>> to compare P27:Q183 (Country of citizenship: Germany) and P27:Q30 (US)
>>>> Check this out
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>. And
>>>> you can see US biographies are more abundant (300K versus 200K) but German
>>>> biographies are more descriptive (3.8 descriptions per item versus 3.2
>>>> descriptions per item).
>>>>
>>>> One important note: compare P31:Q5 (a trivial statement), where 46% of
>>>> them are not sourced at all and 49% of them are based on Wikipedia, *but*
>>>> get these statistics for population properties (P1082
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
>>>> trivial statement and we need to be careful about them. It turns out
>>>> there is slightly more than one reference per statement and only 4% of
>>>> them are based on Wikipedia. So we can relax and enjoy this
>>>> highly sourced data.
>>>>
>>>> Requests:
>>>>
>>>>   * Please tell me whether you want this tool at all
>>>>   * Please suggest more ways to analyze and catch unsourced materials
>>>>
>>>> Future plan (if you agree to keep using this tool):
>>>>
>>>>   * Support more datatypes (e.g. date of birth based on year,
>>>> coordinates)
>>>>   * Sitelink-based and reference-based analysis (to check how much of
>>>>     articles of, let's say, Chinese Wikipedia are unsourced)
>>>>
>>>>   * Free-style analysis: There is a database for this tool that can be
>>>>     used for way more applications. You can get the most unsourced
>>>>     statements of P31 and then you can go and fix them. I'm trying to
>>>>     build a playground for this kind of task.
>>>>
>>>> I hope you like this and rock on!
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>>> Best
>>>>
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>