Re: [Wikidata] Nobel Prizes and consensus in Wikidata

2019-09-27 Thread Magnus Sälgö
FYI, we have SPARQL federation with Nobelprize.org; see T200668.

T200668: Set up Nobel Data as federated search with Wikidata
(phabricator.wikimedia.org)
Feedback from Hans Mehlin, Nobel Media AB: "Great! Wikidata is high up on my
wish list. I know that I have my hands full with other things until
mid-October. After that I hope to get a mandate to work more with our
datasets."

There is also a Listeria list that compares Wikidata with Nobelprize.org every night:
https://www.wikidata.org/wiki/User:Salgo60/ListeriaNobelData3

Regards
Magnus Sälgö
Stockholm, Sweden
salg...@msn.com



Re: [Wikidata] Nobel Prizes and consensus in Wikidata

2019-09-27 Thread Peter Patel-Schneider
Indeed.   Thanks for the example.  I'll probably incorporate it in my 
talk at WikidataCon.



As far as I know there is no general method for nudging towards 
consensus for cases like these.  The onus appears to me to be on whoever 
is entering the information to look for similar situations and model 
them all the same.  (In this case it appears that a recent change to the 
Nobel Peace Prize was made to remove it being a subclass of Nobel Prize, 
actually reducing commonality.)


But what can be done in the future?  One way to go is to ask that 
editors be more careful when editing items that might belong to a group, 
and try to model them the same as other members of the group.  Another 
way to go is to ask that editors be more careful when editing items that 
have parts/instances/subclasses and check that all the other items are 
modeled the same way.
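
As a rough illustration of what such a check could look like, here is a minimal Python sketch that builds a SPARQL query listing, for every item pointing at a given target, which of P31, P279 and P361 it uses. The function name and the choice of properties are assumptions made for this example; the query relies on the wd: and wdt: prefixes that the Wikidata Query Service predefines, and nothing here contacts the service.

```python
import re

# Properties discussed in this thread: instance of, subclass of, part of.
DEFAULT_PROPS = ("P31", "P279", "P361")


def build_modeling_query(target_qid, props=DEFAULT_PROPS):
    """Build a SPARQL query that shows how items are linked to target_qid.

    Hypothetical helper for illustration; it only returns the query text.
    """
    if not re.fullmatch(r"Q[1-9]\d*", target_qid):
        raise ValueError(f"not a Wikidata item ID: {target_qid!r}")
    for prop in props:
        if not re.fullmatch(r"P[1-9]\d*", prop):
            raise ValueError(f"not a Wikidata property ID: {prop!r}")
    values = " ".join(f"wdt:{prop}" for prop in props)
    return (
        "SELECT ?item "
        '(GROUP_CONCAT(DISTINCT STR(?prop); separator=", ") AS ?props)\n'
        "WHERE {\n"
        f"  VALUES ?prop {{ {values} }}\n"
        f"  ?item ?prop wd:{target_qid} .\n"
        "}\n"
        "GROUP BY ?item"
    )


# Print the query for Nobel Prize (Q7191); paste it into a SPARQL client to run it.
print(build_modeling_query("Q7191"))
```

Items whose ?props value differs from that of their siblings are candidates for review. The order in which GROUP_CONCAT joins the values is not guaranteed, so the same combination may show up in more than one spelling.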


I prefer something similar to the second way, where editors of classes 
and properties (or just about anything that is going to be the common 
target of a property, but instance and subclass and subproperty seem to 
me to be the most important such properties) are asked to be careful to 
specify the relationship between the class or property and the other 
items that target it.  So whoever does major editing on Nobel Prize 
should add a comment on the relationship between the various Nobel 
Prizes and Nobel Prize. (Having such information is quite common for 
concepts in Cyc.)


Actually Nobel Prize isn't the greatest example for my preference
because there don't seem to be any Wikidata items for even the
famous Nobel Prizes.   Suppose there were a Wikidata item for Einstein's
Nobel Prize in Physics.  Then its relationship to Nobel Prize would
provide guidance for the relationship between the Nobel Prize in Physics
and Nobel Prize itself.



I find modeling deficiencies like this in lots of places in Wikidata.  
That's not a severe problem if you have the resources of Google to throw 
at curating Wikidata information.  But if you don't have this level of 
resources available for curating Wikidata information then these sorts 
of infelicities are a significant barrier to using Wikidata.



Peter F. Patel-Schneider




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




Re: [Wikidata] Nobel Prizes and consensus in Wikidata

2019-09-27 Thread Thad Guidry
Why not flip the question around and instead look for a better predicate
with the fantastic Wikidata Property Explorer? Type "award" into the search
tree and click the results in the tree.

I found these that are useful:

https://www.wikidata.org/entity/P166 (award received)
https://www.wikidata.org/entity/P1027 (conferred by)
https://www.wikidata.org/entity/P1411 (nominated for)

Thad
https://www.linkedin.com/in/thadguidry/




[Wikidata] Nobel Prizes and consensus in Wikidata

2019-09-27 Thread Aidan Hogan

Hey all,

Andra recently mentioned finding laureates in Wikidata, and it
reminded me that some weeks ago I was trying to come up with a SPARQL
query to find all Nobel Prize winners in Wikidata.


What I ended up with was:

SELECT ?winner
WHERE {
  ?winner wdt:P166 ?prize .
  ?prize (wdt:P361|wdt:P31|wdt:P279) wd:Q7191 .
}


More specifically, looking into the data I found:

Nobel Peace Prize (Q35637)
 part of (P361)
  Nobel Prize (Q7191) .

Nobel Prize in Literature (Q37922)
 subclass of (P279)
  Nobel Prize (Q7191) .

Nobel Prize in Economics (Q47170)
 instance of (P31)
   Nobel Prize (Q7191) ;
 part of (P361)
   Nobel Prize (Q7191) .

Nobel Prize in Chemistry (Q44585)
 instance of (P31)
   Nobel Prize (Q7191) ;
 part of (P361)
   Nobel Prize (Q7191) .

Nobel Prize in Physics (Q38104)
 subclass of (P279)
   Nobel Prize (Q7191) ;
 part of (P361)
   Nobel Prize (Q7191) .

In summary, of the six types of Nobel prizes, three different properties 
are used in five different combinations to state that they "are", in 
fact, Nobel prizes. :)
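
For illustration, the listing above can be tallied with a short Python sketch. It hard-codes only the five prizes spelled out in this message, so it finds four combinations rather than the five mentioned for all six prizes, and it reads the Physics entry as subclass of (P279), following its label. The variable and function names are made up for this example.

```python
from collections import defaultdict

# Statements transcribed from the listing above as (item, property, target).
# Assumption: the Physics entry is read as subclass of (P279), per its label.
STATEMENTS = [
    ("Q35637", "P361", "Q7191"),  # Nobel Peace Prize: part of
    ("Q37922", "P279", "Q7191"),  # Nobel Prize in Literature: subclass of
    ("Q47170", "P31", "Q7191"),   # Nobel Prize in Economics: instance of
    ("Q47170", "P361", "Q7191"),  # Nobel Prize in Economics: part of
    ("Q44585", "P31", "Q7191"),   # Nobel Prize in Chemistry: instance of
    ("Q44585", "P361", "Q7191"),  # Nobel Prize in Chemistry: part of
    ("Q38104", "P279", "Q7191"),  # Nobel Prize in Physics: subclass of
    ("Q38104", "P361", "Q7191"),  # Nobel Prize in Physics: part of
]


def group_by_modeling(statements, target="Q7191"):
    """Group items by the set of properties that link them to the target."""
    props_per_item = defaultdict(set)
    for item, prop, obj in statements:
        if obj == target:
            props_per_item[item].add(prop)
    groups = defaultdict(list)
    for item, props in props_per_item.items():
        groups[frozenset(props)].append(item)
    return dict(groups)


groups = group_by_modeling(STATEMENTS)
for combo, items in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo), "->", sorted(items))
```

Any combination that is used by only some of the prizes marks a spot where the modeling has diverged.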


Now while it would be interesting to discuss the relative merits of P31 
vs. P279 vs. P361 vs. some combination thereof in this case and similar 
such cases, I guess I am more interested in the general problem of the 
lack of consensus that such a case exhibits.


What processes (be they social, technical, or some combination thereof) 
are currently in place to reach consensus in these cases in Wikidata?


What could be put in place in future to highlight and reach consensus?

Or is the idea more to leave the burden of "integrating" different 
viewpoints to the consumer (e.g., to the person writing the query)?


(Of course these are all "million dollar questions" that have been with 
the Semantic Web since the beginning, but I am curious about what is 
being done or can be done in the specific context of Wikidata to foster 
consensus and reduce heterogeneity in such cases.)


Best,
Aidan



Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-27 Thread Gerard Meijssen
Hi,
I totally reject the assertion that it was so bad. I have always held that
the main issue was an atrocious user interface. Add to this the people
who bring Wikipedia notions of quality to Wikidata: they have had, and still
have, a detrimental effect on both the quantity and quality of Wikidata.

When you add the functionality that is being built by the data wranglers at
DBpedia, it becomes easier to compare the data from the Wikipedias with
Wikidata (and why not Freebase), add what has consensus, and curate the
differences. This will give us a true data-driven sense of quality and allow
us to provide a much improved service.
Thanks,
  GerardM



Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-27 Thread Sebastian Hellmann

Hi Marco,

I think I looked at it some years ago, and it still sounds like less
than 5% made it, which is what I remember.


-- Sebastian



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 


Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-27 Thread Marco Fossati

Hey Sebastian,

On 9/20/19 10:22 AM, Sebastian Hellmann wrote:

> Not much of Freebase did end up in Wikidata.


Here are some pointers that shed light on the migration of Freebase
to Wikidata, since I was partially involved in the process:

1. WikiProject [1];
2. the paper behind [2];
3. datasets to be migrated [3].

I can confirm that the migration has stalled: as of today, *528
thousand* Freebase statements have been curated by the community, out of *10
million*. By 'curated', I mean approved or rejected.
These numbers come from two queries against the primary sources tool
database.
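
To put the two figures in proportion, here is a small Python sketch of the arithmetic. It uses the rounded numbers quoted above, so the result is approximate; because 'curated' covers both approved and rejected statements, the share is only an upper bound on what was actually approved.

```python
# Rounded figures quoted in this message.
curated = 528_000      # statements approved or rejected by the community
total = 10_000_000     # Freebase statements offered for curation

share = curated / total          # fraction of statements already reviewed
uncurated = total - curated      # statements nobody has reviewed yet

print(f"curated share: {share:.2%}")
print(f"still uncurated: {uncurated:,}")
```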


The stall is due to several causes: in my opinion, the most important 
one was the bad quality of sources [4,5] coming from the Knowledge Vault 
project [6].


Cheers,

Marco

[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
[3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources
[5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements#A_whitelist_for_sources
[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf
