Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Marielle Volz
If you want to find all humans on Wikidata, find all items with the
property instance of (P31) equal to human (Q5). There is no need
to infer this from things like having the parent property; that's a
terrible way to do things. Items that are instances of different items
use the same properties all the time, so you shouldn't be inferring
anything about the class of an item based on the properties it has.

If you are worried about horses being put in a genealogical tree with
humans, that would require someone to put a horse as a parent of a
human or vice versa. That's a problem with an invalid relationship
being added, not with the property itself.
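A minimal sketch of the difference, assuming a toy in-memory set of items rather than the live Wikidata API. The Q_... identifiers are made-up placeholders, not real Wikidata IDs; only P31, P22 and Q5 come from this thread.

```python
# Toy stand-in for Wikidata items: item ID -> {property ID: [values]}.
# All "Q_..." IDs are hypothetical placeholders for illustration.
ITEMS = {
    "Q_person": {"P31": ["Q5"], "P22": ["Q_father"]},
    "Q_father": {"P31": ["Q5"]},
    "Q_horse": {"P31": ["Q_horse_class"], "P22": ["Q_sire"]},
    "Q_sire": {"P31": ["Q_horse_class"]},
}

HUMAN = "Q5"


def humans_by_instance_of(items):
    """Select items that directly state instance of (P31) = human (Q5)."""
    return sorted(
        qid for qid, claims in items.items() if HUMAN in claims.get("P31", [])
    )


def humans_by_having_father(items):
    """The discouraged approach: guess the class from the use of father (P22)."""
    return sorted(qid for qid, claims in items.items() if "P22" in claims)


print(humans_by_instance_of(ITEMS))
print(humans_by_having_father(ITEMS))
```

The first function returns only the two human items; the second also picks up the horse and misses the father who has no P22 statement of his own.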

On Wed, Aug 26, 2015 at 6:43 PM, Svavar Kjarrval sva...@kjarrval.is wrote:


 On mið 26.ágú 2015 13:58, Peter F. Patel-Schneider wrote:
 I don't think that P21 (https://www.wikidata.org/wiki/Property:P21, sex or
 gender) is a subclass of P31 (https://www.wikidata.org/wiki/Property:P31,
 instance of).  Properties aren't subclasses in general.

 Perhaps you meant to talk about https://www.wikidata.org/wiki/Property:P21
 (sex or gender) being related via https://www.wikidata.org/wiki/Property:P31
 (instance of) to https://www.wikidata.org/wiki/Q18608871 (Wikidata property
 for items about people).  This indicates that the property should only be
 used on people, even though the description of the property itself talks about
 its use on animals.

 It appears that Wikidata is not very consistent internally.

 peter

 Sorry, I'm not used to the Wikidata lingo.

 To further explain my point (with which I think you have already agreed):
 If I were to write code which makes assumptions based on such
 relations, the code would reach the contradiction that a non-human
 with a P21 relation is a human, if it were to travel recursively through
 the hierarchy of declarations. P21 is declared with P31-Q18608871, and
 Q18608871 is in turn declared with P1269-Q5. Unless special precautions
 are taken, anyone trying to generate an exhaustive list of all
 humans on Wikidata (without relying solely on the direct declaration on
 each item) might find themselves with non-humans on that list due
 to travelling backwards via such relations.

 In essence, it seems that either P21 wrongfully allows definitions of
 genders of non-humans or the property is too broad for a
 declaration of P31-Q18608871.

 - Svavar Kjarrval


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Gerard Meijssen
Hoi,
Absolutely..

When full genealogy information is available, you do not need special words
that indicate whatever. It is only when this is not the case that you need
to specify what type of link there is. This can be specific like maternal
uncle or paternal aunt. This makes a practical difference in several
cultures and is THEREFORE significant. Again, it is only of relevance when
it cannot be inferred.
Thanks,
  GerardM

On 27 August 2015 at 11:08, Marielle Volz marielle.v...@gmail.com wrote:

 If you want to find all humans on Wikidata, find all items with the
 property instance of (P31) equal to human (Q5). There is no need
 to infer this from things like having the parent property; that's a
 terrible way to do things. Items that are instances of different items
 use the same properties all the time, so you shouldn't be inferring
 anything about the class of an item based on the properties it has.

 If you are worried about horses being put in a genealogical tree with
 humans, that would require someone to put a horse as a parent of a
 human or vice versa. That's a problem with an invalid relationship
 being added, not with the property itself.



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Ole Palnatoke Andersen
Well, I decided to be bold (that is often the road to reversion, but
let's get the ball rolling):

Tarok[1] now has Pay Dirt[2] as his father.
B.B.S. Sugarlight[3] now has Sugarsweet Sid[4] as his mother, and she
has Sugarcane Hanover[5] as her father.

1 https://www.wikidata.org/wiki/Q12338810
2 https://www.wikidata.org/wiki/Q12331109
3 https://www.wikidata.org/wiki/Q20872428
4 https://www.wikidata.org/wiki/Q20873813
5 https://www.wikidata.org/wiki/Q12003911

When I asked about this on Facebook, the first answer was "Random
guess: Check out Secretariat. My guess is that it has been registered
thoroughly."
Now the quest is to connect Secretariat, Tarok and Sugarcane Hanover.. :-)


On Wed, Aug 26, 2015 at 9:24 PM, Joe Filceolaire filceola...@gmail.com wrote:
 Every other ontology mixes humans with fictional characters and with groups
 of humans and possibly fictional humans (biblical characters for instance).
 Wikidata has gone to a lot of trouble to try to untangle these into separate
 classes. Anyone trying to get an exhaustive list of humans and not using
 instance of:human deserves everything he gets.

 P21 (sex or gender) is very explicitly specified as being usable for humans
 and for other creatures. At the request of some languages we have separate
 items for 'female human' and for 'female creature' (we have the same for
 male), 'Female human' is 'subclass of:female creature'. Relying on P21 to
 tell if something is or is not human is not recommended as it will probably
 miss out all the humans who are neither male nor female - wikidata has about
 a dozen other values that can be used with this property.

 Father (P22) and mother (P25) can perfectly well be used for non-humans and
 if the current constraints on these properties flag this as a problem then
 the constraints will have to be updated. I expect to see extensive pedigrees
 for racehorses entered in Wikidata. Note that there is a proposal under
 consideration to replace P22 and P25 with a single 'parent' property.

 Hope this helps

 Joe






-- 
http://palnatoke.org * @palnatoke * +4522934588



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Svavar Kjarrval


On mið 26.ágú 2015 13:58, Peter F. Patel-Schneider wrote:
 I don't think that P21 (https://www.wikidata.org/wiki/Property:P21, sex or
 gender) is a subclass of P31 (https://www.wikidata.org/wiki/Property:P31,
 instance of).  Properties aren't subclasses in general.

 Perhaps you meant to talk about https://www.wikidata.org/wiki/Property:P21
 (sex or gender) being related via https://www.wikidata.org/wiki/Property:P31
 (instance of) to https://www.wikidata.org/wiki/Q18608871 (Wikidata property
 for items about people).  This indicates that the property should only be
 used on people, even though the description of the property itself talks about
 its use on animals.

 It appears that Wikidata is not very consistent internally.

 peter

Sorry, I'm not used to the Wikidata lingo.

To further explain my point (with which I think you have already agreed):
If I were to write code which makes assumptions based on such
relations, the code would reach the contradiction that a non-human
with a P21 relation is a human, if it were to travel recursively through
the hierarchy of declarations. P21 is declared with P31-Q18608871, and
Q18608871 is in turn declared with P1269-Q5. Unless special precautions
are taken, anyone trying to generate an exhaustive list of all
humans on Wikidata (without relying solely on the direct declaration on
each item) might find themselves with non-humans on that list due
to travelling backwards via such relations.

In essence, it seems that either P21 wrongfully allows definitions of
genders of non-humans or the property is too broad for a
declaration of P31-Q18608871.
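To illustrate the concern, here is a rough sketch of such naive inference over a toy data set. The two declarations (P21 with P31-Q18608871, and Q18608871 with P1269-Q5) are the ones named above; the example items, the Q_... placeholder IDs and the inference rule itself are assumptions made up for illustration, not anything Wikidata does.

```python
# Declarations on the property and on its class, as described in the message.
PROPERTY_DECLARATIONS = {"P21": {"P31": ["Q18608871"]}}
CLASS_DECLARATIONS = {"Q18608871": {"P1269": ["Q5"]}}

# Hypothetical example items; the "Q_..." IDs are placeholders.
ITEMS = {
    "Q_example_horse": {"P31": ["Q_horse_class"], "P21": ["Q_male_creature"]},
    "Q_example_person": {"P31": ["Q5"], "P21": ["Q_male_human"]},
}


def direct_classes(item_id):
    """Classes stated directly on the item via instance of (P31)."""
    return set(ITEMS[item_id].get("P31", []))


def naively_inferred_classes(item_id):
    """Assumed naive rule: an item using a property belongs to whatever
    the property's class points at via P1269."""
    inferred = set()
    for prop in ITEMS[item_id]:
        for prop_class in PROPERTY_DECLARATIONS.get(prop, {}).get("P31", []):
            inferred.update(CLASS_DECLARATIONS.get(prop_class, {}).get("P1269", []))
    return inferred


print(direct_classes("Q_example_horse"))
print(naively_inferred_classes("Q_example_horse"))
```

For the horse, the direct declaration gives the horse class while the naive rule yields Q5 (human), which is the contradiction described above.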

- Svavar Kjarrval





Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Ole Palnatoke Andersen
I've just completed #100wikidays, and my 100th article was about a
horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
use the same properties as for humans?

We also have https://www.wikidata.org/wiki/Q12331109 and
https://www.wikidata.org/wiki/Q12338810, who were father and son.
Again: Do we have animal properties, or do we use the same as for
humans?

Regards,
Ole

On Mon, Aug 24, 2015 at 10:55 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:
 Having gone and written the RFC
 (https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Merging_relationship_properties)
 I've just discovered that we *did* have this discussion in 2013:

 https://www.wikidata.org/w/index.php?title=Wikidata%3AProperties_for_deletion&diff=44470851&oldid=44465708

 - and it was suggested we come back to it after Phase III. I think
 the existing state of arbitrary access should be able to solve this
 problem, so I've added some notes about this.

 Comments welcome; I'll circulate notifications onwiki tonight.

 Andrew.

 On 24 August 2015 at 14:02, Lukas Benedix bene...@zedat.fu-berlin.de wrote:
 +1 for genderless family relationship properties.

 Lukas

 Hi all,

 Thanks again for your comments. It looks like:

 a) there's interest in simplifying this;

 b) creating automatic inferences is possibly desirable but will need a
 lot of work and thought.

 I'll put together an RFC onwiki about merging the gendered
 relationship properties, which will address the first part of the
 issue, and we can continue to think about how best to approach the
 second.

 Andrew.

 On 17 August 2015 at 12:29, Andrew Gray andrew.g...@dunelm.org.uk wrote:
 Hi all,

 I've recently been thinking about how we handle family/genealogical
 relationships in Wikidata - this is, potentially, a really valuable
 source of information for researchers to have available in a
 structured form, especially now we're bringing together so many
 biographical databases.

 We currently have the following properties to link people together:

 * spouses (P26) and cohabitants (P451) - not gendered
 * parents (P22/P25) and step-parents (P43/P44) - gendered
 * siblings (P7/P9) - gendered
 * children (P40) - not gendered (and oddly no step-children?)
 * a generic related to (P1038) for more distant relationships

 There's two big things that jump out here.

 ** First, gender. Parents are split by gender while children are not
 (we have mother/father not son/daughter). Siblings are likewise
 gendered, and spouses are not. These are all very early properties -
 does anyone remember how we got this way?

 This makes for some odd results. For example, if we want to use our
 data to identify all the male-line *descendants* of a person, we have
 to do some complicated inference from [P40 + target is male]. However,
 to identify all the male-line *ancestors*, we can just run back up the
 P22 chain. It feels quite strange to have this difference, and I
 wonder if we should standardise one way or the other - split P40 or
 merge the others.
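The asymmetry can be sketched on a toy family tree. This assumes a simplified in-memory model in which P21 holds a plain string (on Wikidata the values are items) and the single-letter IDs are invented for the example.

```python
# Toy family: A has children B (male) and C (female); B has D; C has E.
# P21 values are simplified to strings for illustration only.
PEOPLE = {
    "A": {"P21": "male", "P40": ["B", "C"]},
    "B": {"P21": "male", "P22": "A", "P40": ["D"]},
    "C": {"P21": "female", "P22": "A", "P40": ["E"]},
    "D": {"P21": "male", "P22": "B"},
    "E": {"P21": "male", "P25": "C"},
}


def male_line_ancestors(person_id):
    """Walk straight up the father (P22) chain."""
    chain = []
    current = PEOPLE[person_id].get("P22")
    while current is not None and current not in chain:
        chain.append(current)
        current = PEOPLE.get(current, {}).get("P22")
    return chain


def male_line_descendants(person_id):
    """Walk down child (P40) links, keeping only male targets."""
    found = []
    for child in PEOPLE[person_id].get("P40", []):
        if PEOPLE[child].get("P21") == "male":
            found.append(child)
            found.extend(male_line_descendants(child))
    return found


print(male_line_ancestors("D"))
print(male_line_descendants("A"))
```

Going up needs one property; going down needs a second lookup on every child to check gender.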

 In some ways, merging seems more elegant. We do have fairly good
 gender metadata (and getting better all the time!), so we can still do
 gender-specific relationship searches where needed. It also avoids
 having to force a binary gender approach - we are in the odd position
 of being able to give a nuanced entry in P21 but can only say if
 someone is a sister or brother.

 ** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
 by definition symmetric. If A has P26:B, then B should also have
 P26:A. The gendered cases are a little more complicated, as if A has
 P40:B, then B has P22:A or P25:A, but there is still a degree of
 symmetry - one of those must be true.
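As a rough sketch of that rule, the function below lists the inverse statement(s) one would expect for a given statement. The function name, the tuple layout and the plain-string gender argument are assumptions for illustration only.

```python
def expected_inverse_statements(subject, prop, target, subject_gender=None):
    """Return candidate (item, property, value) statements implied by symmetry.

    For spouse (P26) the inverse is the same property. For child (P40) the
    inverse is father (P22) or mother (P25); when the subject's gender is not
    known, both candidates are returned and exactly one is expected to hold.
    """
    if prop == "P26":
        return [(target, "P26", subject)]
    if prop == "P40":
        if subject_gender == "male":
            return [(target, "P22", subject)]
        if subject_gender == "female":
            return [(target, "P25", subject)]
        return [(target, "P22", subject), (target, "P25", subject)]
    return []


print(expected_inverse_statements("A", "P26", "B"))
print(expected_inverse_statements("A", "P40", "B", subject_gender="female"))
print(expected_inverse_statements("A", "P40", "B"))
```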

 However, Wikidata doesn't really help us make use of this symmetry. If
 I list A as spouse of B, I need to add (separately) that B is spouse
 of A. If they have four children C, D, E, and F, this gets very
 complicated - we have six articles with *30* links between them, all
 of which need to be made manually. It feels like automatically making
 symmetric links for these properties would save a lot of work, and
 produce a much more reliable dataset.
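The figure of 30 links can be checked with a short enumeration, assuming one directed statement per relationship per direction. The helper name and the property labels in the tuples are made up for the example.

```python
from itertools import permutations


def family_links(parents, children):
    """List every directed statement needed to fully link one family."""
    links = []
    # Spouse links, one in each direction.
    links += [(a, "P26", b) for a, b in permutations(parents, 2)]
    # Parent-to-child and child-to-parent links.
    for parent in parents:
        for child in children:
            links.append((parent, "P40", child))
            links.append((child, "P22 or P25", parent))
    # Sibling links between every ordered pair of children.
    links += [(a, "P7 or P9", b) for a, b in permutations(children, 2)]
    return links


links = family_links(["A", "B"], ["C", "D", "E", "F"])
print(len(links))  # 2 spouse + 16 parent/child + 12 sibling = 30
```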

 I believe we decided early on not to do symmetric links because it
 would swamp commonly linked articles (imagine what Q5 would look like
 by now!). On the other hand, these are properties with a very narrowly
 defined scope, and we actively *want* them to be comprehensively
 symmetric - every parent article should list all their children on
 Wikidata, and every child article should list their parent and all
 their siblings.

 Perhaps it's worth reconsidering whether to allow symmetry for a
 specifically defined class of properties - would an automatically
 symmetric P26 really swamp the system? It would be great if the system
 could match up relationships and fill in missing parent/child,
 sibling, and spouse links. I can't be the only one 

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Joe Filceolaire
Use the same properties for family relationships of animals and humans.

Sex/gender is the only property whose values for female/male
creatures differ from its values for female/male humans.

Joe

On Wed, 26 Aug 2015 12:45 Ole Palnatoke Andersen palnat...@gmail.com
wrote:

 I've just completed #100wikidays, and my 100th article was about a
 horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
 grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
 use the same properties as for humans?

 We also have https://www.wikidata.org/wiki/Q12331109 and
 https://www.wikidata.org/wiki/Q12338810, who were father and son.
 Again: Do we have animal properties, or do we use the same as for
 humans?

 Regards,
 Ole


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider
I am a relative [sic] outsider to Wikidata and I just tried to answer this
question by looking at Wikidata.

It turns out that there is information in Wikidata that indicates that
https://www.wikidata.org/wiki/Property:P22 (father) is only to be used on
people.  Look at https://www.wikidata.org/wiki/Property_talk:P22, where both
the type and the value type are person (Q215627), fictional character
(Q95074).  Similar restrictions are in place for
https://www.wikidata.org/wiki/Property:P1038 (relative).

So I would say that, no, you should not use these properties on horses.

Whether this is a good thing or not is a separate matter.  I do note that
there do not appear to be any Wikidata properties that can be used for
parent-offspring relationships for horses.  Neither
https://www.wikidata.org/wiki/Property:P22 (father) nor
https://www.wikidata.org/wiki/Property:P1038 (relative) has super-properties.

peter

On 08/26/2015 04:45 AM, Ole Palnatoke Andersen wrote:
 I've just completed #100wikidays, and my 100th article was about a
 horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
 grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
 use the same properties as for humans?
 
 We also have https://www.wikidata.org/wiki/Q12331109 and
 https://www.wikidata.org/wiki/Q12338810, who were father and son.
 Again: Do we have animal properties, or do we use the same as for
 humans?
 
 Regards,
 Ole
 

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider


On 08/26/2015 06:16 AM, Svavar Kjarrval wrote:
 On mið 26.ágú 2015 11:45, Ole Palnatoke Andersen wrote:
 I've just completed #100wikidays, and my 100th article was about a
 horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
 grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
 use the same properties as for humans?

 We also have https://www.wikidata.org/wiki/Q12331109 and
 https://www.wikidata.org/wiki/Q12338810, who were father and son.
 Again: Do we have animal properties, or do we use the same as for
 humans?

 P21 is a subclass of P31 with Q18608871 which indicates in machine
 readable interpretation that it is about the gender of people, yet the
 descriptions assume items can be associated with P21 to include gender
 of animals. Yeah, I can understand the confusion. :/
 
 - Svavar Kjarrval
 
 

I don't think that P21 (https://www.wikidata.org/wiki/Property:P21, sex or
gender) is a subclass of P31 (https://www.wikidata.org/wiki/Property:P31,
instance of).  Properties aren't subclasses in general.

Perhaps you meant to talk about https://www.wikidata.org/wiki/Property:P21
(sex or gender) being related via https://www.wikidata.org/wiki/Property:P31
(instance of) to https://www.wikidata.org/wiki/Q18608871 (Wikidata property
for items about people).  This indicates that the property should only be
used on people, even though the description of the property itself talks about
its use on animals.

It appears that Wikidata is not very consistent internally.

peter







Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread James Heald

On 26/08/2015 23:35, Svavar Kjarrval wrote:



On mið 26.ágú 2015 19:24, Joe Filceolaire wrote:


Every other ontology mixes humans with fictional characters and with
groups of humans and possibly fictional humans (biblical characters
for instance). Wikidata has gone to a lot of trouble to try to
untangle these into separate classes. Anyone trying to get an
exhaustive list of humans and not using instance of:human deserves
everything he gets.

P21 (sex or gender) is very explicitly specified as being usable for
humans and for other creatures. At the request of some languages we
have separate items for 'female human' and for 'female creature' (we
have the same for male), 'Female human' is 'subclass of:female
creature'. Relying on P21 to tell if something is or is not human is
not recommended as it will probably miss out all the humans who are
neither male nor female - wikidata has about a dozen other values that
can be used with this property.

Father (P22) and mother (P25) can perfectly well be used for
non-humans and if the current constraints on these properties flag
this as a problem then the constraints will have to be updated. I
expect to see extensive pedigrees for racehorses entered in Wikidata.
Note that there is a proposal under consideration to replace P22 and
P25 with a single 'parent' property.


   Hope this helps

Joe



For me, it doesn't help. One of the purposes of Wikidata is that it
should also be machine readable. If I were trying to, for example,
travel recursively through the declarations to find deep common facts
about some group of items, it would take much more work than necessary
if I have to hunt down and code around a lot of wrongly categorised
trees and special cases in the data structure.

One other example is Stubbs, the current mayor of Talkeetna, (Q7627362)
which happens to be a cat. The Wikidata item for Stubbs has the
declaration P31-Q146 (cat). However, it also has the definition
P31-Q30185 (mayor), a subclass of Q2285706 (head of government) which
is a subclass of Q82955 (politician) and that's finally a subclass of Q5
(human). One might suggest that since the item for Stubbs is
specifically declared as a cat, that definition has priority (or some
variation of that logic). The problem is that a machine cannot
automatically understand that. Without special programming and/or a way
to define contradictions like that in Wikidata, both facts are assumed
to be correct. The machine might not even know that there is a
contradiction at all so the machine, in its inferences, will assume
Stubbs is both a human and a cat.
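
To make the traversal concrete, here is a minimal Python sketch. It assumes a hand-copied toy graph holding only the statements quoted above (it does not query Wikidata), and the function name inferred_classes is made up for illustration.

```python
from collections import deque

# Toy copy of the statements described above; not live Wikidata data.
INSTANCE_OF = {"Q7627362": {"Q146", "Q30185"}}  # Stubbs: cat, mayor (P31)
SUBCLASS_OF = {                                  # P279
    "Q30185": {"Q2285706"},   # mayor -> head of government
    "Q2285706": {"Q82955"},   # head of government -> politician
    "Q82955": {"Q5"},         # politician -> human
}

def inferred_classes(item):
    """Return every class reachable via P31 followed by any number of P279 steps."""
    seen = set()
    queue = deque(INSTANCE_OF.get(item, ()))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue
        seen.add(cls)
        queue.extend(SUBCLASS_OF.get(cls, ()))
    return seen

classes = inferred_classes("Q7627362")
print(sorted(classes))
```

Nothing in the walk itself flags a conflict: Q5 (human) and Q146 (cat) both land in the result, and the code has no way to know that they should exclude each other.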

- Svavar Kjarrval



There are a *lot* of problems with P279 (subclass), right across Wikidata.

These will only be corrected once people start doing searches in a 
systematic way and addressing the anomalies they find.


In this case, politician (Q82955) should *not* be a subclass of human 
(Q5), instead it should be a subclass of something like occupation 
(Q13516667), or alternatively perhaps profession (Q28640).



My understanding is that currently there are a vast number of incorrect 
subclass relationships in the project, messing up tree searches, and so 
far it is something that has simply not yet been systematically addressed.


  -- James.


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider


On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
 For now, what's the best way to find (and perhaps correct) incorrect
 declarations like these?
 
 If I were to just change items for commonly used items like politician
 (Q82955) it might be construed as vandalism or someone who doesn't care
 about or understand the Stubbs-declared-as-a-human problem might just
 add that declaration back later.
 
 When it comes to the gender property (P21), the human readable
 description indicates that it's to define genders in general, yet it's
 declared as an instance of an item (Q18608871) which only applies to
 humans, which of course has consequences further up in the hierarchy
 since the maintainers of item Q18608871 faithfully assume it only
 applies to humans.

Well, the situation with respect to Wikidata property for items about people
(Q18608871) is very difficult. There is absolutely no machine-interpretable
information associated with this class that can be used to determine that
instances of it are only supposed to be used for people. So, at the bare
minimum, such machine-interpretable information needs to be added.

Then there is the issue that there is no theory of how the
machine-interpretable information that is associated with entities in Wikidata
is to be processed.   All the processing is currently done using
uninterpretable procedures.  For example, on
https://www.wikidata.org/wiki/Property_talk:P22 there is information that is
used to control some piece of code that checks to see that the subject of
https://www.wikidata.org/wiki/Property:P21 belongs to person (Q215627) or
fictional character (Q95074).  However, there is no theory showing how this
interacts with other parts of Wikidata, even such inherent parts of Wikidata
as https://www.wikidata.org/wiki/Property:P31

In fact, it is difficult even to determine simple truth in Wikidata.
Two sources can conflict, and Wikidata is not in the position of being an
arbiter for such conflicts, certainly not in general.  To make the situation
even more complex, Wikidata has a temporal aspect as well and has a need to
admit exceptions to general statements.

So what can be done?  Any solution is going to be tricky.  That is not to say
that some solutions cannot be found by looking at systems and standards that
are already being used for storing large amounts of complex information.
However, any solution is going to have to be carefully tailored to meet the
requirements of Wikidata and Wikidatans.  (Is there an official term for the
people who are putting Wikidata and Wikidata information together?)

There is also a big chicken-and-egg problem here - a good solution to reliable
machine-interpretation of Wikidata information requires, for example,
consistent use of instance of, subclass, and subproperty; but what counts as a
consistent use of these fundamental properties depends on a formal theory of
what they mean.


I, for one, would find even just the attempt to solve this problem vastly
interesting, and I have been doing some exploration as to what might be
needed.  My company is interested in using Wikidata as a source of background
information, but finds that the lack of a good theory of Wikidata information
is problematic, so I have some cover for spending time on this problem.

Anyway, if there is interest in machine interpretation of Wikidata
information, if only to detect potential anomalies, I, and probably others,
would be motivated to spend more time on trying to come up with potential
solutions, hopefully in a collaborative effort that includes not just
theoreticians but also Wikidatans.

 In the case of the hierarchy Stubbs is associated with, the maintainers
 have either assumed that all mayors are, without exception, humans, or
 they somehow thought that if there were exceptions to this, the machines
 could detect and apply them in each case. Both of those assumptions are,
 I think we agree, wrong, and we should find out why it's happening.
 
 Is there a tool where one can put in a Wikidata item and it extracts
 declarations based on higher properties like subclass or instance of?
 Like if I were to input the item for Stubbs, it would travel the
 hierarchy and tell me what would be assumed about Stubbs based on the
 declarations further up in the tree.

Yes, it is 

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Thad Guidry
[snip]

Yes the class tree needs work, especially the higher levels, and copying
 someone else's High Level Ontology seems to be impractical as they all seem
 to be copyrighted. I hope that this is the kind of thing that will get
 better with time. We will see.

Joe


Freebase's is not copyrighted! :)

In Freebase, we had the notion of a community curated mutex that applied
simple rules and visually would warn a user when they applied an instance
of (Freebase Type or Class) to an entity that was also typed by a mutexed
Class.

For instance,  Pets ARE NOT Humans
And other simple things like that.

Wikidata might want to begin creating something like that.  Q82955 NOT
ALLOWED AS Q5   or whatever works for you guys.

The Big Mama Mutex in Freebase was the primary one and others...all of
which is still in the Freebase graph data itself.  Use it as a starting
point if you want.
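
A rough Python sketch of what such a mutex check could look like. The MUTEX table and the function name are hypothetical, and the human/cat pair is only an assumed example rule; this is not how Freebase or Wikidata actually store such rules.

```python
# Hypothetical mutex table: pairs of classes that no single item may belong to together.
MUTEX = {frozenset({"Q5", "Q146"})}  # human vs. cat (assumed rule, for illustration only)

def mutex_violations(classes):
    """Return the mutex pairs that are fully contained in the given set of classes."""
    present = set(classes)
    return [pair for pair in MUTEX if pair <= present]

# Classes one would infer for Stubbs by following instance of / subclass of.
stubbs_classes = {"Q146", "Q30185", "Q2285706", "Q82955", "Q5"}
violations = mutex_violations(stubbs_classes)
print(violations)
```

A check like this only warns; deciding which of the conflicting statements is the wrong one is still left to a human editor.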

Thad
+ThadGuidry https://www.google.com/+ThadGuidry


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-24 Thread Lukas Benedix
+1 for genderless family relationship properties.

Lukas

 Hi all,

 Thanks again for your comments. It looks like:

 a) there's interest in simplifying this;

 b) creating automatic inferences is possibly desirable but will need a
 lot of work and thought.

 I'll put together an RFC onwiki about merging the gendered
 relationship properties, which will address the first part of the
 issue, and we can continue to think about how best to approach the
 second.

 Andrew.

 On 17 August 2015 at 12:29, Andrew Gray andrew.g...@dunelm.org.uk wrote:
 Hi all,

 I've recently been thinking about how we handle family/genealogical
 relationships in Wikidata - this is, potentially, a really valuable
 source of information for researchers to have available in a
 structured form, especially now we're bringing together so many
 biographical databases.

 We currently have the following properties to link people together:

 * spouses (P26) and cohabitants (P451) - not gendered
 * parents (P22/P25) and step-parents (P43/P44) - gendered
 * siblings (P7/P9) - gendered
 * children (P40) - not gendered (and oddly no step-children?)
 * a generic related to (P1038) for more distant relationships

 There's two big things that jump out here.

 ** First, gender. Parents are split by gender while children are not
 (we have mother/father not son/daughter). Siblings are likewise
 gendered, and spouses are not. These are all very early properties -
 does anyone remember how we got this way?

 This makes for some odd results. For example, if we want to use our
 data to identify all the male-line *descendants* of a person, we have
 to do some complicated inference from [P40 + target is male]. However,
 to identify all the male-line *ancestors*, we can just run back up the
 P22 chain. It feels quite strange to have this difference, and I
 wonder if we should standardise one way or the other - split P40 or
 merge the others.
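
The asymmetry can be sketched in Python as follows. The family, the item names and the plain-string gender values are invented for illustration; only the property numbers come from the list above.

```python
# Toy family with hypothetical items. P22 = father, P40 = child, P21 = sex or gender.
FATHER = {"B": "A", "C": "A", "D": "B"}      # item -> father (P22)
CHILDREN = {"A": ["B", "C"], "B": ["D"]}     # item -> children (P40)
GENDER = {"A": "male", "B": "male", "C": "female", "D": "male"}  # P21

def male_line_ancestors(item):
    """Walk up the P22 chain; no gender lookup is needed."""
    chain = []
    while item in FATHER:
        item = FATHER[item]
        chain.append(item)
    return chain

def male_line_descendants(item):
    """Walk down P40, keeping only male children and recursing only through them."""
    result = []
    for child in CHILDREN.get(item, []):
        if GENDER.get(child) == "male":
            result.append(child)
            result.extend(male_line_descendants(child))
    return result

print(male_line_ancestors("D"))
print(male_line_descendants("A"))
```

The upward walk needs one property; the downward walk needs P40 plus a P21 lookup on every target, which is the "[P40 + target is male]" inference mentioned above.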

 In some ways, merging seems more elegant. We do have fairly good
 gender metadata (and getting better all the time!), so we can still do
 gender-specific relationship searches where needed. It also avoids
 having to force a binary gender approach - we are in the odd position
 of being able to give a nuanced entry in P21 but can only say if
 someone is a sister or brother.

 ** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
 by definition symmetric. If A has P26:B, then B should also have
 P26:A. The gendered cases are a little more complicated, as if A has
 P40:B, then B has P22:A or P25:A, but there is still a degree of
 symmetry - one of those must be true.

 However, Wikidata doesn't really help us make use of this symmetry. If
 I list A as spouse of B, I need to add (separately) that B is spouse
 of A. If they have four children C, D, E, and F, this gets very
 complicated - we have six articles with *30* links between them, all
 of which need to be made manually. It feels like automatically making
 symmetric links for these properties would save a lot of work, and
 produce a much more reliable dataset.
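
A small Python sketch of the arithmetic and of a gap check. The letters are the hypothetical family from the paragraph above, and missing_symmetric is an invented helper, not an existing tool.

```python
from itertools import permutations

family = ["A", "B", "C", "D", "E", "F"]  # two spouses and four children

# Every ordered pair needs its own statement when links are entered by hand.
directed_links = list(permutations(family, 2))
print(len(directed_links))  # 6 * 5 = 30

def missing_symmetric(statements):
    """Given (subject, value) pairs for a symmetric property, return the absent reverses."""
    present = set(statements)
    return sorted((v, s) for (s, v) in present if (v, s) not in present)

spouse = [("A", "B")]  # only one half of the P26 pair was entered
print(missing_symmetric(spouse))
```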

 I believe we decided early on not to do symmetric links because it
 would swamp commonly linked articles (imagine what Q5 would look like
 by now!). On the other hand, these are properties with a very narrowly
 defined scope, and we actively *want* them to be comprehensively
 symmetric - every parent article should list all their children on
 Wikidata, and every child article should list their parent and all
 their siblings.

 Perhaps it's worth reconsidering whether to allow symmetry for a
 specifically defined class of properties - would an automatically
 symmetric P26 really swamp the system? It would be great if the system
 could match up relationships and fill in missing parent/child,
 sibling, and spouse links. I can't be the only one who regularly adds
 one half of the relationship and forgets to include the other!

 A bot looking at all of these and filling in the gaps might be a
 useful approach... but it would break down if someone tries to remove
 one of the symmetric entries without also removing the other, as the
 bot would probably (eventually) fill it back in. Ultimately, an
 automatic symmetry would seem best.

 Thoughts on either of these? If there is interest I will write up a
 formal proposal on-wiki.

 --
 - Andrew Gray
   andrew.g...@dunelm.org.uk





Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-24 Thread Andrew Gray
Having gone and written the RFC
(https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Merging_relationship_properties)
I've just discovered that we *did* have this discussion in 2013:

https://www.wikidata.org/w/index.php?title=Wikidata%3AProperties_for_deletion&diff=44470851&oldid=44465708

- and it was suggested we come back to it after Phase III. I think
the existing state of arbitrary access should be able to solve this
problem, so I've added some notes about this.

Comments welcome; I'll circulate notifications onwiki tonight.

Andrew.

On 24 August 2015 at 14:02, Lukas Benedix bene...@zedat.fu-berlin.de wrote:
 +1 for genderless family relationship properties.

 Lukas

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-22 Thread Thomas Douillard
Another example where things might get complicated is the common "office
vs. head of office" problem.

For example, in French governments the scope of ministries changes all the
time, depending on the president and the prime minister.

We can have a Ministry of veterans and of slippers, with its
corresponding minister, that becomes the Ministry of slippers and panties
under the next government.

On the other hand, it's pretty certain there will always be a minister of
foreign affairs, so it's pretty clear that there will be an item for the
corresponding minister.

So a contributor can face problems like: do I use the construction

* Michel Michu office held: minister of: slippers
* Michel Michu office held: minister of slipper
* Michel Michu office held: minister of: ministry of slipper

Do I create the item for the ministry of slippers? For the minister as an
office?

I think inferences could help the contributor by stating that all these are
more or less ways to say the same thing, and help query writing at the
same time.


2015-08-17 14:47 GMT+02:00 Markus Kroetzsch markus.kroetz...@tu-dresden.de:

 Hi Andrew,

 I am very interested in this, especially in the second aspect (how to
 handle symmetry). There are many cases where we have two or more ways to
 say the same thing on Wikidata (symmetric properties are only one case). It
 would be useful to draw these inferences so that they can be used for queries
 and maybe also in the UI.

 This can also help to solve some of the other problems you mention: for
 those who would like to have properties son and daughter, one could
 infer their values automatically from other statements, without editors
 having to maintain this data at all.

 A possible way to maintain these statements on wiki would be to use a
 special reference to encode that they have been inferred (and from what).
 This would make it possible to maintain them automatically without the
 problem of human editors ending up wrestling with bots ;-) Moreover, it
 would not require any change in the software on which Wikidata is running.
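
One way to picture the "special reference" idea is the Python sketch below. The dictionary layout and the inferred_from key are assumptions made up for illustration; they are not an existing Wikidata reference property or bot.

```python
def refresh_inferred(statements):
    """Drop old inferred spouse statements, then re-derive them from human-entered ones.

    A statement is a dict with "subject" and "value"; inferred statements also
    carry a hypothetical "inferred_from" marker naming their source statement.
    """
    asserted = [s for s in statements if s.get("inferred_from") is None]
    have = {(s["subject"], s["value"]) for s in asserted}
    result = list(asserted)
    for s in asserted:
        reverse = (s["value"], s["subject"])
        if reverse not in have:
            result.append({
                "subject": s["value"],
                "value": s["subject"],
                "inferred_from": (s["subject"], s["value"]),
            })
            have.add(reverse)
    return result

entered = [{"subject": "A", "value": "B"}]   # a human adds A spouse B
after_bot = refresh_inferred(entered)         # the bot adds B spouse A, marked as inferred
after_removal = refresh_inferred(after_bot[1:])  # the human later removes A spouse B
print(after_bot, after_removal)
```

Because the mirrored statement records where it came from, removing the human-entered half makes the inferred half disappear on the next run instead of being filled back in.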

 For the cases you mentioned, I don't think that there is a problem with
 too many inferred statements. There are surely cases where it would not be
 practical (in the current system) to store inferred data, but family
 relationships are usually not problematic. In fact, they are very useful to
 human readers.

 Of course, the community needs to fully control what is inferred, and this
 has to be done in-wiki. We already have symmetry information in
 constraints, but for useful inference we might have to be stricter. The
 current constraints also cover some not-so-strict cases where exceptions
 are likely (e.g., most people have only one gender, but this is not a
 strong rule; on the other hand, one is always the child of one's mother by
 definition).

 One also has to be careful with qualifiers etc. For example, the start and
 end of a spouse statement should be copied to its symmetric version, but
 there might also be qualifiers that should not be copied like this. I would
 like to work on a proposal for how to specify such things. It would be good
 to coordinate there.

 A first step (even before adding any statement to Wikidata) could be to
 add inferred information to the query services and RDF exports. This will
 make it easier to solve part of the problem first without having too many
 discussions in parallel.

 Best regards,

 Markus



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-20 Thread Markus Krötzsch

On 20.08.2015 14:51, Andrew Gray wrote:

As someone with an extensive collection of Hindi-speaking relatives, I
agree entirely with the complexity here. Never did a language have
such specialised ways of identifying your relations :-)


This is in fact exactly where inferred relations can make life easier. 
Instead of storing many different culture-specific properties on 
Wikidata (which would lead to a lengthy page with a lot of 
culture-specific relations), one can infer their values from existing 
data on the fly. It is not necessary to show these inferences to all 
users in all contexts, but one can offer them to users who are 
interested in this (e.g., in Reasonator, based on the language setting).
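
As a minimal sketch of such an on-the-fly inference, the Python below derives "mother's brother", a relation that some languages name with a dedicated word, from two stored properties. The data and the function name are hypothetical; P25 (mother) and P7 (brother) are the properties mentioned earlier in this thread.

```python
# Toy data with hypothetical item names.
MOTHER = {"child": "mom"}          # P25: item -> mother
BROTHERS = {"mom": ["uncle1"]}     # P7: item -> brothers

def maternal_uncles(person):
    """Derive mother's brothers on demand instead of storing a dedicated property."""
    mother = MOTHER.get(person)
    if mother is None:
        return []
    return list(BROTHERS.get(mother, []))

print(maternal_uncles("child"))
```

Nothing new is written back to the item; the derived relation exists only in the answer shown to the user who asked for it.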


There are still some steps needed until we can have this, but I can see 
a great chance there to make Wikidata more adapted to the cultural 
diversity of its users while keeping the underlying data simple.


Markus



However, we already seem to manage fine with simple relation
properties like spouse or child, without significant language
complications, and as long as all we're doing is putting these on more
items rather than inferring more complex relationships, I think we
should be okay.

Andrew.

On 17 August 2015 at 17:58, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi,
When you make these inferences, you have to appreciate how English-oriented
they are. In many cultures there are specific names for older sisters and
brothers and for younger sisters and brothers. There are names for uncles
and aunts on the mother's side that differ from those on the father's side.

Inferences are language specific. They may have a place but they are not
obvious when you look at the scale of Wikidata.
Thanks,
   GerardM

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-20 Thread Federico Leva (Nemo)

Andrew Gray, 17/08/2015 13:29:

In some ways, merging seems more elegant. We do have fairly good
gender metadata (and getting better all the time!), so we can still do
gender-specific relationship searches where needed. It also avoids
having to force a binary gender approach - we are in the odd position
of being able to give a nuanced entry in P21 but can only say if
someone is a sister or brother.


I think this is quite important. I think properties should focus on one 
thing at a time and there is no need to state both gender and family 
relationship in the same statement.


Also, are we really sure we don't currently have linguistic issues? I 
bet there is at least one language in the world where sister and 
brother are not two distinct words.


Nemo



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-19 Thread Markus Krötzsch

Hi all,

There have been some discussions here already on what to do with the 
inferences (add them to Wikidata, just display them, add them only to 
the query service, etc.). That's great, but this is already the second 
step from where we are now.


Right now, we don't have any way yet for people to write down what 
should be inferred. If we could describe this, we could easily add 
information on what to do with the inference (add, display, make 
queryable, use for quality control, etc.). This could be discussed on a 
case-by-case basis (similar to bot requests).


Even the (very simple) case of symmetry shows that we are not there yet: 
we have no information anywhere on Wikidata that tells us that start and 
end qualifiers for spouse should also be symmetric. It is not 
automatically the case that all qualifiers of a symmetric property are 
symmetric! For example, diplomatic relation (P530) is symmetric and 
uses qualifier diplomatic mission sent (P531) that points to the 
embassy of the subject country in the value country. Clearly this 
qualifier should not be copied when inferring symmetric statements. 
Symmetry is only the simplest case; already "inverse of" requires more 
information ...
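
To make the spouse versus diplomatic relation contrast concrete, here is a minimal Python sketch. The SYMMETRIC_QUALIFIERS table and the mirror function are hypothetical (no such table exists on Wikidata today, which is the point of the paragraph above), and the "start" and "end" keys are placeholder names rather than real property IDs.

```python
# Hypothetical per-property rule: which qualifiers carry over to the mirrored statement.
SYMMETRIC_QUALIFIERS = {
    "P26": {"start", "end"},  # spouse: start and end apply to both sides
    "P530": set(),            # diplomatic relation: P531 must not be copied
}

def mirror(statement):
    """Build the symmetric statement, keeping only the qualifiers marked as symmetric."""
    subject, prop, value, qualifiers = statement
    allowed = SYMMETRIC_QUALIFIERS[prop]
    kept = {key: val for key, val in qualifiers.items() if key in allowed}
    return (value, prop, subject, kept)

marriage = ("X", "P26", "Y", {"start": "1990", "end": "2000"})
relation = ("CountryA", "P530", "CountryB", {"P531": "embassy of A in B"})
print(mirror(marriage))
print(mirror(relation))
```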


We therefore first need to come up with a good way of describing the 
intended inferences in the wiki. Then we can think about how to best act 
on this information, step by step. The current constraints such as "this 
property is symmetric" are obviously too limited for really describing 
what should be inferred. On the other hand, one needs to take care that 
descriptions are not too general, to make sure that they can still be 
implemented and that they remain meaningful when considering many of 
them together (just consider what happens when an inferred relation 
triggers another inference ...). Luckily, there is a lot of experience 
in this area today, so it's not rocket science to come up with a 
workable description language that is not a collection of special cases 
and still is not too general or too complicated.


So what's the best way to move forward? I have some ideas on how to do 
this, but I would like to also have user feedback to make sure that the 
result is easy to use and covers many important use cases. The basic 
idea would be to come up with a template-based format for describing 
rules of inference of the form "If there is a statement that looks like 
X, then infer a statement that looks like Y". However, there must also 
be a way to say how the qualifiers should be formed for Y. I have some 
ideas on how to do this in a (hopefully) sane way.


If other people are interested, we could form some kind of interest 
group to work this out together. Alternatively, I can start by making a 
proposal on the wiki.


Markus



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-19 Thread Markus Krötzsch

On 19.08.2015 08:38, Stas Malyshev wrote:
...


Also, there's another thing. Suppose we have Q345 - spouse - Q123, but
not Q123 - spouse - Q345, and we process entities, without loss of
generality, in order of ascending IDs. When we generate data for Q123,
we don't know yet that Q345 is linked to it, so in order to infer Q123
- spouse - Q345, we can't just load Q345 (we'd need to load it later
anyway to get the qualifiers, etc.), since we don't know we'd need it,
we'd probably somehow have to query the database (if we have suitable
links table?) for every entry that has Q123 on the other end of
spouse. I'm not even sure it's possible currently on Wikidata (query
service can easily do that, but not within 1ms), but even if it is, I
don't see how it is cacheable and doing this for every entity for
multiple relationships may be quite expensive.



That's an important concern for generating the live exports, but it does 
not actually matter for the dumps. RDF does not care about the order, so 
you can generate triples about Q123 when processing Q345. There are also 
other methods of taking advantage of inferences during query answering 
without having to precompute them first (based on query rewriting, which 
could be done by a service on top of the main SPARQL endpoint). Anyway, 
this really needs a bit more thought before it should be part of the 
main SPARQL endpoint. I will write another email on this ...
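
The point about dumps can be sketched as follows. This is a toy illustration in Python, not the actual export code; the dictionary layout and the `SYMMETRIC` set are assumptions.

```python
# Toy dump: entity -> list of (property, value). Only Q345 records the marriage.
DUMP = {
    "Q123": [],
    "Q345": [("P26", "Q123")],
}
SYMMETRIC = {"P26"}  # assumption: properties treated as symmetric

def export_triples(dump):
    """One pass in ascending ID order; inferred triples are emitted on the spot."""
    for entity in sorted(dump, key=lambda q: int(q[1:])):
        for prop, value in dump[entity]:
            yield (entity, prop, value)
            if prop in SYMMETRIC:
                # A triple about `value` is emitted while processing `entity`.
                yield (value, prop, entity)

# Collected into a set, since an RDF graph is an unordered set of triples.
triples = set(export_triples(DUMP))
```

Q123 is processed first and produces nothing, yet the triple about Q123 still ends up in the output once Q345 is reached. No lookup of "who links to Q123" is needed.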


Markus


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Properties for family relationships in Wikidata

2015-08-17 Thread Andrew Gray
Hi all,

I've recently been thinking about how we handle family/genealogical
relationships in Wikidata - this is, potentially, a really valuable
source of information for researchers to have available in a
structured form, especially now we're bringing together so many
biographical databases.

We currently have the following properties to link people together:

* spouses (P26) and cohabitants (P451) - not gendered
* parents (P22/P25) and step-parents (P43/P44) - gendered
* siblings (P7/P9) - gendered
* children (P40) - not gendered (and oddly no step-children?)
* a generic "related to" (P1038) for more distant relationships

There are two big things that jump out here.

** First, gender. Parents are split by gender while children are not
(we have mother/father not son/daughter). Siblings are likewise
gendered, and spouses are not. These are all very early properties -
does anyone remember how we got this way?

This makes for some odd results. For example, if we want to use our
data to identify all the male-line *descendants* of a person, we have
to do some complicated inference from [P40 + target is male]. However,
to identify all the male-line *ancestors*, we can just run back up the
P22 chain. It feels quite strange to have this difference, and I
wonder if we should standardise one way or the other - split P40 or
merge the others.
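
The asymmetry can be seen in a small Python sketch over toy data. The item IDs and dictionaries are hypothetical; real data would come from P22, P40, and P21 statements.

```python
# Toy data with hypothetical item IDs.
FATHER = {"Q3": "Q2", "Q2": "Q1"}                            # item -> father (P22)
CHILDREN = {"Q1": ["Q2", "Q4"], "Q2": ["Q3"], "Q4": ["Q5"]}  # item -> children (P40)
GENDER = {"Q1": "male", "Q2": "male", "Q3": "male",
          "Q4": "female", "Q5": "male"}                      # item -> P21 value

def male_line_ancestors(item):
    """Ancestors: simply walk up the P22 chain."""
    chain = []
    while item in FATHER:
        item = FATHER[item]
        chain.append(item)
    return chain

def male_line_descendants(item):
    """Descendants: follow P40, then filter each child by a second lookup."""
    found = []
    stack = [item]
    while stack:
        current = stack.pop()
        for child in CHILDREN.get(current, []):
            if GENDER.get(child) == "male":
                found.append(child)
                stack.append(child)
    return found

print(male_line_ancestors("Q3"))    # ['Q2', 'Q1']
print(male_line_descendants("Q1"))  # ['Q2', 'Q3']
```

Going up needs one property; going down needs P40 plus a gender lookup on every child, and it silently drops anyone whose P21 is missing.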

In some ways, merging seems more elegant. We do have fairly good
gender metadata (and getting better all the time!), so we can still do
gender-specific relationship searches where needed. It also avoids
having to force a binary gender approach - we are in the odd position
of being able to give a nuanced entry in P21 but can only say if
someone is a sister or brother.

** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
by definition symmetric. If A has P26:B, then B should also have
P26:A. The gendered cases are a little more complicated, as if A has
P40:B, then B has P22:A or P25:A, but there is still a degree of
symmetry - one of those must be true.
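
A minimal Python sketch of the gendered case, assuming the parent's P21 value is available as a lookup table. The function name and the `gender_of` mapping are made up for illustration; Q6581097 and Q6581072 are the Wikidata items for male and female.

```python
MALE, FEMALE = "Q6581097", "Q6581072"  # Wikidata items for male and female

def parent_statement(parent, child, gender_of):
    """Turn 'parent P40 child' into 'child P22/P25 parent', or None if undecidable."""
    gender = gender_of.get(parent)
    if gender == MALE:
        return (child, "P22", parent)   # father
    if gender == FEMALE:
        return (child, "P25", parent)   # mother
    return None  # P21 missing or neither value: no gendered property can be chosen

gender_of = {"A": MALE, "B": FEMALE}
print(parent_statement("A", "C", gender_of))  # ('C', 'P22', 'A')
print(parent_statement("B", "C", gender_of))  # ('C', 'P25', 'B')
print(parent_statement("X", "C", gender_of))  # None
```

The `None` branch is where the gendered properties run out: the parent link exists, but neither P22 nor P25 can express it.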

However, Wikidata doesn't really help us make use of this symmetry. If
I list A as spouse of B, I need to add (separately) that B is spouse
of A. If they have four children C, D, E, and F, this gets very
complicated - we have six articles with *30* links between them, all
of which need to be made manually. It feels like automatically making
symmetric links for these properties would save a lot of work, and
produce a much more reliable dataset.
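
The figure of 30 can be checked by enumerating the links for this family. A short Python sketch; the relation labels are informal stand-ins for P26, P40, P22/P25, and P7/P9.

```python
from itertools import permutations

parents = ["A", "B"]
children = ["C", "D", "E", "F"]

links = set()
for a, b in permutations(parents, 2):
    links.add((a, "spouse", b))          # 2 links
for p in parents:
    for c in children:
        links.add((p, "child", c))       # 8 links
        links.add((c, "parent", p))      # 8 links
for c1, c2 in permutations(children, 2):
    links.add((c1, "sibling", c2))       # 12 links

print(len(links))  # 30
```

Only 1 + 8 + 6 = 15 of these carry independent information; the other 15 are the reverse directions that have to be entered by hand today.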

I believe we decided early on not to do symmetric links because it
would swamp commonly linked articles (imagine what Q5 would look like
by now!). On the other hand, these are properties with a very narrowly
defined scope, and we actively *want* them to be comprehensively
symmetric - every parent article should list all their children on
Wikidata, and every child article should list their parent and all
their siblings.

Perhaps it's worth reconsidering whether to allow symmetry for a
specifically defined class of properties - would an automatically
symmetric P26 really swamp the system? It would be great if the system
could match up relationships and fill in missing parent/child,
sibling, and spouse links. I can't be the only one who regularly adds
one half of the relationship and forgets to include the other!

A bot looking at all of these and filling in the gaps might be a
useful approach... but it would break down if someone tries to remove
one of the symmetric entries without also removing the other, as the
bot would probably (eventually) fill it back in. Ultimately, an
automatic symmetry would seem best.
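
The failure mode is easy to reproduce in a Python sketch of a naive gap-filling bot. The function and the set-of-triples representation are assumptions, not an existing bot.

```python
def fill_gaps(statements, symmetric=frozenset({"P26"})):
    """Return the statements plus the missing reverse of every symmetric one."""
    completed = set(statements)
    for subject, prop, value in statements:
        if prop in symmetric:
            completed.add((value, prop, subject))
    return completed

data = fill_gaps({("A", "P26", "B")})   # bot adds B -> A
data.discard(("B", "P26", "A"))         # an editor removes only one half
data = fill_gaps(data)                  # the next bot run puts it back
print(("B", "P26", "A") in data)        # True
```

The bot cannot tell a deliberate removal from a gap, because nothing records which half was entered by a person and which was filled in.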

Thoughts on either of these? If there is interest I will write up a
formal proposal on-wiki.

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-17 Thread Markus Kroetzsch

Hi Andrew,

I am very interested in this, especially in the second aspect (how to 
handle symmetry). There are many cases where we have two or more ways to 
say the same thing on Wikidata (symmetric properties are only one case). 
It would be useful to draw these inferences so that they can be used for 
queries and maybe also in the UI.


This can also help to solve some of the other problems you mention: for 
those who would like to have properties "son" and "daughter", one could 
infer their values automatically from other statements, without editors 
having to maintain this data at all.


A possible way to maintain these statements on wiki would be to use a 
special reference to encode that they have been inferred (and from 
what). This would make it possible to maintain them automatically 
without the problem of human editors ending up wrestling with bots ;-) 
Moreover, it would not require any change in the software on which 
Wikidata is running.
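
A minimal Python sketch of how such a reference could drive maintenance. The `inferred_from` field stands in for the special reference and is purely hypothetical, as is the `maintain` function; only the spouse property (P26) is handled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Statement:
    triple: tuple                          # (subject, property, value)
    inferred_from: Optional[tuple] = None  # hypothetical "inferred from" reference

def maintain(statements):
    """Keep human-made statements; rebuild all inferred ones from them."""
    asserted = {s.triple for s in statements if s.inferred_from is None}
    result = [s for s in statements if s.inferred_from is None]
    for subject, prop, value in sorted(asserted):
        reverse = (value, prop, subject)
        if prop == "P26" and reverse not in asserted:
            result.append(Statement(reverse, inferred_from=(subject, prop, value)))
    return result

first_run = maintain([Statement(("A", "P26", "B"))])
# An editor deletes the original statement; only the inferred copy is left.
leftover = [s for s in first_run if s.inferred_from is not None]
second_run = maintain(leftover)
print(second_run)  # []
```

Because every inferred statement names its source, deleting the source is enough to make the inferred copy disappear on the next run, so editors and bots are no longer working against each other.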


For the cases you mentioned, I don't think that there is a problem with 
too many inferred statements. There are surely cases where it would not 
be practical (in the current system) to store inferred data, but family 
relationships are usually not problematic. In fact, they are very useful 
to human readers.


Of course, the community needs to fully control what is inferred, and 
this has to be done in-wiki. We already have symmetry information in 
constraints, but for useful inference we might have to be stricter. The 
current constraints also cover some not-so-strict cases where exceptions 
are likely (e.g., most people have only one gender, but this is not a 
strong rule; on the other hand, one is always the child of one's mother 
by definition).


One also has to be careful with qualifiers etc. For example, the start 
and end of a spouse statement should be copied to its symmetric 
version, but there might also be qualifiers that should not be copied 
like this. I would like to work on a proposal for how to specify such 
things. It would be good to coordinate there.
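
One possible shape for this, sketched in Python, is a per-property whitelist of qualifiers to copy. P580 (start time) and P582 (end time) are real qualifier properties; the whitelist itself, the tuple layout, and the use of P1545 (series ordinal) as the "do not copy" example are assumptions.

```python
COPYABLE = {"P580", "P582"}  # start time, end time; the whitelist is an assumption

def symmetric_version(statement):
    """Swap subject and value, carrying over only whitelisted qualifiers."""
    subject, prop, value, qualifiers = statement
    kept = {q: v for q, v in qualifiers.items() if q in COPYABLE}
    return (value, prop, subject, kept)

# Q345 lists Q123 as a second spouse; the ordinal is specific to Q345.
marriage = ("Q345", "P26", "Q123", {"P580": "1990", "P582": "2001", "P1545": "2"})
mirrored = symmetric_version(marriage)
print(mirrored)  # ('Q123', 'P26', 'Q345', {'P580': '1990', 'P582': '2001'})
```

The dates describe the marriage itself and hold in both directions, while an ordinal such as "second spouse" describes only one side, which is why it is dropped here.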


A first step (even before adding any statement to Wikidata) could be to 
add inferred information to the query services and RDF exports. This 
will make it easier to solve part of the problem first without having 
too many discussions in parallel.


Best regards,

Markus



Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-17 Thread Gerard Meijssen
Hoi,
When you make these inferences, you have to appreciate how English-oriented
they are. In many cultures there are specific names for older sisters and
brothers and for younger sisters and brothers. There are names for uncles and
aunts on the mother's side that differ from those on the father's side.

Inferences are language-specific. They may have a place, but they are not
obvious when you look at the scale of Wikidata.
Thanks,
  GerardM

On 17 August 2015 at 14:47, Markus Kroetzsch markus.kroetz...@tu-dresden.de
 wrote:

 Hi Andrew,

 I am very interested in this, especially in the second aspect (how to
 handle symmetry). There are many cases where we have two or more ways to
 say the same thing on Wikidata (symmetric properties are only one case). It
 would be useful to draw these inferences so that they can be used for queries
 and maybe also in the UI.

 This can also help to solve some of the other problems you mention: for
 those who would like to have properties "son" and "daughter", one could
 infer their values automatically from other statements, without editors
 having to maintain this data at all.

 A possible way to maintain these statements on wiki would be to use a
 special reference to encode that they have been inferred (and from what).
 This would make it possible to maintain them automatically without the
 problem of human editors ending up wrestling with bots ;-) Moreover, it
 would not require any change in the software on which Wikidata is running.

 For the cases you mentioned, I don't think that there is a problem with
 too many inferred statements. There are surely cases where it would not be
 practical (in the current system) to store inferred data, but family
 relationships are usually not problematic. In fact, they are very useful to
 human readers.

 Of course, the community needs to fully control what is inferred, and this
 has to be done in-wiki. We already have symmetry information in
 constraints, but for useful inference we might have to be stricter. The
 current constraints also cover some not-so-strict cases where exceptions
 are likely (e.g., most people have only one gender, but this is not a
 strong rule; on the other hand, one is always the child of one's mother by
 definition).

 One also has to be careful with qualifiers etc. For example, the start and
 end of a spouse statement should be copied to its symmetric version, but
 there might also be qualifiers that should not be copied like this. I would
 like to work on a proposal for how to specify such things. It would be good
 to coordinate there.

 A first step (even before adding any statement to Wikidata) could be to
 add inferred information to the query services and RDF exports. This will
 make it easier to solve part of the problem first without having too many
 discussions in parallel.

 Best regards,

 Markus



 On 17.08.2015 13:29, Andrew Gray wrote:

 Hi all,

 I've recently been thinking about how we handle family/genealogical
 relationships in Wikidata - this is, potentially, a really valuable
 source of information for researchers to have available in a
 structured form, especially now we're bringing together so many
 biographical databases.

 We currently have the following properties to link people together:

 * spouses (P26) and cohabitants (P451) - not gendered
 * parents (P22/P25) and step-parents (P43/P44) - gendered
 * siblings (P7/P9) - gendered
 * children (P40) - not gendered (and oddly no step-children?)
 * a generic "related to" (P1038) for more distant relationships

 There are two big things that jump out here.

 ** First, gender. Parents are split by gender while children are not
 (we have mother/father not son/daughter). Siblings are likewise
 gendered, and spouses are not. These are all very early properties -
 does anyone remember how we got this way?

 This makes for some odd results. For example, if we want to use our
 data to identify all the male-line *descendants* of a person, we have
 to do some complicated inference from [P40 + target is male]. However,
 to identify all the male-line *ancestors*, we can just run back up the
 P22 chain. It feels quite strange to have this difference, and I
 wonder if we should standardise one way or the other - split P40 or
 merge the others.

 In some ways, merging seems more elegant. We do have fairly good
 gender metadata (and getting better all the time!), so we can still do
 gender-specific relationship searches where needed. It also avoids
 having to force a binary gender approach - we are in the odd position
 of being able to give a nuanced entry in P21 but can only say if
 someone is a sister or brother.

 ** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
 by definition symmetric. If A has P26:B, then B should also have
 P26:A. The gendered cases are a little more complicated, as if A has
 P40:B, then B has P22:A or P25:A, but there is still a degree of
 symmetry - one of those must be true.

 However, Wikidata doesn't really help us