Re: [Wikidata] Wikidata reasoning (Was: Properties for family relationships in Wikidata)

2015-08-27 Thread Svavar Kjarrval


On Thu 27 Aug 2015 15:52, Markus Krötzsch wrote:
> On 27.08.2015 14:43, Svavar Kjarrval wrote:
>> So far from the other thread, the current need seems to be for two types
>> of definitions:
>> 1. How to interpret declarations depending on associated properties.
>
> If I understand your explanations correctly, the first point is a very
> specific case of inference, which is already thinking in terms of
> "hierarchies" (of some property). I am asking: how do we even know
> that some properties are supposed to be read as forming a "hierarchy".
> This is one special case of a rule of inference that one might
> formulate. Have a look at
>
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning/Use_cases
>
> for some more examples of what could be relevant inferences. As you
> can see, only a few of these cases have anything to do with hierarchies
> (subclass of in particular), but one could easily come up with similar
> rules to express that something should be propagated along a hierarchy
> (in some cases).
>
>> 2. Constraints (or suggestions) when interpreting multiple items.
>
> For me, a constraint is a rule that infers a Warning. It can follow a
> similar pattern as the examples I gave, but instead of deriving a new
> statement, it will derive that a human had better look at a
> particular piece of our data to check if it is meaningful.
>
> There is no huge theoretical challenge involved here, but a big
> practical one. I expect that we will refine our rules once we
> encounter cases where they do not yield the right result. If you look
> at the examples I gave, they are all mostly based on how we choose to
> define the meaning of our properties. This is different from our
> current constraints that specify how things *usually* are in the world.
> We can have both (constraints that warn us of unusual situations and
> rules that derive statements) based on similar technology, but
> different considerations are relevant when defining these two types of
> things.
>
> As for Stubbs, there is a strong and a weak rule involved:
> * Strong: all mayors are persons (I assume now that this class
> encompasses named animals, as suggested in earlier messages; if not,
> then replace "person" by a suitable generalisation that does).
> * Weak: most mayors are humans.
>
> The strong version could probably be applied to derive new
> information, without danger of "exceptions" -- it would be part of our
> characterisation of what makes something a "person" in our view (or
> whatever other class we pick there). The weak version should only be
> used to find potential problems that humans might want to check.
>
> Similar rules exist in many domains:
> * Strong: All birds are animals (it's part of how we define "bird")
> * Weak: All birds can fly (it's something we observe for actual birds,
> but not part of the definition of what it means to be a bird).
>
> I suggest we start by focussing on strong rules, since they make a big
> contribution to documenting what we mean (by "person", by "bird",
> etc.), even before we have any tool support for acting on this
> information.
>
> Cheers,
>
> Markus

I'm a big advocate of strong versions. My suggestions for "exceptions"
were practical, since we can't reasonably expect all data to be consistent
with strong definitions. Personally, I wouldn't support weak versions
when a feasible strong alternative is available. The constraints I had
in mind are only suggestive and would only serve as warnings, so I think
we agree there. The constraints wouldn't be enforced but rather used to
detect potential mistakes in the data. They wouldn't prevent someone from
adding the information that Stubbs is a mayor, even when it would lead to
the contradiction of him being both a human and a cat.

Regarding your question about my first definition, the point is to serve
as a classification of what can reasonably be inferred from the
relationship between two items, depending on the property used to connect
them, as in the case of Stubbs. Stubbs is a mayor, and from that
connection we can (or should be able to) assume that Stubbs is also a
public official, a head of government and a politician. However, we
shouldn't be able to assume that Stubbs's Freebase identifier is the same
as the town's. The purpose is to enable machines to retrieve an item and
extract all the relevant facts which can reasonably be inferred from the
relationship of that item with other items, recursively, until all the
branches are exhausted.
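Retrieving an item and following its relationships recursively until every branch is exhausted is, mechanically, a graph traversal that only follows properties whose targets say something about what the item is. A minimal Python sketch, assuming a toy in-memory graph; the labels, the placeholder "/m/example" identifier, the `PROPAGATING` set and the shape of the hierarchy are illustrative assumptions, not actual Wikidata data or an agreed rule:

```python
from collections import deque

# Toy data for illustration only: item -> list of (property, value) pairs.
GRAPH = {
    "Stubbs": [("position held", "mayor"), ("Freebase ID", "/m/example")],
    "mayor": [("subclass of", "head of government")],
    "head of government": [("subclass of", "politician"),
                           ("subclass of", "public official")],
    "politician": [],
    "public official": [],
}

# Hypothetical choice of properties whose targets count as "what the item is".
# Identifier properties such as "Freebase ID" are deliberately left out.
PROPAGATING = {"position held", "subclass of"}


def inferred_roles(item: str) -> set[str]:
    """Breadth-first walk along propagating properties until all branches are exhausted."""
    seen: set[str] = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for prop, value in GRAPH.get(current, []):
            if prop in PROPAGATING and value not in seen:  # 'seen' also guards against cycles
                seen.add(value)
                queue.append(value)
    return seen


print(sorted(inferred_roles("Stubbs")))
# ['head of government', 'mayor', 'politician', 'public official']
```

The `seen` set matters in practice, since a real hierarchy may contain cycles. The Freebase identifier stays with its own item simply because its property is not in `PROPAGATING`.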

- Svavar Kjarrval



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Trends in links from Wikidata items to Commons

2015-08-27 Thread James Heald
In terms of navigation from article-items to Commons categories, the 
policy is very straightforward: set and use the P373 property.


This property also makes the inverse very straightforward, to go from a 
Commons category to a Wikidata item:  use the script

https://commons.wikimedia.org/wiki/User:TheDJ/wdcat.js
or the tweaked version at
https://commons.wikimedia.org/wiki/User:Jheald/wdcat.js
which handles diacritics properly.  These scripts automatically add a 
Reasonator link to the Commons category whenever there is a Wikidata 
article-like item pointing to it with a P373.
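The inverse navigation these scripts provide boils down to indexing items by their P373 value. A rough Python sketch over placeholder data (the item IDs and category names are invented, and this is not how wdcat.js itself is written); normalising names to Unicode NFC is one assumed way of making names with diacritics compare reliably:

```python
import unicodedata

# Placeholder data: hypothetical item IDs -> value of their P373 (Commons category) statement.
P373 = {
    "item-1": "Reykjavík",
    "item-2": "Example category",
}


def normalise(name: str) -> str:
    """Treat underscores as spaces and compare names in Unicode NFC form."""
    return unicodedata.normalize("NFC", name.replace("_", " ")).strip()


# Inverse index: normalised Commons category name -> items pointing at it via P373.
INDEX: dict[str, list[str]] = {}
for item, category in P373.items():
    INDEX.setdefault(normalise(category), []).append(item)


def items_for_category(title: str) -> list[str]:
    """Find items for a Commons category page title like 'Category:Reykjavík'."""
    return INDEX.get(normalise(title.removeprefix("Category:")), [])


# "i" followed by a combining acute accent still matches the precomposed "í".
print(items_for_category("Category:Reykjavi\u0301k"))  # ['item-1']
```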



What we have at the moment is the worst of all worlds -- namely 
inconsistency which is getting worse.


As a result people don't know what to do, and they are not setting the 
P373 property -- with the result that scripts and queries don't find the 
connections that they should.


What we need is clarity and systematic consistency.  Then it is an easy 
step to adjust the user-presentation to do the right thing.



  -- James.




On 27/08/2015 14:03, Romaine Wiki wrote:

> No, we do not have a clear policy of only linking sitelinks to categories
> if the item itself is about a category. So let's not break that.
>
> You are suggesting breaking down almost the entire navigational structure
> that Commons has in relation to Wikipedia, which makes it possible to find
> articles about the same subject as the category. Without it, it becomes
> almost impossible to identify a category on Commons as related to an
> article on Wikipedia.
> Sorry, but your proposal is insane and would make the navigational
> situation a thousand times worse. And does it make anything better? No,
> not at all. Only the opposite: worse.
>
> Wikidata is currently heavily used to connect categories on Commons to
> articles on Wikipedia, so that interwikilinks to the related Wikipedia
> article are shown on the Commons category. This serves navigational
> purposes, but also uniquely matches categories on Commons to articles on
> Wikipedia and items on Wikidata.
>
> However nice Commons galleries are for giving an overview, they are crap
> for navigational purposes. A category on Commons is created and used for
> every subject, and the Commons categories form the backbone for
> categorising media.
>
> It has been pointed out for a long time that the linking situation on
> Commons is problematic and that this is a software issue, not a user-side
> issue. It consists of the following:
> * Only one Commons sitelink can be added to an item.
> * If no sitelink is added (only the property), a Commons category can't
> show the interwikilinks.
> * If both a category and an article on Wikipedia etc. exist for a
> subject, only one of them can be shown on the Commons category.
>
> The annoying part is that some large wikis, especially the English
> Wikipedia, create too many categories that do not exist on other
> Wikipedias. As a result, categories on Commons are linked only to a
> category on Wikipedia, which is useless for most other wikis, and on
> Commons we miss an interwikilink to the related article.
>
> A gallery on Commons is a great alternative way to show images, but it is
> not suitable for navigational purposes, which require much higher
> coverage and a backbone that everything relies on. On Commons, only
> categories have that function. A counter-proposal makes more sense: no
> more Commons galleries as sitelinks, with Commons galleries added only as
> a property.
>
> But this only solves part of the problem: on Commons I would like to see
> both the related category and the related article shown somehow. For
> example, the Commons category for a specific country should link both to
> the country's category on Wikipedia and to its article on Wikipedia.
>
> Something I have been wondering about for a long time is why there are
> two places on an item where a Commonscat is added. I understand the
> development and technical background behind it, but this should not be
> needed.
>
> So the developers of Wikidata should try to find a way to show both
> groups of interwikilinks on categories on Commons.
>
> As long as this is not resolved in software, this problem of two items
> both strongly related to a Commons category remains an issue.
>
> Romaine
>
> 2015-08-27 11:29 GMT+02:00 James Heald :
>
>> A few days ago I made the following post to Project Chat, looking at how
>> people are linking from Wikidata items to Commons categories and galleries
>> compared to a year ago, that some people on the list may have seen, which
>> has now been archived:
>>
>> https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/08#Trends_in_links_from_items_to_Commons
>>
>> A couple of headlines:
>>
>> * Category <-> commonscat identifications :
>>
>> ** There was a net increase of 61,784 Commons categories that can now be
>> identified with category-like items, to 323,825 Commons categories in all
>>
>> **  96.4% of category <-> commonscat identifications (312,266 items) now
>> have sitelinks.  This represents a rise in sitelinks (60,463 items)
>> amounting to 97.8% of the increase in identifications
>>
>> **  80.0%

Re: [Wikidata] Wikidata reasoning (Was: Properties for family relationships in Wikidata)

2015-08-27 Thread Markus Krötzsch

On 27.08.2015 14:43, Svavar Kjarrval wrote:

> So far from the other thread, the current need seems to be for two types
> of definitions:
> 1. How to interpret declarations depending on associated properties.


If I understand your explanations correctly, the first point is a very 
specific case of inference, which is already thinking in terms of 
"hierarchies" (of some property). I am asking: how do we even know that 
some properties are supposed to be read as forming a "hierarchy". This 
is one special case of a rule of inference that one might formulate. 
Have a look at


https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning/Use_cases

for some more examples of what could be relevant inferences. As you can 
see, only a few of these cases have anything to do with hierarchies
(subclass of in particular), but one could easily come up with similar 
rules to express that something should be propagated along a hierarchy 
(in some cases).


> 2. Constraints (or suggestions) when interpreting multiple items.

For me, a constraint is a rule that infers a Warning. It can follow a 
similar pattern as the examples I gave, but instead of deriving a new 
statement, it will derive that a human had better look at a
particular piece of our data to check if it is meaningful.


There is no huge theoretical challenge involved here, but a big 
practical one. I expect that we will refine our rules once we encounter 
cases where they do not yield the right result. If you look at the 
examples I gave, they are all mostly based on how we choose to define 
the meaning of our properties. This is different from our current 
constraints that specify how things *usually* are in the world. We can
have both (constraints that warn us of unusual situations and rules that 
derive statements) based on similar technology, but different 
considerations are relevant when defining these two types of things.


As for Stubbs, there is a strong and a weak rule involved:
* Strong: all mayors are persons (I assume now that this class 
encompasses named animals, as suggested in earlier messages; if not, 
then replace "person" by a suitable generalisation that does).
* Weak: most mayors are humans.

The strong version could probably be applied to derive new information, 
without danger of "exceptions" -- it would be part of our 
characterisation of what makes something a "person" in our view (or 
whatever other class we pick there). The weak version should only be 
used to find potential problems that humans might want to check.


Similar rules exist in many domains:
* Strong: All birds are animals (it's part of how we define "bird")
* Weak: All birds can fly (it's something we observe for actual birds, 
but not part of the definition of what it means to be a bird).
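A minimal sketch of how the two kinds of rule could behave differently, using the Stubbs example: the strong rule adds a statement, while the weak rule, like a constraint, only produces a warning for a human to check. The triple encoding and the property names used here are assumptions for illustration, not an existing Wikidata rule format:

```python
# Statements as (subject, property, value) triples; toy data for illustration.
facts = {
    ("Stubbs", "position held", "mayor"),
    ("Stubbs", "instance of", "cat"),
}


def strong_mayors_are_persons(kb):
    """Strong rule: every mayor is a person, so derive that statement."""
    return {(s, "instance of", "person")
            for (s, p, v) in kb if p == "position held" and v == "mayor"}


def weak_mayors_are_human(kb):
    """Weak rule: most mayors are human, so only warn when one is not stated to be."""
    return [f"check {s}: mayor but not stated to be human"
            for (s, p, v) in sorted(kb)
            if p == "position held" and v == "mayor"
            and (s, "instance of", "human") not in kb]


derived = strong_mayors_are_persons(facts)
facts |= derived                         # strong rules extend the data
warnings = weak_mayors_are_human(facts)  # weak rules only flag items for review
print(sorted(derived))  # [('Stubbs', 'instance of', 'person')]
print(warnings)         # ['check Stubbs: mayor but not stated to be human']
```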


I suggest we start by focussing on strong rules, since they make a big 
contribution to documenting what we mean (by "person", by "bird", etc.), 
even before we have any tool support for acting on this information.


Cheers,

Markus




> The first definition is used so the machine can know *if* the
> declaration is up in the hierarchy or sideways. When interpreting the
> item, the machine needs to know if the property implies that all
> declarations of that item are inherited. Take as an example a currently
> living human who has a Wikidata item and who is connected to an
> occupation via a property. The machine should know whether it should
> process the declarations of the occupation and apply them to the human,
> in whole or in part. Then there are properties which don't inherit: if
> the human has a declared family member, the human doesn't inherit the
> other family member's name or birthdate.
>
> The other definition has the purpose of resolving contradictions like
> the one in my example of Stubbs. Realistically, a tree structure with
> that much data is unlikely to be totally free of contradictions, so we
> need some way of telling the machine that there are, or could be,
> contradictions. One example of this is to define that a certain
> property can have only one value (at any given time). As a
> simplification (not referring to the current data structure), say that
> a human is a part of a certain species. If we were to define, in this
> case, that no item can be part of more than one species, then the
> machine would detect a contradiction. In the specific example of Stubbs,
> the machine would determine that cats and humans are two separate
> species and there can be only one[1]. If we had a definition that the
> declaration closer to the item in the specific link takes precedence,
> then the machine would resolve it by determining that mayors are
> generally humans but Stubbs being a cat is an exception to that rule.
>
> [1] Didn't see the Highlander reference until I had written it.
>
> - Svavar Kjarrval




Re: [Wikidata] Wikidata reasoning (Was: Properties for family relationships in Wikidata)

2015-08-27 Thread Thomas Douillard
A human is not a part of a species; it is an instance of a species :)

Contradiction management is a very interesting topic, and contradictions
are inherent to the Wikidata model. We can't expect everything to be
consistent, considering that Wikidata only reflects sources and that two
sources can disagree in an essentially inconsistent way.

We could, however, expect several statements extracted from the same
source to be consistent with each other, but it might be rare for us to
have enough sourced statements to draw useful inferences. This can lead to
subproblems like computing the maximum set of consistent sources on a part
of the graph, or finding the sources that lead to a contradiction when
taken together.
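The first subproblem can be stated concretely: given which pairs of sources contradict each other, find a largest subset of sources with no contradicting pair. That is a maximum independent set problem, which is NP-hard in general, so this brute-force Python sketch (with made-up source names and conflicts) is only meant for a small part of the graph:

```python
from itertools import combinations

# Hypothetical sources and the pairs of them that contradict each other.
SOURCES = ["S1", "S2", "S3", "S4"]
CONFLICTS = {frozenset({"S1", "S2"}), frozenset({"S2", "S3"})}


def consistent(subset) -> bool:
    """A subset is consistent if it contains no conflicting pair."""
    return not any(frozenset(pair) in CONFLICTS for pair in combinations(subset, 2))


def maximum_consistent_sources(sources):
    """Try subsets from largest to smallest; exponential, so only for small inputs."""
    for size in range(len(sources), -1, -1):
        for subset in combinations(sources, size):
            if consistent(subset):
                return set(subset)
    return set()


print(sorted(maximum_consistent_sources(SOURCES)))  # ['S1', 'S3', 'S4']
```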

However, we already have a qualifier that marks a source as contradicting
another: "statement disputed by". We could assume that the sources
involved are probably inconsistent with each other.

Or we could simply keep the consistency checks out of the inference
process :) and leave them to the constraint system: if an inference draws a
path that leads to a constraint violation, then the community will be
notified. To avoid an explosion, the scope of inferences could be limited
(not trying to compute the transitive closure of the inference rules'
application). We could use some sort of "partial consistency" notion, such
as those used in constraint programming.
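Limiting the scope of inference can be as simple as capping the number of forward-chaining rounds instead of running to a fixpoint. This sketch applies only the subclass-transitivity rule; the `max_rounds` parameter and the toy chain are assumptions for illustration:

```python
def bounded_closure(edges: set[tuple[str, str]], max_rounds: int) -> set[tuple[str, str]]:
    """Apply 'A subclass of B, B subclass of C => A subclass of C' for at most max_rounds rounds."""
    known = set(edges)
    for _ in range(max_rounds):
        new = {(a, d) for (a, b) in known for (c, d) in known if b == c} - known
        if not new:  # fixpoint reached early: nothing left to derive
            break
        known |= new
    return known


chain = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
print(len(bounded_closure(chain, 1)), len(bounded_closure(chain, 10)))  # 7 10
```

With one round, the four-edge chain gains only the two-step links (7 pairs in total); the full transitive closure has 10.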

Thinking about it, I can imagine constraint problems such as: "Considering
an inference I deduced in some way, is it fully consistent with the set of
sources we have, or is there a set of sources that implies the inference is
not true?" In other words: is the inference a tautology, or is it only
satisfiable, in a problem where each statement maps to a variable, the
different sources are the values in the domains of the variables, and the
sources must be consistent with respect to what we know they say on
Wikidata?
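A brute-force reading of this question on a toy problem: each statement is a variable, its domain is the set of sources giving a value for it, and an assignment counts only if none of the chosen sources dispute each other. The inference is then "tautological" in this sense (entailed) if it holds in every consistent assignment, and merely satisfiable if it holds in at least one. The statements, sources and disputes below are invented for illustration:

```python
from itertools import product

# Each statement (variable) -> {source: value that source gives}. Toy data.
DOMAINS = {
    "species of Stubbs": {"src_a": "cat", "src_b": "human"},
    "position of Stubbs": {"src_a": "mayor", "src_c": "mayor"},
}
# Pairs of sources assumed to contradict each other (cf. "statement disputed by").
DISPUTED = {frozenset({"src_b", "src_c"})}


def consistent(choice: dict[str, str]) -> bool:
    """No two chosen sources may form a disputed pair."""
    chosen = set(choice.values())
    return not any(pair <= chosen for pair in DISPUTED)


def inference(choice: dict[str, str]) -> bool:
    """Inference under test: 'Stubbs is a human mayor'."""
    value = {var: DOMAINS[var][src] for var, src in choice.items()}
    return value["species of Stubbs"] == "human" and value["position of Stubbs"] == "mayor"


variables = list(DOMAINS)
assignments = [dict(zip(variables, combo))
               for combo in product(*(DOMAINS[v] for v in variables))]
models = [a for a in assignments if consistent(a)]
entailed = all(inference(m) for m in models)     # true in every consistent assignment
satisfiable = any(inference(m) for m in models)  # true in at least one
print(entailed, satisfiable)  # False True
```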

2015-08-27 14:43 GMT+02:00 Svavar Kjarrval :

> So far from the other thread, the current need seems to be for two types
> of definitions:
> 1. How to interpret declarations depending on associated properties.
> 2. Constraints (or suggestions) when interpreting multiple items.
>
> The first definition is used so the machine can know *if* the
> declaration is up in the hierarchy or sideways. When interpreting the
> item, the machine needs to know if the property implies that all
> declarations of that item are inherited. Take as an example a currently
> living human who has a Wikidata item and who is connected to an
> occupation via a property. The machine should know whether it should
> process the declarations of the occupation and apply them to the human,
> in whole or in part. Then there are properties which don't inherit: if
> the human has a declared family member, the human doesn't inherit the
> other family member's name or birthdate.
>
> The other definition has the purpose of resolving contradictions like
> the one in my example of Stubbs. Realistically, a tree structure with
> that much data is unlikely to be totally free of contradictions, so we
> need some way of telling the machine that there are, or could be,
> contradictions. One example of this is to define that a certain
> property can have only one value (at any given time). As a
> simplification (not referring to the current data structure), say that
> a human is a part of a certain species. If we were to define, in this
> case, that no item can be part of more than one species, then the
> machine would detect a contradiction. In the specific example of Stubbs,
> the machine would determine that cats and humans are two separate
> species and there can be only one[1]. If we had a definition that the
> declaration closer to the item in the specific link takes precedence,
> then the machine would resolve it by determining that mayors are
> generally humans but Stubbs being a cat is an exception to that rule.
>
> [1] Didn't see the Highlander reference until I had written it.
>
> - Svavar Kjarrval


Re: [Wikidata] Trends in links from Wikidata items to Commons

2015-08-27 Thread Romaine Wiki
No, we do not have a clear policy of only linking sitelinks to categories
if the item itself is about a category. So let's not break that.

You are suggesting breaking down almost the entire navigational structure
that Commons has in relation to Wikipedia, which makes it possible to find
articles about the same subject as the category. Without it, it becomes
almost impossible to identify a category on Commons as related to an
article on Wikipedia.
Sorry, but your proposal is insane and would make the navigational
situation a thousand times worse. And does it make anything better? No,
not at all. Only the opposite: worse.

Wikidata is currently heavily used to connect categories on Commons to
articles on Wikipedia, so that interwikilinks to the related Wikipedia
article are shown on the Commons category. This serves navigational
purposes, but also uniquely matches categories on Commons to articles on
Wikipedia and items on Wikidata.

However nice Commons galleries are for giving an overview, they are crap
for navigational purposes. A category on Commons is created and used for
every subject, and the Commons categories form the backbone for
categorising media.

It has been pointed out for a long time that the linking situation on
Commons is problematic and that this is a software issue, not a user-side
issue. It consists of the following:
* Only one Commons sitelink can be added to an item.
* If no sitelink is added (only the property), a Commons category can't
show the interwikilinks.
* If both a category and an article on Wikipedia etc. exist for a
subject, only one of them can be shown on the Commons category.

The annoying part is that some large wikis, especially the English
Wikipedia, create too many categories that do not exist on other
Wikipedias. As a result, categories on Commons are linked only to a
category on Wikipedia, which is useless for most other wikis, and on
Commons we miss an interwikilink to the related article.

A gallery on Commons is a great alternative way to show images, but it is
not suitable for navigational purposes, which require much higher
coverage and a backbone that everything relies on. On Commons, only
categories have that function. A counter-proposal makes more sense: no
more Commons galleries as sitelinks, with Commons galleries added only as
a property.

But this only solves part of the problem: on Commons I would like to see
both the related category and the related article shown somehow. For
example, the Commons category for a specific country should link both to
the country's category on Wikipedia and to its article on Wikipedia.

Something I have been wondering about for a long time is why there are
two places on an item where a Commonscat is added. I understand the
development and technical background behind it, but this should not be
needed.

So the developers of Wikidata should try to find a way to show both
groups of interwikilinks on categories on Commons.

As long as this is not resolved in software, this problem of two items
both strongly related to a Commons category remains an issue.

Romaine





2015-08-27 11:29 GMT+02:00 James Heald :

> A few days ago I made the following post to Project Chat, looking at how
> people are linking from Wikidata items to Commons categories and galleries
> compared to a year ago, that some people on the list may have seen, which
> has now been archived:
>
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/08#Trends_in_links_from_items_to_Commons
>
>
> A couple of headlines:
>
> * Category <-> commonscat identifications :
>
> ** There was a net increase of 61,784 Commons categories that can now be
> identified with category-like items, to 323,825 Commons categories in all
>
> **  96.4% of category <-> commonscat identifications (312,266 items) now
> have sitelinks.  This represents a rise in sitelinks (60,463 items)
> amounting to 97.8% of the increase in identifications
>
> **  80.0% of category <-> commonscat identifications (259,164 items) now
> have P373 statements.  This represents a rise in P373 statements (8,774
> items) amounting to 14.2% of the increase in identifications
>
>
> *  Article <-> commonscat identifications :
>
> ** There was a net increase of 176,382 Commons categories that can now be
> identified with article-like items, to 884,439 Commons categories in all
>
> ** 23.4% of article <-> commonscat identifications (207,494 items) now
> have (deprecated) sitelinks. This represents a rise in sitelinks (112,595
> items) amounting to 63.8% of the increase in identifications.
>
> ** 91.3% of article <-> commonscat identifications (807,776 items) now
> have P373 statements. This represents a rise in P373 statements (110,727
> items) amounting to 62.8% of the increase in identifications
>
>
> *  In addition, a recent RfC showed considerable confusion as to what
> actually was the current operational Wikidata policy on sitelinks to
> Commons:
>
>
> https://www.wikidata.org/wiki/Wikidata:Reques

Re: [Wikidata] Wikidata reasoning (Was: Properties for family relationships in Wikidata)

2015-08-27 Thread Svavar Kjarrval
So far from the other thread, the current need seems to be for two types
of definitions:
1. How to interpret declarations depending on associated properties.
2. Constraints (or suggestions) when interpreting multiple items.

The first definition is used so the machine can know *if* the
declaration is up in the hierarchy or sideways. When interpreting the
item, the machine needs to know if the property implies that all
declarations of that item are inherited. Take as an example a currently
living human who has a Wikidata item and who is connected to an
occupation via a property. The machine should know whether it should
process the declarations of the occupation and apply them to the human,
in whole or in part. Then there are properties which don't inherit: if
the human has a declared family member, the human doesn't inherit the
other family member's name or birthdate.
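One way to encode this first definition is a per-property table saying which of the linked item's declarations the subject inherits, possibly only some of them. The `INHERITS` table, the property names and the items "Alice", "physician" and "Bob" below are hypothetical; this is not existing Wikidata property metadata:

```python
# Hypothetical per-property policy: which of the target's declarations the
# subject inherits through that property (an empty set means none).
INHERITS = {
    "occupation": {"field of work"},  # partial inheritance
    "relative": set(),                # nothing is inherited from a family member
}

ITEMS = {
    "Alice": {"occupation": ["physician"], "relative": ["Bob"]},
    "physician": {"field of work": ["medicine"], "label": ["physician"]},
    "Bob": {"given name": ["Bob"], "date of birth": ["1950-01-01"]},
}


def inherited_declarations(subject: str) -> dict[str, list[str]]:
    """Collect the declarations the subject inherits via its outgoing properties."""
    result: dict[str, list[str]] = {}
    for prop, targets in ITEMS[subject].items():
        allowed = INHERITS.get(prop, set())  # unknown properties inherit nothing
        for target in targets:
            for target_prop, values in ITEMS.get(target, {}).items():
                if target_prop in allowed:
                    result.setdefault(target_prop, []).extend(values)
    return result


print(inherited_declarations("Alice"))  # {'field of work': ['medicine']}
```

Defaulting unknown properties to "inherit nothing" is a deliberately cautious choice in this sketch: a missing entry can only lose inferences, never produce wrong ones such as a relative's birthdate.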

The other definition has the purpose of resolving contradictions like
the one in my example of Stubbs. Realistically, a tree structure with
that much data is unlikely to be totally free of contradictions, so we
need some way of telling the machine that there are, or could be,
contradictions. One example of this is to define that a certain
property can have only one value (at any given time). As a
simplification (not referring to the current data structure), say that
a human is a part of a certain species. If we were to define, in this
case, that no item can be part of more than one species, then the
machine would detect a contradiction. In the specific example of Stubbs,
the machine would determine that cats and humans are two separate
species and there can be only one[1]. If we had a definition that the
declaration closer to the item in the specific link takes precedence,
then the machine would resolve it by determining that mayors are
generally humans but Stubbs being a cat is an exception to that rule.
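The single-value idea, plus the rule that the declaration closest to the item takes precedence, could be sketched as follows. The distance encoding (0 for a statement on the item itself, higher for statements derived through linked items) and the data are assumptions for illustration; a tie at the same distance is left for a human to resolve:

```python
# Candidate values for a property assumed to be single-valued (e.g. species),
# each with the path length that produced it: 0 = stated on the item itself,
# 1 = derived via one linked item (such as "mayors are humans").
candidates = {
    "Stubbs": [("cat", 0), ("human", 1)],
    "Alice": [("human", 1)],
}


def resolve_single_value(values):
    """Return (chosen value or None, overridden values); the closest declaration wins."""
    distinct = {v for v, _ in values}
    if len(distinct) <= 1:
        return next(iter(distinct), None), []  # no contradiction
    nearest = min(d for _, d in values)
    winners = {v for v, d in values if d == nearest}
    if len(winners) > 1:  # tie at the same distance: no precedence, leave it to a human
        return None, sorted(distinct)
    (best,) = winners
    return best, sorted(distinct - winners)


for item, values in candidates.items():
    print(item, resolve_single_value(values))
# Stubbs ('cat', ['human'])
# Alice ('human', [])
```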

[1] Didn't see the Highlander reference until I had written it.

- Svavar Kjarrval





[Wikidata] Wikidata reasoning (Was: Properties for family relationships in Wikidata)

2015-08-27 Thread Markus Kroetzsch
[Splitting the general (Wikidata reasoning; this thread) from the 
specific (Wikidata family relationships for horses; original thread).]



Many issues have been brought up, and we cannot solve them all with one big
hammer. I have now started a WikiProject (see below) to address one of 
the key points raised by Peter:


'''
Nobody has ever defined which inferences can/should be drawn from the 
content of Wikidata.

'''

We do in fact use several properties that seem to ask for inferencing. 
Probably the clearest is "subclass of" (P279). It has been related to 
rdfs:subClassOf in many community discussions, so it seems clear that a 
similar meaning is intended. This would lead to the following rule:


'''
If an item A has a "subclass of" statement with value B,
and if item B has a "subclass of" statement with value C,
then it should follow
that item A has a "subclass of" statement with value C.
'''
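Applied until nothing new can be derived, this rule computes the transitive closure of P279. A small Python sketch over toy "subclass of" data (the class labels are illustrative, not checked against actual Wikidata items):

```python
def subclass_closure(direct: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each class A, every C reachable via one or more 'subclass of' (P279) steps."""
    closure: dict[str, set[str]] = {}
    for start in direct:
        reached: set[str] = set()
        stack = list(direct[start])
        while stack:
            cls = stack.pop()
            if cls not in reached:  # also stops on cycles in the data
                reached.add(cls)
                stack.extend(direct.get(cls, ()))
        closure[start] = reached
    return closure


P279 = {
    "songbird": {"bird"},
    "bird": {"animal"},
    "animal": {"organism"},
}
print(sorted(subclass_closure(P279)["songbird"]))  # ['animal', 'bird', 'organism']
```

A cycle in the data (A subclass of B, B subclass of A) makes A a subclass of itself under this rule, which is one kind of anomaly such a computation could help surface.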

I think there is wide agreement on this idea. Constraints rely on it 
(constraint checking travels the P279 hierarchy), and it's a main 
motivation for why Wikidata Query has its "tree" feature. There are 
similarly clear intentions for the properties "instance of" (P31) and 
"subproperty of" (P1647). I am not spelling them out here.


Nevertheless, Peter is right that even in these cases, the intention is 
not fully clear, for two reasons:


(1) There is no machine-readable specification of the intended 
behaviour. It's part of user discussions, not of the data or templates. 
Even the user discussions are distributed over several pages, so a lot 
of wiki archaeology is needed to get a full picture of what we, the 
community, might have intended.
(2) The informal discussions on the intended semantics are not precise 
about all relevant cases. Many questions remain open, such as what to do 
if qualifiers are used on a statement (rarely the case for "subclass 
of", but not so uncommon for "instance of").


To address these issues, I propose to come up with a format that allows 
us to clearly specify inference rules such as the one for "subclass of" 
above. Each rule should have one page where it is specified (for humans 
and machines), explained (to humans), and discussed. It is not possible 
to encode such rules as property values on data pages (for a start, it 
would not be clear which page this should be on, because rules typically 
refer to several properties and items). Therefore, the best we could do 
now seems to be to have standard wiki pages for this. They could be linked
from all relevant properties/items (talk pages) though.


Even if we do not have any reasoner to compute all the results, writing 
down the intended rules would be useful documentation for other users to 
clarify what we expect (see the original family relationship discussion).


I propose to start by gathering use cases, that is, examples of rules 
that we might want to express. From this, we can then extract a suitable
template structure. I have created a WikiProject to get us started:


https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning

Feel free to contribute.

Best regards,

Markus


On 27.08.2015 06:26, Peter F. Patel-Schneider wrote:
>
> On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
>> On Wed 26 Aug 2015 23:05, James Heald wrote:
>>> There are a *lot* of problems with P279 (subclass), right across
>>> Wikidata.
>>>
>>> These will only be corrected once people start doing searches in a
>>> systematic way and addressing the anomalies they find.
>>>
>>> In this case, politician (Q82955) should *not* be a subclass of human
>>> (Q5), instead it should be a subclass of something like occupation
>>> (Q13516667), or alternatively perhaps profession (Q28640).
>>>
>>>
>>> My understanding is that currently there are a vast number of
>>> incorrect subclass relationships in the project, messing up tree
>>> searches, and so far it is something that has simply not yet been
>>> systematically addressed.
>>>
>>>-- James.
>>>
>>>
>> For now, what's the best way to find (and perhaps correct) incorrect
>> declarations like these?
>>
>> If I were to just change commonly used items like politician
>> (Q82955), it might be construed as vandalism, or someone who doesn't care
>> about or understand the Stubbs-declared-as-a-human problem might just
>> add that declaration back later.
>>
>> When it comes to the gender property (P21), the human-readable
>> description indicates that it's to define genders in general, yet it's
>> declared as an instance of an item (Q18608871) which only applies to
>> humans, which of course has consequences further up in the hierarchy
>> since the maintainers of item Q18608871 faithfully assume it only
>> applies to humans.
>
> Well, the situation with respect to Wikidata property for items about people
> (Q18608871) is very difficult. There is absolutely no machine-interpretable
> information associated with this class that can be used to determine that
> instances of it are only suppo

[Wikidata] Trends in links from Wikidata items to Commons

2015-08-27 Thread James Heald
A few days ago I posted the following to Project Chat, looking at how
people are linking from Wikidata items to Commons categories and
galleries compared with a year ago. Some people on the list may have
seen it; it has now been archived:


https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/08#Trends_in_links_from_items_to_Commons


A couple of headlines:

* Category <-> commonscat identifications :

** There was a net increase of 61,784 Commons categories that can now be 
identified with category-like items, to 323,825 Commons categories in all


**  96.4% of category <-> commonscat identifications (312,266 items) now 
have sitelinks.  This represents a rise in sitelinks (60,463 items) 
amounting to 97.8% of the increase in identifications


**  80.0% of category <-> commonscat identifications (259,164 items) now 
have P373 statements.  This represents a rise in P373 statements (8,774 
items) amounting to 14.2% of the increase in identifications



*  Article <-> commonscat identifications :

** There was a net increase of 176,382 Commons categories that can now 
be identified with article-like items, to 884,439 Commons categories in all


** 23.4% of article <-> commonscat identifications (207,494 items) now 
have (deprecated) sitelinks. This represents a rise in sitelinks 
(112,595 items) amounting to 63.8% of the increase in identifications.


** 91.3% of article <-> commonscat identifications (807,776 items) now 
have P373 statements. This represents a rise in P373 statements 
(110,727 items) amounting to 62.8% of the increase in identifications



*  In addition, a recent RfC showed considerable confusion about what
the current operational Wikidata policy on sitelinks to Commons actually
is:


https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Category_commons_P373_and_%22Other_sites%22 
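As a cross-check on the headlines above, the percentages can be recomputed from the raw counts quoted in the post. This is a small sketch; the counts are copied from the post and the labels are shorthand added here. Rounded to one decimal, six of the eight recomputed values match the quoted figures; the other two come out 0.1 higher (97.9% rather than 97.8%, and 23.5% rather than 23.4%).

```python
# (part, whole) counts as quoted in the post; labels are illustrative shorthand.
FIGURES = {
    "category items with sitelinks": (312_266, 323_825),
    "category increase covered by new sitelinks": (60_463, 61_784),
    "category items with P373": (259_164, 323_825),
    "category increase covered by new P373": (8_774, 61_784),
    "article items with (deprecated) sitelinks": (207_494, 884_439),
    "article increase covered by new sitelinks": (112_595, 176_382),
    "article items with P373": (807_776, 884_439),
    "article increase covered by new P373": (110_727, 176_382),
}

def percent(part, whole):
    """Return part/whole as an unrounded percentage."""
    return 100.0 * part / whole

for label, (part, whole) in FIGURES.items():
    print(f"{label}: {percent(part, whole):.1f}%")
```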




In view of the trends above, the need for predictability and
consistency that queries, templates and scripts can depend on, and
particularly the apparent confusion about what the operational policy
currently is, I suggest that the time has come for a bot to monitor all
new sitelinks to Commons categories,

*  adding a corresponding P373 statement if there is not one already, and
*  removing the sitelink if it is from an article-like item to a commonscat.
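The two bot rules proposed above could look roughly like the sketch below. It is not an existing bot or the Wikibase data model: the Item class is a hypothetical simplification, "article-like" is assumed to mean "not an instance of Wikimedia category (Q4167836)", and P373 values are assumed to be stored without the "Category:" prefix. P373 is added before the sitelink is removed, so the link to Commons is not lost.

```python
from dataclasses import dataclass, field
from typing import Optional

WIKIMEDIA_CATEGORY = "Q4167836"  # assumption: marks an item as category-like
CATEGORY_PREFIX = "Category:"

@dataclass
class Item:
    """Hypothetical, simplified view of an item (not the Wikibase JSON format)."""
    qid: str
    p31: set = field(default_factory=set)   # direct "instance of" values
    commons_sitelink: Optional[str] = None  # commonswiki page title, if any
    p373: Optional[str] = None              # Commons category, assumed unprefixed

def plan_actions(item: Item) -> list:
    """Return the edits the proposed bot would make, as (action, value) tuples."""
    actions = []
    link = item.commons_sitelink
    if link is None or not link.startswith(CATEGORY_PREFIX):
        return actions  # the rules only concern sitelinks to Commons categories
    category = link[len(CATEGORY_PREFIX):]
    # Rule 1: add a corresponding P373 statement if there is not one already.
    if item.p373 is None:
        actions.append(("add_P373", category))
    # Rule 2: remove the sitelink if the item is article-like.
    if WIKIMEDIA_CATEGORY not in item.p31:
        actions.append(("remove_sitelink", link))
    return actions

article = Item("Q_EXAMPLE_ARTICLE", {"Q5"}, "Category:Ada Lovelace")
print(plan_actions(article))
# [('add_P373', 'Ada Lovelace'), ('remove_sitelink', 'Category:Ada Lovelace')]
```

A category-like item that already has a matching P373 statement produces no actions.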


I believe we have a clear policy of sitelinking Commons categories only to
category-like items, and Commons galleries to article-like items; but
there is currently confusion and unpredictability because these
relationships are not being enforced, which breaks scripts and queries.


It's time to fix this.


All best,

  James.


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Gerard Meijssen
Hoi,
Absolutely.

When full genealogy information is available, you do not need special terms
to indicate the relationship. It is only when this is not the case that you
need to specify what type of link there is. This can be specific, like maternal
uncle or paternal aunt. The distinction makes a practical difference in several
cultures and is THEREFORE significant. Again, it is only relevant when it
cannot be inferred.
Thanks,
  GerardM

On 27 August 2015 at 11:08, Marielle Volz  wrote:

> If you want to find all humans on wikidata, find all items with the
> property "instance of" (P31) equal to "human" (Q5). There is no need
> to infer this from things like having the parent property, that's a
> terrible way to do things. Items that are instances of different items
> use the same properties all the time, you shouldn't be inferring
> anything about the class of an item based on the properties it has.
>
> If you are worried about horses being put in a genealogical tree with
> humans, that would require someone to put a horse as a parent of a
> human or vice versa. That's a problem with an invalid relationship
> being added, not the property itself.

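Gerard's point, that maternal or paternal relations only need stating when they cannot be inferred, can be sketched as follows. With father (P22), mother (P25) and sex or gender (P21) recorded for everyone involved, a maternal uncle is simply a male sibling of the mother. The genealogy below is a made-up toy example using plain strings rather than Wikidata items, and siblings are assumed to be anyone sharing at least one recorded parent.

```python
# Hypothetical toy genealogy; in Wikidata these would be items linked by
# father (P22), mother (P25) and sex or gender (P21).
FATHER = {"child": "dad", "mum": "grandpa_m", "uncle_m": "grandpa_m",
          "dad": "grandpa_p", "aunt_p": "grandpa_p"}
MOTHER = {"child": "mum", "mum": "grandma_m", "uncle_m": "grandma_m",
          "dad": "grandma_p", "aunt_p": "grandma_p"}
SEX = {"mum": "female", "dad": "male", "uncle_m": "male", "aunt_p": "female"}

def parents(person):
    return {FATHER.get(person), MOTHER.get(person)} - {None}

def siblings(person):
    """Anyone sharing at least one recorded parent (half-siblings included)."""
    mine = parents(person)
    return {other for other in set(FATHER) | set(MOTHER)
            if other != person and mine & parents(other)}

def maternal_uncles(person):
    mother = MOTHER.get(person)
    return {s for s in siblings(mother) if SEX.get(s) == "male"} if mother else set()

def paternal_aunts(person):
    father = FATHER.get(person)
    return {s for s in siblings(father) if SEX.get(s) == "female"} if father else set()

print(maternal_uncles("child"))  # {'uncle_m'}
print(paternal_aunts("child"))   # {'aunt_p'}
```

When any of these links is missing, the inference fails, which is exactly the case where an explicit "maternal uncle" style statement would still be needed.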

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Marielle Volz
If you want to find all humans on wikidata, find all items with the
property "instance of" (P31) equal to "human" (Q5). There is no need
to infer this from things like having the parent property, that's a
terrible way to do things. Items that are instances of different items
use the same properties all the time, you shouldn't be inferring
anything about the class of an item based on the properties it has.

If you are worried about horses being put in a genealogical tree with
humans, that would require someone to put a horse as a parent of a
human or vice versa. That's a problem with an invalid relationship
being added, not the property itself.

On Wed, Aug 26, 2015 at 6:43 PM, Svavar Kjarrval  wrote:
>
>
> On Wed 26 Aug 2015 13:58, Peter F. Patel-Schneider wrote:
>> I don't think that P21 (https://www.wikidata.org/wiki/Property:P21, sex or
>> gender) is a subclass of P31 (https://www.wikidata.org/wiki/Property:P31,
>> instance of).  Properties aren't subclasses in general.
>>
>> Perhaps you meant to talk about https://www.wikidata.org/wiki/Property:P21
>> (sex or gender) being related via (https://www.wikidata.org/wiki/Property:P31
>> (instance of) to https://www.wikidata.org/wiki/Q18608871 (Wikidata property
>> for items about people).   This indicates that the property should only be
>> used on people, even though the description of the property itself talks
>> about its use on animals.
>>
>> It appears that Wikidata is not very consistent internally.
>>
>> peter
>>
> Sorry, I'm not used to the Wikidata lingo.
>
> To further explain my point (with which I think you have already agreed):
> If I were to write code that makes assumptions based on such relations,
> it would reach the contradictory conclusion that a non-human with a P21
> relation is a human, if it recursively traversed the hierarchy of
> declarations. P21 is declared with P31->Q18608871, and Q18608871 is in
> turn declared with P1269->Q5. Unless special precautions were taken,
> anyone trying to generate an exhaustive list of all humans on Wikidata
> (without relying solely on the direct declaration on each item) might
> find non-humans on that list as a result of travelling backwards via
> such relations.
>
> In essence, it seems that either P21 wrongly allows the gender of
> non-humans to be stated, or the property is too broad for a
> declaration of P31->Q18608871.
>
> - Svavar Kjarrval
>

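The trap Svavar describes, and the direct check Marielle recommends, can be contrasted in a short sketch. The toy data encodes only the two statements quoted above (P21 is an instance of Q18608871, and Q18608871 is a facet of (P1269) Q5); the horse item, its class and the plain-string gender values are hypothetical. The naive rule treats any item that uses a property "for items about people" as human and therefore wrongly includes the horse.

```python
# Hypothetical toy snapshot; only the property metadata comes from the thread.
PROPERTY_TYPES = {"P21": {"Q18608871"}}   # property -> its P31 values
FACET_OF = {"Q18608871": {"Q5"}}          # class -> its P1269 values

ITEMS = {
    "EXAMPLE_PERSON": {"P31": {"Q5"}, "P21": {"female"}},
    "EXAMPLE_HORSE": {"P31": {"EXAMPLE_HORSE_CLASS"}, "P21": {"male"}},
}

def naive_is_human(item):
    """The trap: infer 'human' from any property whose type is a facet of Q5."""
    for prop in ITEMS[item]:
        for prop_type in PROPERTY_TYPES.get(prop, set()):
            if "Q5" in FACET_OF.get(prop_type, set()):
                return True
    return False

def direct_is_human(item):
    """Marielle's approach: check the item's own P31 statement for Q5."""
    return "Q5" in ITEMS[item].get("P31", set())

for item in ITEMS:
    print(item, naive_is_human(item), direct_is_human(item))
# EXAMPLE_PERSON True True
# EXAMPLE_HORSE True False
```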

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-27 Thread Ole Palnatoke Andersen
Well, I decided to be bold (that is often the road to reversion, but
let's get the ball rolling):

Tarok[1] now has Pay Dirt[2] as his father.
B.B.S. Sugarlight[3] now has Sugarsweet Sid[4] as his mother, and she
has Sugarcane Hanover[5] as her father.

1 https://www.wikidata.org/wiki/Q12338810
2 https://www.wikidata.org/wiki/Q12331109
3 https://www.wikidata.org/wiki/Q20872428
4 https://www.wikidata.org/wiki/Q20873813
5 https://www.wikidata.org/wiki/Q12003911

When I asked about this on Facebook, the first answer was "Random
guess: Check out Secretariat. My guess is that it has been registered
thoroughly."
Now the quest is to connect Secretariat, Tarok and Sugarcane Hanover.. :-)
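Once pedigrees like these are entered, "connecting" horses amounts to finding common ancestors over father (P22) and mother (P25) links. A minimal sketch over a local dictionary holding only the three parent links described above (not fetched from Wikidata); with just these edges, Tarok and B.B.S. Sugarlight have no common ancestor yet. Secretariat is left out because no item for it is given in the message.

```python
from collections import deque

# Parent links from the edits above, keyed by QID:
# Q12338810 Tarok, Q12331109 Pay Dirt, Q20872428 B.B.S. Sugarlight,
# Q20873813 Sugarsweet Sid, Q12003911 Sugarcane Hanover.
PARENTS = {
    "Q12338810": {"Q12331109"},  # Tarok -> father Pay Dirt
    "Q20872428": {"Q20873813"},  # B.B.S. Sugarlight -> mother Sugarsweet Sid
    "Q20873813": {"Q12003911"},  # Sugarsweet Sid -> father Sugarcane Hanover
}

def ancestors(item):
    """All items reachable by following parent links upwards (breadth-first)."""
    found, queue = set(), deque([item])
    while queue:
        for parent in PARENTS.get(queue.popleft(), set()):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found

def common_ancestors(a, b):
    return ancestors(a) & ancestors(b)

print(sorted(ancestors("Q20872428")))              # ['Q12003911', 'Q20873813']
print(common_ancestors("Q12338810", "Q20872428"))  # set()
```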


On Wed, Aug 26, 2015 at 9:24 PM, Joe Filceolaire  wrote:
> Every other ontology mixes humans with fictional characters and with groups
> of humans and possibly fictional humans (biblical characters for instance).
> Wikidata has gone to a lot of trouble to try to untangle these into separate
> classes. Anyone trying to get an exhaustive list of humans and not using
>  deserves everything he gets.
>
> P21 (sex or gender) is very explicitly specified as being usable for humans
> and for other creatures. At the request of some languages we have separate
> items for 'female human' and for 'female creature' (we have the same for
> male); 'female human' is 'subclass of: female creature'. Relying on P21 to
> tell whether something is human is not recommended, as it will probably
> miss all the humans who are neither male nor female; Wikidata has about
> a dozen other values that can be used with this property.
>
> Father (P22) and mother (P25) can perfectly well be used for non-humans and
> if the current constraints on these properties flag this as a problem then
> the constraints will have to be updated. I expect to see extensive pedigrees
> for racehorses entered in Wikidata. Note that there is a proposal under
> consideration to replace P22 and P25 with a single 'parent' property.
>
> Hope this helps
>
> Joe
>



-- 
http://palnatoke.org * @palnatoke * +4522934588
