Re: [Wikidata-l] What is the point of properties?
Hello,

Concerning owl:sameAs (http://www.w3.org/TR/owl-ref/#IndividualIdentity): DBpedia uses it to link, for instance, http://dbpedia.org/page/Joseph_Hocking to its equivalents in Freebase, Wikidata and YAGO. Markus, going by your remark, is this not an example to follow?

If the use of owl:sameAs is discouraged, what is its purpose, and in which cases can it be used? Does this mean that OWL lacks a proper way to interlink resources from different publishers? By the way, is the notion of individuals an OWL concept?

Alternative ways to interlink data are also discussed at http://notes.3kbo.com/owl-sameas.

Jean-Baptiste Pressac
Traitement et analyse de bases de données
Centre de Recherche Bretonne et Celtique
UMS 3554
20 rue Duquesne CS 93837
29238 Brest cedex 3
tel : +33 (0)2 98 01 68 95
fax : +33 (0)2 98 01 63 93

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] What is the point of properties?
On 29/05/14 21:04, Andrew Gray wrote:
> One other issue to bear in mind: it's *simple* to have properties as a separate thing. I have been following this discussion with some interest but... well, I don't think I'm particularly stupid, but most of it is completely above my head. Saying "here are items, here are a set of properties you can define relating to them, here's some notes on how to use properties" is going to get a lot more people able to contribute than if they need to start understanding theoretical aspects of semantic relationships...

Good point. The thread has really gone off in a rather philosophical direction :-) As Jane said, examples (of places where a property should be used *and* of places where it should not be used) are definitely much more useful to help our editors on the ground. I usually use items I know as role models, or look for suitable showcase items.

Markus

On 28 May 2014 09:37, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
> Key differences between Properties and Items:
> * Properties have a data type, items don't.
> * Items have sitelinks, Properties don't.
> * Items have Statements, Properties will support Claims (without sources).
>
> The software needs these constraints/guarantees to be able to take shortcuts, provide specialized UI and API functionality, etc.
>
> Yes, it would be possible to use items as properties instead of having a separate entity type. But they are structurally and functionally different, so it makes sense to have a strict separation. This makes a lot of things easier, e.g.:
> * setting different permissions for properties
> * mapping to RDF vocabularies
>
> More fundamentally, they are semantically different: an item describes a concept in the real world, while a property is a structural component used for such a description.
>
> Yes, properties are similar to data items, and in some cases there may be an item representing the same concept that is represented by a property entity.
> I don't see why that is a problem, while I can see a lot of confusion arising from mixing them.
>
> -- daniel
>
> On 28.05.2014 09:25, David Cuenca wrote:
>> Since the very beginning I have kept myself busy with properties, thinking about which ones fit, which ones are missing to better describe reality, and how to integrate them with the ones that we have. The thing is that the more I work with them, the less difference I see from normal items, and if statements are soon allowed on property pages, the difference will blur even more.
>>
>> I can understand that from the software development point of view it might make sense to have a clear difference. Or for the community to get a deeper understanding of the underlying concepts represented by words. But semantically I see no difference between:
>>
>> cement (Q45190) emissivity (P1295) 0.54
>> and
>> cement (Q45190) emissivity (Q899670) 0.54
>>
>> Am I missing something here? Are properties really needed, or are we adding unnecessary artificial constraints?
>>
>> Cheers,
>> Micru
>
> --
> Daniel Kinzler
> Senior Software Developer
> Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
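As an illustration of the structural differences Daniel lists above, here is a minimal Python sketch. It is not the Wikibase implementation: the class names, the `VALUE_TYPES` table and `add_statement` are hypothetical, and only the IDs (Q45190, P1295, Q899670) and the value 0.54 come from the thread. It shows one way a data type on the property lets software check a value, which an item used as a predicate could not offer.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model; these are NOT the real Wikibase classes.

@dataclass
class Property:
    id: str          # e.g. "P1295"
    label: str
    datatype: str    # properties have a data type, items don't

@dataclass
class Item:
    id: str          # e.g. "Q45190"
    label: str
    sitelinks: dict = field(default_factory=dict)   # items have sitelinks, properties don't
    statements: list = field(default_factory=list)

# Assumed mapping from a data type name to acceptable Python value types.
VALUE_TYPES = {"quantity": (int, float), "string": (str,), "item": (Item,)}

def add_statement(subject: Item, predicate: Property, value) -> None:
    """Attach (property id, value) to an item, checking the property's data type."""
    if not isinstance(predicate, Property):
        raise TypeError(f"{predicate.id} is not a property and has no data type")
    if not isinstance(value, VALUE_TYPES[predicate.datatype]):
        raise ValueError(f"{predicate.id} expects a {predicate.datatype} value")
    subject.statements.append((predicate.id, value))

cement = Item("Q45190", "cement")
emissivity_property = Property("P1295", "emissivity", "quantity")
emissivity_item = Item("Q899670", "emissivity")

add_statement(cement, emissivity_property, 0.54)   # accepted: quantity value

try:
    add_statement(cement, emissivity_item, 0.54)   # an item used as a predicate
except TypeError as err:
    print(err)
```

In this sketch the two triples from David's question stop being interchangeable only because the software asks the predicate for a data type; whether that counts as a semantic difference is exactly what the thread debates.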
Re: [Wikidata-l] What is the point of properties?
On Thu, May 29, 2014 at 9:04 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:
> One other issue to bear in mind: it's *simple* to have properties as a separate thing. I have been following this discussion with some interest but... well, I don't think I'm particularly stupid, but most of it is completely above my head. Saying "here are items, here are a set of properties you can define relating to them, here's some notes on how to use properties" is going to get a lot more people able to contribute than if they need to start understanding theoretical aspects of semantic relationships...

Definitely, I cannot agree more. TBH, the original question of this thread was already settled some messages ago. I understand that it may be confusing that we have wandered off into other realms, so I think it is better to consider this thread closed, and I will consider opening a new one with the right topic (which is quite different from how it started :-P)

Cheers,
Micru
Re: [Wikidata-l] What is the point of properties?
And to summarize the answer to the original question for future readers, the point of properties is:
a) to help humans better understand Wikidata
b) to help programmers (also humans :P) build the software running it
c) to make a distinction between concepts found in the world and the concepts that have been internalized by the community

There might be more, but those are the main points that suggest it is better to keep properties and items separate, even if their essence is the same.

Thank you all for this learning experience :-)
Micru
Re: [Wikidata-l] What is the point of properties?
On Fri, May 30, 2014 at 10:06 AM, Andrew Gray andrew.g...@dunelm.org.uk wrote:
> Do we have an easy way of highlighting a gallery of good examples or even a plain wikipage of topical guidance? Would be very useful if we could say 'here's a politician, here's a French city, etc'

https://www.wikidata.org/wiki/Wikidata:Showcase_items :)

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-l] What is the point of properties?
@David: I think you should have a look at fuzzy logic https://www.wikidata.org/wiki/Q224821 :) 2014-05-29 1:48 GMT+02:00 David Cuenca dacu...@gmail.com: Markus, On Thu, May 29, 2014 at 12:53 AM, Markus Krötzsch mar...@semantic-mediawiki.org wrote: This is an easy question once you have been clear about what human behaviour is. According to enwiki, it is a range of behaviours *exhibited by* humans. Settled :) Let's leave it at "defined as a trait of". What would anybody do with this data? In what application could it be of interest? Well, our goal is to gather the whole of human knowledge, not to use it. I can think of several applications, but let's leave that open. Never underestimate human creativity ;-) Moreover, as a great Icelandic ontologist once said: There is definitely, definitely, definitely no logic, to human behaviour ;-) Definitely, that is why we spend so much time in front of flickering squares making them flicker even more. It makes total sense :P I think constraints are already understood in this way. The name comes from databases, where a constraint violation is indeed a rather hard error. On the other hand, ironically, constraints (as a technical term) are often considered to be a softer form of modelling than (onto)logical axioms: a constraint can be violated while a logical axiom (as the name suggests) is always true -- if it is not backed by the given data, new data will be inferred. So as a technical term, constraint is quite appropriate for the mechanism we have, although it may not be the best term to clarify the intention. Ok, I will not fight traditional labels or conventions. I was interested in pointing out the inappropriateness of using a word inside our community with a definition that doesn't match its use, when there is another word that matches perfectly and conveys its meaning better to users. Some important ideas like classification (instance of/subclass of) belong completely to the analytical realm. 
We don't observe classes, we define them. A planet is what we call a planet, and this can change even if the actual lumps in space are pretty much the same. Agreed. Better labels could be defined as instance of/defined as subclass of Now inferences are slightly different. If we know that X implies Y, then if A says X we can infer that (implicitly) A says Y. That is a logical relationship (or rule) on the level of what is claimed, rather than on the level of statements. Note that we still need to have a way to find out that X implies Y, which is a content-level claim that should have its own reference somewhere. We mainly use inference in this sense with subclass of in reasonator or when checking constraints. In this case, the implications are encoded as subclass-of statements (If X is a piano, then X is an instrument). This allows us to have references on the implications. Nope, nope, nope. I was not referring to hard implications, but to heuristic ones. Consider that these properties in the item namespace: defined as a trait of defined as having defined as instance of Would translate as these constraints in the property namespace: likely to be a trait of likely to have likely to be an instance of In general, an interesting question here is what the status of subclass of really is. Do we gather this information from external sources (surely there must be a book that tells us that pianos are instruments) or do we as a community define this for Wikidata (surely, the overall hierarchy we get is hardly the universal class hierarchy of the world but a very specific classification that is different from other classifications that may exist elsewhere)? Best not to think about it too much and to gather sources whenever we have them ;-) I think it is good to think about it and to consider options to deal with it. 
Like for instance: defined as instance of corresponds with item Wikimedia community concept We already have items that refer to concepts that only make sense for us, so no change in that regard. At the moment, hard constraints (from definitions) and soft constraints (expectations) are simply mixed, and maybe this is fine since we handle them in a similar fashion (humans need to look how to fix the situation). Most constraints, even those that refer to definitions, are rather soft anyway since we apply them to statements, not to hard facts. Hard constraints can only occur in cases where the *encoding* of a statement in Wikidata is wrong (not the intended statement as such, but how it was translated to data). As explained above, expectations inferred from definitions should not be treated as hard constraints, but as soft ones. Micru
Re: [Wikidata-l] What is the point of properties?
On 29/05/14 12:41, Thomas Douillard wrote:
> @David: I think you should have a look to fuzzy logic https://www.wikidata.org/wiki/Q224821 :)

Or at probabilistic logic, possibilistic logic, epistemic logic, ... it's endless. Let's first complete the data we are sure of before we start to discuss whether Pluto is a planet with fuzzy degree 0.6 or 0.7 ;-)

(The problem with quantitative logics is that there is usually no reference for the numbers you need there, so they are not well suited for a secondary data collection like Wikidata that relies on other sources. The closest concept that still might work is probabilistic logic, since you can really get some probabilities from published data; but even there it is hard to use the probability as a raw value without specifying very clearly what the experiment looked like.)

Markus
Re: [Wikidata-l] What is the point of properties?
Hehe, maybe some kinds of inference can lead to a good heuristic for suggesting properties and values in the entity suggester. As they naturally become softer and softer through the combination of uncertainties, this could also provide some kind of limit for inferences, by fixing a probability below which we don't add a fuzzy fact to the set of facts. Maybe we could fix a heuristic starting fuzziness or probability score based on one sourced claim (big score), one disputed claim, ranks, and so on.
Re: [Wikidata-l] What is the point of properties?
On 29/05/14 13:53, Thomas Douillard wrote:
> Hehe, maybe some kinds of inference can lead to a good heuristic for suggesting properties and values in the entity suggester. As they naturally become softer and softer through the combination of uncertainties, this could also provide some kind of limit for inferences, by fixing a probability below which we don't add a fuzzy fact to the set of facts. Maybe we could fix a heuristic starting fuzziness or probability score based on one sourced claim (big score), one disputed claim, ranks, and so on.

Sorry, I have to expand on this a bit ...

My main point was that there are many fuzzy logics (depending on the t-norm you choose) and many probabilistic logics (depending on the stochastic assumptions you make). The meaning of a score crucially depends on which logic you are in. Moreover, at least in fuzzy logic, the scores are only relevant in comparison to other scores (there is no absolute meaning to 0.3); therefore you need to ensure that the scores are assigned in a globally consistent way (0.3 in Wikidata would have to mean exactly the same wherever it is used). This makes it extremely hard to implement such an approach in practice in a large, distributed knowledge base like ours. What's more, you cannot find these scores in books or newspapers, so you somehow have to make them up in another way. You suggested using this for statements that are not generally accepted, but how do you measure how disputed a statement is? If two thirds of references are for it and the rest are against it, do you assign 0.66 as a score? It's very tricky.

Fuzzy logic has its main use in fuzzy control (the famous washing machine example), which is completely different from, and largely unrelated to, fuzzy knowledge representation. In knowledge representation, fuzzy approaches are also studied, but their application is usually in a closed system (e.g., if you have one system that extracts data from a text and assigns certainties to all extracted facts in the same way). It's still unclear how to choose the right logic, but at least it will give you a uniform treatment of your data according to some fixed principles (whether they make sense or not).

The situation is much clearer in probabilistic logics, where you define your assumptions first (e.g., you assume that events are independent or that dependencies are captured in some specific way). This makes it more rigorous, but also harder to apply, since in practice these assumptions rarely hold. This is somewhat tolerable if you have a rather uniform data set (e.g., a lot of sensor measurements that give you some probability for actual states of the underlying system). But if you have a huge, open, cross-domain system like Wikidata, it would be almost impossible to force it into a particular probability framework where 0.3 really means "in 30% of all cases".

Also note that scientific probability is always a limit of observed frequencies. It says: if you do something again and again, this is the rate you will get. Often-heard statements like "We have an 80% chance to succeed!" or "Chances are almost zero that the Earth will blow up tomorrow!" are scientifically pointless, since you cannot repeat the experiments that they claim to make statements about. Many things we have in Wikidata are much more on the level of such general statements than on the level that you normally use probability for (a good example of a proper use of probability: "based on the tests that we did so far, this patient has a 35% chance of having cancer"; these are not the things we normally have in Wikidata).
Markus
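To make the remark about t-norms concrete, here is a small Python sketch. The three functions are the standard Gödel (minimum), product and Łukasiewicz t-norms; the two input scores are made-up values chosen only for illustration and do not come from any Wikidata statement.

```python
# Three standard t-norms (fuzzy "and"). The input scores below are invented.

def godel(a: float, b: float) -> float:
    """Goedel (minimum) t-norm."""
    return min(a, b)

def product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b

def lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)

a, b = 0.5, 0.75   # hypothetical fuzzy degrees of two statements
results = {t.__name__: t(a, b) for t in (godel, product, lukasiewicz)}
print(results)
```

With these inputs the conjunction comes out as 0.5, 0.375 or 0.25 depending on the t-norm, which is the sense in which a bare score has no meaning until the logic is fixed.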
Re: [Wikidata-l] What is the point of properties?
The other answers, under the original subject: On 29/05/14 01:48, David Cuenca wrote: Settled :) Let's leave it at defined as a trait of I don't think it is very clear what the intention of this property is. What are the limits of its use? What is it meant to do? Can behaviour really be a trait of a species? If we allow it here, it seems to apply to all kinds of connections: density/car? eternity/time? time/reality? evil/devil? rigour/science? -- this is opening a can of worms. It will be hard to maintain this. Wikiuser13 recently added consists of: Neptune to Q1. It was fixed. But it is a good example of the kind of confusion that comes from such general ontological (in the philosophical sense) properties. And consists of is still very simple compared to defined as a trait of. Can't we focus on more obvious things like has social network account for a while? ;-) ... Some important ideas like classification (instance of/subclass of) belong completely to the analytical realm. We don't observe classes, we define them. A planet is what we call a planet, and this can change even if the actual lumps in space are pretty much the same. Agreed. Better labels could be defined as instance of/defined as subclass of I don't think this is better. The short names are fine. As I explained in my email, Wikidata statements are mainly about what the external references say. The distinction between defined and observed is not on the surface of this. The main question is Did the reference say that pianos are instruments? but not Did the reference say pianos are instruments because of the definition of 'piano'? Therefore, we don't need to put this information in our labels. Now inferences are slightly different. If we know that X implies Y, then if A says X we can infer that (implicitly) A says Y. That is a logical relationship (or rule) on the level of what is claimed, rather than on the level of statements. 
Note that we still need to have a way to find out that X implies Y, which is a content-level claim that should have its own reference somewhere. We mainly use inference in this sense with subclass of in reasonator or when checking constraints. In this case, the implications are encoded as subclass-of statements (If X is a piano, then X is an instrument). This allows us to have references on the implications. Nope, nope, nope. I was not referring to hard implications, but to heuristic ones. Consider that these properties in the item namespace: defined as a trait of defined as having defined as instance of Would translate as these constraints in the property namespace: likely to be a trait of likely to have likely to be an instance of I think you might have misunderstood my email. I was arguing *in favour* of soft constraints, but in the paragraph before the one about inferences that you reply to here. Inferences are hard ways for obtaining new knowledge from our own definitions. Example: If X is the father of Y according to reference A Then Y is the child of X according to reference A This is as hard as it can get. We are absolutely sure of this since this rule just explains the relationship between two different ways we have for encoding family relationships. Below, you said expectations inferred from definitions should not be treated as hard constraints -- maybe this mixture of terms indicates that I have not been clear enough about the distinction between inference and constraint. They are really completely different ways of looking at things. Inferences are something that adds (inevitable) conclusions to your knowledge, while constraints just tell you what to check for. If you accept the premises of an inference and the inference rule, then you must also accept the conclusion -- there is no soft way of reading this. To make it soft, you can start to formalise softness in your knowledge, using fuzzy logic or whatnot (see my other email with Thomas). 
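The difference between an inference and a constraint described above can be sketched in Python. This is only an illustration under stated assumptions: statements are modelled as plain (subject, property, value, reference) tuples, the function names are hypothetical, and the "at most one father" check is an invented example of an expectation, not an actual Wikidata constraint.

```python
# Statements are (subject, property, value, reference) tuples. The names and
# the "at most one father" expectation are illustrative assumptions.

def infer_child_of(statements):
    """Hard inference: 'X father of Y' (per reference A) entails 'Y child of X' (per A)."""
    inferred = set(statements)
    for subject, prop, value, reference in statements:
        if prop == "father of":
            inferred.add((value, "child of", subject, reference))
    return inferred

def check_single_father(statements):
    """Soft constraint: report children with several stated fathers; adds no data."""
    fathers = {}
    for subject, prop, value, _reference in statements:
        if prop == "father of":
            fathers.setdefault(value, set()).add(subject)
    return {child: sorted(names) for child, names in fathers.items() if len(names) > 1}

data = {
    ("X", "father of", "Y", "reference A"),
    ("Z", "father of", "Y", "reference B"),
}
closed = infer_child_of(data)          # adds the inevitable conclusions
violations = check_single_father(data) # only tells a human what to look at
```

The inference returns a larger set of statements and keeps each conclusion tied to the reference of its premise, while the constraint leaves the data untouched and merely flags `Y` for review.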
I don't think we can use soft inferences (in the sense of fuzzy logic et al.) but I am in favour of soft constraints (in the sense of your expectations). I guess we agree on all of this, but have a bit of trouble in making ourselves clear :-) But it is rather subtle material after all. In general, an interesting question here is what the status of subclass of really is. Do we gather this information from external sources (surely there must be a book that tells us that pianos are instruments) or do we as a community define this for Wikidata (surely, the overall hierarchy we get is hardly the universal class hierarchy of the world but a very specific classification that is different from other classifications that may exist elsewhere)? Best not to think about it too much and to gather sources whenever we have them ;-) I think it is good to think about it and to consider options to deal with it. Like for instance:
Re: [Wikidata-l] What is the point of properties?
One other issue to bear in mind: it's *simple* to have properties as a separate thing.

I have been following this discussion with some interest but... well, I don't think I'm particularly stupid, but most of it is completely above my head. Saying "here are items, here are a set of properties you can define relating to them, here's some notes on how to use properties" is going to get a lot more people able to contribute than if they need to start understanding theoretical aspects of semantic relationships... ;-)

Andrew.

--
- Andrew Gray
andrew.g...@dunelm.org.uk
Re: [Wikidata-l] What is the point of properties?
Héhé, the Wikidata game suggests it may be a little bit too complicated, and better abstracted away by a three-button game for mass contribution :)
Re: [Wikidata-l] What is the point of properties?
Hoi,

In OmegaWiki we made the choice that any defined meaning can be used as a property. This makes OmegaWiki more like a wiki than Wikidata, where properties have to be created by fiat. What was found is that people tend not to abuse this, and only a limited set is used as properties.

When you do not insist on the artificial limits implicit in properties, there will be one victim: the structure of the ontology. However, when you analyse things, such a structure still exists; it is just no longer formal.

In a way it is similar to the early insistence on using the GND types. They did not fit, but thankfully we kept the GND identifier. In this way we left the structure of GND where it belonged: in GND itself. They can map our content to their heart's content using their structure.

One final thought: when we have enough data, we can manipulate it. Because of a lack of data we are still left with many GND types.

PS: there is nothing wrong in leaving things as they are. It works, more or less.

Thanks,
Gerard
Re: [Wikidata-l] What is the point of properties?
Key differences between Properties and Items:

* Properties have a data type, Items don't.
* Items have sitelinks, Properties don't.
* Items have Statements, Properties will support Claims (without sources).

The software needs these constraints/guarantees to be able to take shortcuts, provide specialized UI and API functionality, etc.

Yes, it would be possible to use items as properties instead of having a separate entity type. But they are structurally and functionally different, so it makes sense to have a strict separation. This makes a lot of things easier, e.g.:

* setting different permissions for properties
* mapping to RDF vocabularies

More fundamentally, they are semantically different: an item describes a concept in the real world, while a property is a structural component used for such a description.

Yes, properties are similar to data items, and in some cases there may be an item representing the same concept that is represented by a property entity. I don't see why that is a problem, while I can see a lot of confusion arising from mixing them.

-- daniel

On 28.05.2014 09:25, David Cuenca wrote:
> Since the very beginning I have kept myself busy with properties, thinking about which ones fit, which ones are missing to better describe reality, and how to integrate them into the ones that we have. The thing is that the more I work with them, the less difference I see with normal items, and if statements are soon allowed on property pages, the difference will blur even more.
>
> I can understand that from the software development point of view it might make sense to have a clear difference. Or for the community to get a deeper understanding of the underlying concepts represented by words. But semantically I see no difference between:
>
> cement (Q45190) emissivity (P1295) 0.54
>
> and
>
> cement (Q45190) emissivity (Q899670) 0.54
>
> Am I missing something here? Are properties really needed, or are we adding unnecessary artificial constraints?
>
> Cheers,
> Micru

-- 
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
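The structural differences Daniel lists can be sketched as two entity types. This is only an illustration and not Wikibase's actual data model: the class names, the `DATATYPES` table, the "quantity" datatype for P1295, and `add_statement` are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical datatype names, mapped to the Python types accepted as values.
DATATYPES = {"quantity": (int, float), "string": (str,)}


@dataclass
class Item:
    """A real-world concept: sitelinks and sourced statements, no datatype."""
    id: str
    label: str
    sitelinks: dict = field(default_factory=dict)
    statements: list = field(default_factory=list)  # (property_id, value, sources)


@dataclass
class Property:
    """A structural component: a datatype and unsourced claims, no sitelinks."""
    id: str
    label: str
    datatype: str
    claims: list = field(default_factory=list)  # (property_id, value)


def add_statement(item, prop, value, sources=()):
    """Attach a statement to an item; only a Property may act as the predicate."""
    if not isinstance(prop, Property):
        raise TypeError(f"{prop.id} is not a property and cannot be used as a predicate")
    if not isinstance(value, DATATYPES[prop.datatype]):
        raise ValueError(f"{value!r} does not fit datatype {prop.datatype!r}")
    item.statements.append((prop.id, value, tuple(sources)))


cement = Item("Q45190", "cement", sitelinks={"enwiki": "Cement"})
emissivity_item = Item("Q899670", "emissivity")
emissivity_prop = Property("P1295", "emissivity", datatype="quantity")

add_statement(cement, emissivity_prop, 0.54)     # accepted
# add_statement(cement, emissivity_item, 0.54)   # would raise TypeError
```

Because the predicate slot only accepts something with a datatype, the software can validate the value (and pick an input widget) without inspecting the item graph.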
Re: [Wikidata-l] What is the point of properties?
Hi David,

Interesting remark. Let's explore this idea a bit. I will give you two main reasons why we keep properties separate, one practical and one conceptual.

First the practical point. Certainly, everything that is used as a property needs to have a datatype, since otherwise the wiki would not know what kind of input UI to show. So you cannot use just any item as a property straight away -- it needs to have a datatype first. So, yes, you could abolish the namespace Property, but you would still have a clear, crisp distinction between property items (those with a datatype) and normal items (those without a datatype). Because of this, most of the other functions would work the same as before (for example, property autocompletion would still only show properties, not arbitrary items).

A complication with this approach is that property datatypes cannot change in Wikibase. This design was picked since there is, in general, no way to convert existing data from one datatype to another. So changing the datatype would create problems by making a lot of data invalid, and would require special handling and special UI to deal with this situation. With properties living in a separate namespace, this is not a real restriction: you can just create a new property and give it the same label (after renaming the old one, e.g., putting DEPRECATED in its name). Then you can migrate the data in some custom fashion. But if properties were items, we would have a problem here: the item is already linked to many Wikipedias and other projects, and it might be used in Lua scripts, queries, or even external applications like Denny's JavaScript translation library. You cannot change item ids easily. Also, many items would not have a datatype, so the first one that is (accidentally?) entered would be fixed. So we would definitely need to rethink the whole idea of unchangeable datatypes.

My other important reason is conceptual. Properties are not considered part of the (encyclopaedic) data but rather part of the schema that the community has picked to organise that data. As in your example, emissivity (Q899670) is a notion in physics as described in a Wikipedia article. There are many things to say about this notion (for example, it has a history: somebody must have defined it first -- although Wikipedia does not say so in this case). As in all cases, some statements might be disputed while others are widely acknowledged to be true.

For the property emissivity (P1295), the situation is quite different. It was introduced as an element used to enter data, similar to a row in a database table or an infobox template in some Wikipedia. It probably relates closely to the actual physical notion Q899670, but it is still a different thing. For example, it was first introduced by User:Jakec, who is probably not the person who introduced the physical concept ;-) Anything that we will say about P1295 in the future refers to the property -- a concept of our own making that is not described in any external source (there are no publications discussing P1295).

This is also the reason why properties are supposed to support *claims*, not *statements*. That is, they will have property-value pairs and qualifiers, but no references or ranks. Indeed, anything we say about properties has the status of a definition. If we say it, it's true. There is no other authority on Wikidata properties.

You could of course still have items and properties share a page and somehow define which statements/claims refer to which concept, but this does not seem to make things easier for users.

These are, for me, the two main reasons why it makes sense to keep properties apart from items on a technical level. Besides this, it is also convenient to separate the 1000-something properties from the 15-million-something items for reasons of maintenance.
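The claim/statement distinction described above can be made concrete with a small sketch. It follows the wording of the message (claims have property-value pairs and qualifiers; statements add references and ranks), but the class layout, the placeholder reference, and the property id `P_CORRESPONDS_WITH_ITEM` are hypothetical and not taken from Wikibase.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Claim:
    """A property-value pair plus optional qualifiers; no references, no rank."""
    property_id: str
    value: Any
    qualifiers: Tuple[Tuple[str, Any], ...] = ()


@dataclass(frozen=True)
class Statement:
    """A claim that can additionally be sourced and ranked."""
    claim: Claim
    references: Tuple[str, ...] = ()
    rank: str = "normal"


def is_sourced(statement: Statement) -> bool:
    """True if at least one reference backs the statement."""
    return len(statement.references) > 0


# On an item: a statement that can carry a reference ("some handbook" is a placeholder).
item_statement = Statement(Claim("P1295", 0.54), references=("some handbook",))

# On a property page: a bare claim, true by definition (the property id is hypothetical).
property_claim = Claim("P_CORRESPONDS_WITH_ITEM", "Q899670")
```

A claim on a property page simply has no slot where a reference or rank could go, which mirrors the point that whatever is said about a property is a definition rather than something to be sourced.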
Best regards,
Markus
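The practical point in Markus's message (a datatype that never changes, plus migration through a fresh property) could look roughly like this. It is a sketch under stated assumptions: the `migrate` helper, the property ids P9001/P9002/P9999, and the sample data are invented for illustration and do not describe any existing Wikibase tooling.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Property:
    id: str
    label: str
    datatype: str  # frozen: assigning a new datatype raises FrozenInstanceError


def migrate(properties, statements, old_id, new_id, new_datatype, convert):
    """Replace old_id by a new property with the same label and converted values.

    properties: dict mapping property id -> Property (updated in place)
    statements: list of (item_id, property_id, value) triples
    convert:    custom function turning an old value into a new one
    """
    old = properties[old_id]
    # Rename the old property first, then reuse its label for the new one.
    properties[old_id] = Property(old.id, "DEPRECATED " + old.label, old.datatype)
    properties[new_id] = Property(new_id, old.label, new_datatype)
    return [
        (item, new_id, convert(value)) if prop == old_id else (item, prop, value)
        for item, prop, value in statements
    ]


# Hypothetical ids and data: a height once stored as a string, migrated to a number.
properties = {"P9001": Property("P9001", "height", "string")}
statements = [("Q1", "P9001", "1.80"), ("Q1", "P9999", "unrelated")]
statements = migrate(properties, statements, "P9001", "P9002", "quantity", float)
```

The migration hands out a new property id, which is acceptable only because nothing outside the wiki links to property ids the way sitelinks and external tools link to item ids.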
Re: [Wikidata-l] What is the point of properties?
On 28/05/14 10:37, Daniel Kinzler wrote:
> Yes, it would be possible to use items as properties instead of having a separate entity type. But they are structurally and functionally different, so it makes sense to have a strict separation. This makes a lot of things easier, e.g.:
> * setting different permissions for properties
> * mapping to rdf vocabularies

This one point requires a tiny remark: there is no problem in OWL or RDF with using the same URI as a property, an individual, and a class in different contexts. The only thing that OWL (DL) forbids is using one property both for literal values (like strings) and for object values (like other items), but this would not occur in our case anyway, since we have clearly defined types.

I completely agree with all the rest :-)

Cheers,
Markus
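Markus's remark about reusing one URI in several roles can be illustrated with plain triples. The URIs below are illustrative only and are not meant to reproduce Wikidata's actual RDF export; `roles` and `value_kinds` are hypothetical helpers written for this example.

```python
WD = "http://www.wikidata.org/entity/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_DATATYPE_PROPERTY = "http://www.w3.org/2002/07/owl#DatatypeProperty"

triples = [
    (WD + "Q45190", WD + "P1295", 0.54),              # P1295 used as a predicate
    (WD + "P1295", RDF_TYPE, OWL_DATATYPE_PROPERTY),  # P1295 described as a subject
    (WD + "P1295", RDFS_LABEL, "emissivity"),
]


def roles(uri, triples):
    """Return the positions (subject/predicate/object) in which a URI occurs."""
    found = set()
    for s, p, o in triples:
        if s == uri:
            found.add("subject")
        if p == uri:
            found.add("predicate")
        if o == uri:
            found.add("object")
    return found


def value_kinds(predicate, triples):
    """Return which kinds of values a predicate is used with: literal, object, or both."""
    kinds = set()
    for s, p, o in triples:
        if p == predicate:
            is_uri = isinstance(o, str) and o.startswith("http")
            kinds.add("object" if is_uri else "literal")
    return kinds
```

Here `roles(WD + "P1295", triples)` reports both subject and predicate use, which is unproblematic, while `value_kinds` would flag the one case OWL DL rules out: a single property used with both literals and objects.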
Re: [Wikidata-l] What is the point of properties?
On Wed, May 28, 2014 at 10:37 AM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
> More fundamentally, they are semantically different: an item describes a concept in the real world, while a property is a structural component used for such a description.

As I perceive it, a property is a normal item (concept) imbued with the option to be used as a predicate and to take different datatypes. There is no property that cannot be expressed as an item; even properties that represent an identifier could be said to be a concept in the real world.

I understand that on the software side you need to distinguish between basic concepts (items) and concepts that can be used as predicates (properties). On the community side we also need to scrutinize and rinse the concepts that hide behind the words before using them as predicates, but sometimes it is good to stop and consider what we are really doing.

> Yes, properties are similar to data items, and in some cases, there may be an item representing the same concept that is represented by a property entity.

I haven't yet found a property that couldn't be expressed as an item.

> I don't see why that is a problem, while I can see a lot of confusion arising from mixing them.

It is not a problem now, but I found it interesting to analyze the substance of the distinction. If properties and concepts are separate, in the end we will be reproducing their ontological structure when organizing them. So it might not make sense to use "subproperty of" to organize properties, but just "corresponds to item".

Gerard, thanks for bringing up the example of OmegaWiki; it is interesting that two independent communities came to the same thoughts without any contact between them :)

Cheers,
Micru
Re: [Wikidata-l] What is the point of properties?
On 28.05.2014 11:44, Markus Krötzsch wrote:
> This one point requires a tiny remark: there is no problem in OWL or RDF to use the same URI as a property, an individual, and a class in different contexts. The only thing that OWL (DL) forbids is to use one property for literal values (like string) and for object values (like other items), but this would not occur in our case anyway since we have clearly defined types. I completely agree with all the rest :-)

Yeah, I didn't mean to say that there is an issue with representing this in RDF, but with mapping to RDF vocabularies. Having a relatively limited and stable set of properties to map makes that a lot easier.

-- daniel
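To see why a small, stable property set helps with vocabulary mapping, consider a hand-maintained lookup table. This is a sketch only: the table `VOCAB_MAP`, the choice of schema.org's `birthDate` as the target for P569, and the fallback URI scheme are assumptions for illustration, not a mapping anyone in the thread proposed.

```python
WD = "http://www.wikidata.org/entity/"

# Hypothetical, hand-curated mapping from property ids to external vocabulary terms.
VOCAB_MAP = {
    "P569": "http://schema.org/birthDate",  # date of birth
}


def predicate_uri(property_id):
    """Use the mapped external term if one exists, else fall back to the property's own URI."""
    return VOCAB_MAP.get(property_id, WD + property_id)
```

With roughly a thousand properties such a table stays reviewable by hand; if any of millions of items could act as a predicate, the table would have no natural boundary.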
Re: [Wikidata-l] What is the point of properties?
Markus,

The explanation about the implications of renaming/deleting makes the most sense, and that alone already justifies the separation in two. It is equally true that when we create a property, we might have cleaned the original concept so much that it differs (even if slightly) from the concept that the item is understood to represent. However, even after that process, the new concept is still an item... The process of imbuing a concept with permanent characteristics (adding a datatype), together with the practical approach, also seems to recommend keeping items and properties separate. Thanks for showing me that reasoning :)

I am still wondering about how we are going to classify properties. Maybe it will require a broader discussion, but if they are the same (or mostly the same) as items, then we can just link them as "same as" and build the classification structure just for the items. OTOH, if they are different, then we will need to mirror that classification for properties, which seems quite redundant. Plus adding a new datatype, "property".

All in all, my conclusion is that properties are just concepts with special qualities that justify the separation in the software (even if in real life there is no separation).

Many thanks for your detailed answer, and sorry if I'm bringing up already discussed topics. It is just that when you stare long into Wikidata, Wikidata stares back into you ;)

Cheers,
Micru
Re: [Wikidata-l] What is the point of properties?
David,

Regarding the question of how to classify properties and how to relate them to items:

* "same as" (in the sense of owl:sameAs) is not the right concept here. In fact, its use on the Web has often been discouraged, since it has very strong implications: it means that in all uses of the one identifier, one could just as well use the other identifier, and that it is indistinguishable whether something has been said about the one or the other. That seems too strong here, at least for most cases.

* In the world of OWL DL, sameAs specifically refers to individuals, not to classes or properties. Saying "P sameAs Q" does not imply that P and Q have the same extension as properties. For the latter, OWL has the relationship owl:equivalentProperty. This distinction of instance level and schema level is similar to the distinction we have between "instance of" and "subclass of".

* Therefore, I would suggest using a property called "subproperty of" as one way of relating properties (analogously to "subclass of"). It has to be checked whether this actually occurs in Wikidata (do we have any properties that would be in this relation, or do we make it a modelling principle to have only the most specific properties in Wikidata?).

* The relationship from properties to items could be modelled with the existing property "subject of" (P805).

* It might be useful to also have a taxonomic classification of properties. For example, we already group properties into properties for people, organisations, etc. Such information could also be added with a specific property (this would be a bit more like a category system on property pages). On the other hand, some of this might coincide with constraint information that could be expressed as claims. For instance, person properties might be those with Type (i.e., rdfs:domain) constraint "human".

By the way, our constraint system could use some systematisation -- there are many overlaps in what you can do with one constraint or another.
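What a "subproperty of" relation would buy can be sketched in a few lines: every statement made with a specific property also holds for its more general properties. The property names ("mother", "parent", "relative") are hypothetical labels chosen for the example, not existing Wikidata properties, and the helpers are illustrative only.

```python
def superproperties(prop, subproperty_of):
    """All properties reachable from prop via one or more 'subproperty of' steps."""
    seen = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        for parent in subproperty_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer(statements, subproperty_of):
    """Add, for each (subject, property, value), the same triple for every superproperty."""
    inferred = set(statements)
    for subject, prop, value in statements:
        for parent in superproperties(prop, subproperty_of):
            inferred.add((subject, parent, value))
    return inferred


# Hypothetical hierarchy: mother -> parent -> relative.
hierarchy = {"mother": ["parent"], "parent": ["relative"]}
facts = {("Q1", "mother", "Q2")}
expanded = infer(facts, hierarchy)
```

Unlike owl:sameAs, this inference runs in one direction only: a "relative" statement never implies a "mother" statement, so the two identifiers stay distinguishable.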
Cheers,
Markus
Re: [Wikidata-l] What is the point of properties?
On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
> * It might be useful to also have a taxonomic classification of properties. For example, we already group properties into properties for people, organisations, etc. Such information could also be added with a specific property (this would be a bit more like a category system on property pages).

Yes. That's the way forward for now.

> On the other hand, some of this might coincide with constraint information that could be expressed as claims. For instance, person properties might be those with Type (i.e., rdfs:domain) constraint human. By the way, our constraint system could use some systematisation -- there are many overlaps in what you can do with one constraint or another.

I hope to have a team of students work on improving constraint reports and everything around them later in the year. It will depend on whether they pick this project, though.

Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognised as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-l] What is the point of properties?
Markus,

Ok, now I understand that "same as" wouldn't be a good name because of the confusion it would cause. However, the property "subject of" as it is now wouldn't be a good candidate either. Its meaning is that a certain statement is represented by another item (that is why it is only allowed to be used as a qualifier).

Perhaps a better name would be "corresponds with item", with the inverse "corresponds with property". Just by having these connections, a lot of information can be inferred from the connected item. Consider the following example with occupation (P106) and occupation (Q13516667):

- I cannot find any clear "subproperty of" for P106, but there is a clear "subclass of: human behaviour" for the item
- human behaviour is "part of" human
- human can have a statement "intrinsic property" (property proposal still under discussion) with values birthday (Q47223) and an (eventual) date of death. It can be expanded in the future to include newly created properties like height, weight, eye color, etc.
- birthday (Q47223) "corresponds with property" date of birth (P569)

Out of this I reach the following conclusions:

- the taxonomy of properties is going to be weak, since there is not always a clear subpropertyOf unless created artificially (more work)
- the standard taxonomy of items (subclass of/part of) is sufficient to automatically reach meaningful constraints and inference (less work)
- by manually adding the constraints to the property itself we are duplicating information, which will require volunteer effort to maintain (more work)

My recommendation is to rely mainly on the main taxonomy instead of creating a parallel property taxonomy, and then think of ways to extract information from the main taxonomy and convert it automatically into constraints. All the maintenance takes effort, so the more it can be automated, the more efficient volunteers will be. And if we can simplify the maintenance of properties, we will be able to simplify the creation of properties too, especially when we face the next surge, which will come with the datatype "number with units".

Cheers,
Micru

On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:

> David,
>
> Regarding the question of how to classify properties and how to relate them to items:
>
> * "same as" (in the sense of owl:sameAs) is not the right concept here. In fact, its use on the Web has often been discouraged, since it has very strong implications: it means that in all uses of the one identifier, one could just as well use the other identifier, and that it is indistinguishable whether something has been said about the one or the other. That seems too strong here, at least for most cases.
> * In the world of OWL DL, sameAs specifically refers to individuals, not to classes or properties. Saying "P sameAs Q" does not imply that P and Q have the same extension as properties. For the latter, OWL has the relationship owl:equivalentProperty. This distinction between instance level and schema level is similar to the distinction we have between "instance of" and "subclass of".
> * Therefore, I would suggest using a property called "subproperty of" as one way of relating properties (analogously to "subclass of"). It has to be checked whether this actually occurs in Wikidata (do we have any properties that would be in this relation, or do we make it a modelling principle to have only the most specific properties in Wikidata?).
> * The relationship from properties to items could be modelled with the existing property "subject of" (P805).
> * It might be useful to also have a taxonomic classification of properties. For example, we already group properties into properties for people, organisations, etc. Such information could also be added with a specific property (this would be a bit more like a category system on property pages).
>
> On the other hand, some of this might coincide with constraint information that could be expressed as claims. For instance, "person properties" might be those with a Type (i.e., rdfs:domain) constraint of "human".
>
> By the way, our constraint system could use some systematisation -- there are many overlaps in what you can do with one constraint or another.
>
> Cheers,
> Markus
>
> On 28/05/14 12:14, David Cuenca wrote:
> > Markus,
> >
> > The explanation about the implications of renaming/deleting makes the most sense, and that alone already justifies the separation in two. It is equally true that when we create a property, we might have cleaned up the original concept so much that it differs (even slightly) from the commonly understood concept that the item represents. However, even after that process, the new concept is still an item... The process of imbuing a concept with permanent characteristics (adding a datatype), and the practical approach, also seem to recommend keeping items and properties separate. Thanks for showing me that reasoning :)
> >
> > I am still wondering about how we are going to classify properties. Maybe it will require a broader
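The proposal in this message (link a property to its corresponding item, then read information off the item taxonomy) can be sketched roughly as follows. This is only an illustration of the idea as described, not an existing Wikidata feature: the "corresponds with item" link is the name proposed in the message, the two dictionaries are hand-written stand-ins for real data, and `reachable_concepts` is a hypothetical helper.

```python
# Proposed link from a property to its item (P106 and Q13516667 are quoted in the thread).
CORRESPONDS_WITH_ITEM = {"P106": "Q13516667"}  # occupation (property) -> occupation (item)

# Hand-written stand-in for the item taxonomy described in the message.
ITEM_LINKS = {
    "Q13516667": [("subclass of", "human behaviour")],
    "human behaviour": [("part of", "human")],
}


def reachable_concepts(prop):
    """Follow subclass-of / part-of links from the item that corresponds to prop."""
    start = CORRESPONDS_WITH_ITEM.get(prop)
    if start is None:
        return []
    seen, order, stack = {start}, [], [start]
    while stack:
        node = stack.pop()
        for _relation, target in ITEM_LINKS.get(node, []):
            if target not in seen:  # guard against cycles in the taxonomy
                seen.add(target)
                order.append(target)
                stack.append(target)
    return order


# Concepts from which an expected subject type for P106 could be derived.
print(reachable_concepts("P106"))
```

Whether such a walk yields sensible constraints depends entirely on how reliable the "part of" and "subclass of" statements are.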
Re: [Wikidata-l] What is the point of properties?
David,

On 28/05/14 16:35, David Cuenca wrote:

> Markus,
>
> Ok, now I understand that "same as" wouldn't be a good name because of the confusion it would cause. However, the property "subject of" as it is now wouldn't be a good candidate either. Its meaning is that a certain statement is represented by another item (that is why it is only allowed to be used as a qualifier).

Ok.

> Perhaps a better name would be "corresponds with item", with the inverse "corresponds with property". Just by having these connections, a lot of information can be inferred from the connected item. Consider the following example with occupation (P106) and occupation (Q13516667):
>
> - I cannot find any clear "subproperty of" for P106, but there is a clear "subclass of: human behaviour" for the item
> - human behaviour is "part of" human

I don't understand this use of "part of". Maybe I would say "having an occupation is part of being human", but not that "occupation is part of human". I would not use either of these and would restrict "part of" to clear, undisputed statements like "the steering wheel is part of the car". Otherwise, anything could be part of human (head?, sadness?, singing?, birth? -- entering this in Wikidata would not lead anywhere).

"Part of" is quite problematic in general. You can see from the discussion on its property page, and also from the uses it sees in the wiki, that this property is severely misunderstood and/or misused. At the very least, one should distinguish "physical part of" from "meronym" (both are aliases of the property now!). And then one should realise that meronyms are in the domain of Wiktionary, which we cannot capture in Wikidata properly since we do not have items for words but for concepts. One alias for an item might be a meronym of something else, while another alias for the same item is not. Using statements for linguistic properties in Wikidata will not be successful. I am not saying that Wikibase is not able to capture some ideas of a thesaurus (we have actually discussed this), but this is not how it is used in Wikidata.

> - human can have a statement "intrinsic property" (property proposal still under discussion) with values birthday (Q47223) and an (eventual) date of death. It can be expanded in the future to include newly created properties like height, weight, eye color, etc.

Yes, this again makes sense to me. It is basically a variant of the constraint "Item", which allows you to say that items that are an instance of human should also have a birthday. But again, this is schematic information (like constraints) and it should not be mixed up with actual data. It is the same conceptual difference that I explained for properties vs. items earlier. Moreover, I think this information (even if correct in some sense) has very little utility as a piece of information about an item; it is much more useful for constraints about properties (which are not items).

> - birthday (Q47223) "corresponds with property" date of birth (P569)

It should be the other way around: the correspondence says something about P569, not about Q47223. There cannot be any reference for this. It should therefore be a claim on the page of P569 rather than a statement on the page of Q47223.

> Out of this I reach the following conclusions:
>
> - the taxonomy of properties is going to be weak, since there is not always a clear subpropertyOf unless created artificially (more work)

I agree.

> - the standard taxonomy of items (subclass of/part of) is sufficient to automatically reach meaningful constraints and inference (less work)

I agree that the taxonomy will be helpful in constraints. This is what constraints already do when using instance of/subclass of. However, I do not agree that the constraints can or should be stated as part of this taxonomy. Constraints are too complex, and they are conceptually different (they say how a property should be used, not how something in the Real World relates to something else). Constraints interact nicely with the taxonomy and help to get useful conclusions, but they are not part of the taxonomy ;-). We must keep content organisation separate from content.

> - by manually adding the constraints to the property itself we are duplicating information, which will require volunteer effort to maintain (more work)

I disagree. Constraints refer to the property, not to the Wikidata item, and it would be conceptually wrong to mix these things up. We have already agreed that properties and items need to remain distinct for technical reasons. Once this is clear, there is no reason to move information that refers to properties (constraints) to item pages. This will not be a duplication of information: it is enough to have the constraints on the property pages only.

If you look at the constraints we have, you can see many examples that are specific to Wikidata and certainly not a general thing about the concept (take the allowed values for sex or gender). We really want to keep editorial
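The arrangement argued for in this reply (constraints stored with the property, while the item taxonomy is only consulted when checking them) might look roughly like this. It is a hedged sketch: the taxonomy entries, the items "item A" and "item B", the `"subject type"` key and both functions are invented for illustration; only P569 (date of birth) comes from the thread, and the real constraint reports work differently.

```python
# Hypothetical taxonomy data; none of these entries are taken from Wikidata.
SUBCLASS_OF = {"pianist": "musician", "musician": "human"}
INSTANCE_OF = {"item A": "pianist", "item B": "mountain"}

# Constraints are keyed by property, i.e. they live "on the property page".
PROPERTY_CONSTRAINTS = {"P569": {"subject type": "human"}}  # date of birth


def superclasses(cls):
    """Return cls followed by everything reachable through subclass-of."""
    chain = []
    while cls is not None and cls not in chain:  # stop on unknown class or cycle
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def violates_type_constraint(prop, subject):
    """True if prop has a type constraint that the subject does not satisfy."""
    constraint = PROPERTY_CONSTRAINTS.get(prop)
    if constraint is None:
        return False  # no constraint recorded for this property
    required = constraint["subject type"]
    return required not in superclasses(INSTANCE_OF.get(subject))


print(violates_type_constraint("P569", "item A"))  # pianist -> musician -> human
print(violates_type_constraint("P569", "item B"))  # mountain is not a human
```

The taxonomy is reused for the check, yet nothing about the constraint is stored on an item.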
Re: [Wikidata-l] What is the point of properties?
Markus,

I share your dissatisfaction with "part of", because that language construct hides many different conceptual relationships that should be sorted out. I think we'll have some community discussion work to do in that regard. One of the uses is: what is the relationship between a human and his behavior? I would say that the human has been "defined as having" human behavior (or the reverse). But if you have a better suggestion to express this concept I would be really glad to hear it.

Now that you mention it, yes, I agree that only a property called "corresponds with item" makes sense in this context, but not the inverse.

I would like to make a further distinction regarding constraints. The nature of constraints is not to set arbitrary limits but to reflect patterns that naturally appear in concepts. In that regard, I hate the word "constraint", because it means that we are placing a straitjacket on reality, when it is the other way round: recurring patterns in the real world make us expect that a value will fall within the bounds of our expectations. I think that we should seriously consider using the term "expectation" from now on, because we don't constrain the values per se; we expect them to have a value, and when the value departs from the expected value, it sets off an alarm that might reflect an error or not.

Once that distinction is made, yes, you are right: considering that we are separating properties and items, our expectations do not belong to the data itself, they belong to the property.

However, I would like to bring the conversation to a deeper level. What is it that makes the concept of addition (Q32043) what it is? What is it in physical object (Q223557) that we, sentient beings, can perceive and agree to treat as a concept? I mention those two because one is purely abstract and the other one is purely physical. And I would say that addition (Q32043) has been "defined as having" associativity (Q177251), and physical object (Q223557) has been repeatedly observed to have density (Q29539). We can argue whether the second is an expectation or not, but the first is definitely not: someone defined addition like that, and this information can be sourced. Even more, we could also say that physical object (Q223557) has been "defined as having" density (Q29539), and I guess we could find sources for that statement too.

With all this I want to make the point that there are two sources of expectations:

- from our experience seeing repetitions and patterns in the values (male/female/etc., between 10 and 50), which belong to the property
- from the agreed definition of the concept itself, which belong to the data

Cheers,
Micru

PS: this is a re-post because my previous message was bounced back for being too long :)
Re: [Wikidata-l] What is the point of properties?
Hi,

As for behavior, I would say a behavior may be linked to a psychological trait. I'd say a behavior is defined by the person performing many acts that belong to a typical class of events. Someone is said to be aggressive if he typically acts in a hostile way in many situations. I remember a theory about that: https://en.wikipedia.org/wiki/Trait_theory :)

2014-05-28 20:46 GMT+02:00 David Cuenca dacu...@gmail.com:

> Markus,
>
> I share your dissatisfaction with "part of", because that language construct hides many different conceptual relationships that should be sorted out. I think we'll have some community discussion work to do in that regard. One of the uses is: what is the relationship between a human and his behavior? I would say that the human has been "defined as having" human behavior (or the reverse). But if you have a better suggestion to express this concept I would be really glad to hear it.
Re: [Wikidata-l] What is the point of properties?
David,

> One of the uses is: what is the relationship between a human and his behavior?

This is an easy question once you have been clear about what "human behaviour" is. According to enwiki, it is a range of behaviours *exhibited by* humans.

The bigger question for me is whether it is useful to record this relationship (exhibited by) in Wikidata. What would anybody do with this data? In what application could it be of interest?

Moreover, as a great Icelandic ontologist once said: "There is definitely, definitely, definitely no logic, to human behaviour" ;-)

> In that regard, I hate the word "constraint", because it means that we are placing a straitjacket on reality, when it is the other way round: recurring patterns in the real world make us expect that a value will fall within the bounds of our expectations.

I think constraints are already understood in this way. The name comes from databases, where a constraint violation is indeed a rather hard error. On the other hand, ironically, constraints (as a technical term) are often considered to be a softer form of modelling than (onto)logical axioms: a constraint can be violated, while a logical axiom (as the name suggests) is always true -- if it is not backed by the given data, new data will be inferred. So as a technical term, constraint is quite appropriate for the mechanism we have, although it may not be the best term to clarify the intention.

> However, I would like to bring the conversation to a deeper level. ... With all this I want to make the point that there are two sources of expectations:
>
> - from our experience seeing repetitions and patterns in the values (male/female/etc., between 10 and 50), which belong to the property
> - from the agreed definition of the concept itself, which belong to the data

Yes. I agree with this as a basic dichotomy of things we may want to record in Wikidata. Some things are true by definition, while others are just very likely by observation. The exact population of Paris we will never know, but we are completely sure that a piano is an instrument. (Maybe somebody with a better philosophical background than me could give a better perspective on these notions -- analytical vs. empirical come to mind, but I am sure there is more.)

Some important ideas like classification (instance of/subclass of) belong completely to the analytical realm. We don't observe classes, we define them. A planet is what we call a planet, and this can change even if the actual lumps in space are pretty much the same.

However, there is yet a deeper level here (you asked for it ;-). Wikidata is not about facts but about statements with references. We do not record "Pluto was a planet until 2006" but "Pluto was a planet until 2006 *according to the IAU*". Likewise, we don't say "Berlin has 3 million inhabitants" but "Berlin has 3 million inhabitants *according to the Amt fuer Statistik Berlin-Brandenburg*". If you compare these two statements, you can see that they are both empirical, based on our observation of a particular reference. We do not have analytical knowledge of what the IAU or the Amt fuer Statistik might say.

So in this sense constraints can only ever be rough guidelines. It does not make logical sense to say "if source A says X then source B must say Y" -- even if we know that X implies Y (maybe by definition), we don't know what sources A and B say. All we can do with constraints is to uncover possible contradictions between sources, which might then be looked into.

Now inferences are slightly different. If we know that X implies Y, then if A says X we can infer that (implicitly) A says Y. That is a logical relationship (or rule) on the level of what is claimed, rather than on the level of statements. Note that we still need to have a way to find out that X implies Y, which is a content-level claim that should have its own reference somewhere. We mainly use inference in this sense with "subclass of" in Reasonator or when checking constraints. In this case, the implications are encoded as subclass-of statements ("If X is a piano, then X is an instrument"). This allows us to have references on the implications.

In general, an interesting question here is what the status of "subclass of" really is. Do we gather this information from external sources (surely there must be a book that tells us that pianos are instruments) or do we as a community define this for Wikidata (surely, the overall hierarchy we get is hardly the universal class hierarchy of the world but a very specific classification that is different from other classifications that may exist elsewhere)? Best not to think about it too much and to gather sources whenever we have them ;-)

Besides these two notions (constraints to uncover inconsistent references, and logical axioms to derive new statements from given ones), there is also a third type of constraint that is purely analytical. If we *define* that our
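The two mechanisms distinguished in this message can be illustrated with a small sketch: inference keeps the original source ("if A says X, then implicitly A says Y"), while a constraint-style check can only flag possible contradictions between sources. Everything here is an assumption made for illustration: the `Statement` tuple, the subjects "X" and "city C", the placeholder values, and both functions. Only the piano/instrument implication is taken from the message.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    prop: str
    value: str
    source: str  # every statement carries the reference it is attributed to


# The implication from the message: if X is a piano, then X is an instrument.
SUBCLASS_OF = {"piano": "instrument"}


def infer_instance_of(statements):
    """Derive implied 'instance of' statements, keeping the original source."""
    inferred = []
    for st in statements:
        if st.prop != "instance of":
            continue
        seen = {st.value}
        cls = SUBCLASS_OF.get(st.value)
        while cls is not None and cls not in seen:
            inferred.append(st._replace(value=cls))
            seen.add(cls)
            cls = SUBCLASS_OF.get(cls)
    return inferred


def possible_contradictions(statements):
    """Group differing values for the same subject/property; a hint, not an error."""
    grouped = {}
    for st in statements:
        grouped.setdefault((st.subject, st.prop), set()).add((st.value, st.source))
    return {key: vals for key, vals in grouped.items()
            if len({value for value, _source in vals}) > 1}


stated = [
    Statement("X", "instance of", "piano", "source A"),
    Statement("city C", "population", "value 1", "source A"),  # placeholder values
    Statement("city C", "population", "value 2", "source B"),
]

print(infer_instance_of(stated))
print(possible_contradictions(stated))
```

The inferred statement is still attributed to source A, and the differing population values are only surfaced for a human to look into, since both sources may be reporting something legitimate.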
Re: [Wikidata-l] What is the point of properties?
Markus,

On Thu, May 29, 2014 at 12:53 AM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:

> This is an easy question once you have been clear about what "human behaviour" is. According to enwiki, it is a range of behaviours *exhibited by* humans.

Settled :) Let's leave it at "defined as a trait of".

> What would anybody do with this data? In what application could it be of interest?

Well, our goal is to gather the whole of human knowledge, not to use it. I can think of several applications, but let's leave that open. Never underestimate human creativity ;-)

> Moreover, as a great Icelandic ontologist once said: "There is definitely, definitely, definitely no logic, to human behaviour" ;-)

Definitely, that is why we spend so much time in front of flickering squares making them flicker even more. It makes total sense :P

> I think constraints are already understood in this way. The name comes from databases, where a constraint violation is indeed a rather hard error. On the other hand, ironically, constraints (as a technical term) are often considered to be a softer form of modelling than (onto)logical axioms: a constraint can be violated, while a logical axiom (as the name suggests) is always true -- if it is not backed by the given data, new data will be inferred. So as a technical term, constraint is quite appropriate for the mechanism we have, although it may not be the best term to clarify the intention.

Ok, I will not fight traditional labels or conventions. I was interested in pointing out the inappropriateness of using a word inside our community with a definition that doesn't match its use, when there is another word that matches perfectly and conveys its meaning better to users.

> Some important ideas like classification (instance of/subclass of) belong completely to the analytical realm. We don't observe classes, we define them. A planet is what we call a planet, and this can change even if the actual lumps in space are pretty much the same.

Agreed. Better labels could be "defined as instance of"/"defined as subclass of".

> Now inferences are slightly different. If we know that X implies Y, then if A says X we can infer that (implicitly) A says Y. That is a logical relationship (or rule) on the level of what is claimed, rather than on the level of statements. Note that we still need to have a way to find out that X implies Y, which is a content-level claim that should have its own reference somewhere. We mainly use inference in this sense with "subclass of" in Reasonator or when checking constraints. In this case, the implications are encoded as subclass-of statements ("If X is a piano, then X is an instrument"). This allows us to have references on the implications.

Nope, nope, nope. I was not referring to hard implications, but to heuristic ones. Consider that these properties in the item namespace:

- defined as a trait of
- defined as having
- defined as instance of

would translate as these constraints in the property namespace:

- likely to be a trait of
- likely to have
- likely to be an instance of

> In general, an interesting question here is what the status of "subclass of" really is. Do we gather this information from external sources (surely there must be a book that tells us that pianos are instruments) or do we as a community define this for Wikidata (surely, the overall hierarchy we get is hardly the universal class hierarchy of the world but a very specific classification that is different from other classifications that may exist elsewhere)? Best not to think about it too much and to gather sources whenever we have them ;-)

I think it is good to think about it and to consider options to deal with it. Like for instance:

defined as instance of corresponds with item Wikimedia community concept

We already have items that refer to concepts that only make sense for us, so no change in that regard.

> At the moment, hard constraints (from definitions) and soft constraints (expectations) are simply mixed, and maybe this is fine since we handle them in a similar fashion (humans need to look at how to fix the situation). Most constraints, even those that refer to definitions, are rather soft anyway since we apply them to statements, not to hard facts. Hard constraints can only occur in cases where the *encoding* of a statement in Wikidata is wrong (not the intended statement as such, but how it was translated to data).

As explained above, expectations inferred from definitions should not be treated as hard constraints, but as soft ones.

Micru
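The translation described in this message, from definitional statements on items to soft expectations on properties, could be sketched as below. The item IDs are the ones quoted earlier in the thread (addition, associativity, physical object, density); the "defined as ..." and "likely to ..." labels are the names proposed in the discussion rather than existing Wikidata properties, and the functions are hypothetical helpers written for this illustration.

```python
# Definitional statements on items, using the IDs quoted in the thread.
DEFINITIONS = [
    ("Q32043", "defined as having", "Q177251"),  # addition -> associativity
    ("Q223557", "defined as having", "Q29539"),  # physical object -> density
]

# Proposed mapping from item-level definitions to property-level expectations.
DEFINITION_TO_EXPECTATION = {
    "defined as a trait of": "likely to be a trait of",
    "defined as having": "likely to have",
    "defined as instance of": "likely to be an instance of",
}


def expectations_from(definitions):
    """Rewrite each definitional statement as a heuristic expectation."""
    return [
        (subject, DEFINITION_TO_EXPECTATION[relation], value)
        for subject, relation, value in definitions
        if relation in DEFINITION_TO_EXPECTATION
    ]


def soft_warnings(concept, recorded, expectations):
    """List expected-but-missing values as warnings; nothing is rejected."""
    return [
        f"{concept}: '{relation}' {value}, but nothing is recorded"
        for subject, relation, value in expectations
        if subject == concept and relation == "likely to have" and value not in recorded
    ]


expected = expectations_from(DEFINITIONS)
print(expected)
print(soft_warnings("Q223557", set(), expected))
print(soft_warnings("Q223557", {"Q29539"}, expected))
```

The warnings are returned as plain messages for a human to review, which matches the request that such expectations be treated as soft rather than hard constraints.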