David,

On 28/05/14 16:35, David Cuenca wrote:
Markus,

Ok, now I understand that "same as" wouldn't be a good name because of
the confusion it would cause. However the property "subject of" as it is now
wouldn't be a good candidate either. Its meaning is that a certain
statement is represented by another item (that is why it is only allowed
to be used as qualifier).

Ok.


Perhaps a better name would be "corresponds with item" and the inverse
"corresponds with property". Just by having these connections, a lot of
information can be inferred from the connected item.

Consider the following example with "occupation (P106)", and "occupation
(Q13516667)":
- I cannot find any clear "subproperty of" for P106, but there is a
clear "subclass of: human behaviour" for the item
- "human behaviour" is "part of" human

I don't understand this use of "part of". Maybe I would say "having an occupation is part of being human" but not that "occupation is part of human". I would not use either of these and restrict "part of" to clear, undisputed statements like "the steering wheel is part of the car". Otherwise, anything could be part of human ("head"?, "sadness"?, "singing"?, "birth"? -- entering this in Wikidata would not lead anywhere).

"Part of" is quite problematic in general. You can see it from the discussion on its property page, and also from the uses it sees in the wiki, that this property is severely misunderstood and/or misused. At the very least, one should distinguish "physical part of" from "meronym" (both are aliases of the property now!). And then one should realise that meronyms are in the domain of Wiktionary, which we cannot capture in Wikidata properly since we do not have items for words but for concepts. One alias for an item might be a meronym of something else, while another alias for the same item is not. Using statements for linguistic properties in Wikidata will not be successful. I am not saying that Wikibase is not able to capture some ideas of a thesaurus (we have actually discussed this), but this is not how it is used in Wikidata.

- "human" can have a statement "intrinsic property" (property proposal
still under discussion) with values "birthday (Q47223)" and an
"(eventual) date of death". It can be expanded in the future to include
newly created properties like "height", "weight", "eye color", etc

Yes, this again makes sense to me. It is basically a variant of the constraint "Item" which allows you to say that items that are instance of human should also have a birthday. But again, this is schematic information (like constraints) and it should not be mixed up with actual data. It is the same conceptual difference that I have explained for properties vs. items earlier. Moreover, I think this information (even if correct in some sense) has very little utility as a piece of information about an item; it is much more useful for constraints about properties (which are not items).

- birthday (Q47223) <corresponds with property> date of birth (P569)

It should be the other way around: the correspondence says something about P569, not about Q47223. There cannot be any reference for this. It should therefore be a claim on the page of P569 rather than a statement on the page of Q47223.
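The difference between a claim on a property page and a statement on an item page can be sketched roughly as follows. This is only an illustration of the distinction, not how Wikibase stores anything: the class names, the property label "corresponds with item" (the name proposed in this thread, not an existing property) and the placeholder value and source strings are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A property-value pair with optional qualifiers; no references, no rank."""
    prop: str
    value: str
    qualifiers: tuple = ()


@dataclass(frozen=True)
class Statement:
    """A claim plus the sourcing apparatus used on item pages."""
    claim: Claim
    references: tuple = ()
    rank: str = "normal"


# On the page of P569: a definitional claim. There is nothing to cite.
# "corresponds with item" is the hypothetical name proposed in this thread.
p569_claims = [Claim(prop="corresponds with item", value="Q47223")]

# On an item page: a statement, which can carry references and a rank.
# The value and the reference are placeholders, not real data.
item_statements = [
    Statement(Claim(prop="P569", value="<some date>"),
              references=("<some source>",))
]
```

With this shape, the correspondence lives with P569 and simply has no slot for a reference, while data about items keeps its references and rank.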


Out of this I reach the following conclusions:
- the taxonomy of properties is going to be weak, since there is not
always a clear subpropertyOf unless created artificially (more work)

I agree.

- the standard taxonomy of items (subclass of/part of) is sufficient
to automatically reach meaningful constraints and inference (less work)

I agree that the taxonomy will be helpful in constraints. This is what constraints already do when using instance of/subclass of. However, I do not agree that the constraints can or should be stated as part of this taxonomy. Constraints are too complex, and they are conceptually different (they say how a property should be used, not how something in the Real World relates to something else). Constraints interact nicely with the taxonomy and help to get useful conclusions, but they are not "part of" taxonomy ;-). We must keep content organisation separate from content.
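As a minimal sketch of how a constraint can use the item taxonomy without being part of it, consider a "Type" constraint check that walks "subclass of". Everything here is a toy assumption: the class labels, the item names and the dictionary layout are made up for illustration and do not reflect the actual constraint templates.

```python
# Item taxonomy (content): class -> direct superclasses. Toy labels only.
SUBCLASS_OF = {
    "monarch": {"human"},
    "human": {"organism"},
    "city": {"place"},
}

# Item data (content): item -> classes it is an instance of. Toy items only.
INSTANCE_OF = {
    "item_1": {"monarch"},
    "item_2": {"city"},
}

# Constraint (content organisation): kept per property, not in the taxonomy.
TYPE_CONSTRAINT = {"P569": "human"}


def superclasses(cls):
    """Return cls together with every class reachable via 'subclass of'."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def satisfies_type_constraint(item, prop):
    """True if the item is an instance of the required class or a subclass."""
    required = TYPE_CONSTRAINT[prop]
    return any(required in superclasses(c) for c in INSTANCE_OF.get(item, ()))
```

Here `satisfies_type_constraint("item_1", "P569")` holds because "monarch" is a subclass of "human", while "item_2" fails the check. The taxonomy is only read; the constraint itself stays with the property.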

- by manually adding the constraints to the property itself we are
duplicating information which will require volunteer effort to maintain
(more work)

I disagree. Constraints refer to the property, not to the Wikidata item, and it would be conceptually wrong to mix these things up. We have already agreed that properties and items need to remain distinct for technical reasons. Once this is clear, there is no reason to move information that refers to properties (constraints) to item pages. This will not be a duplication of information: it is enough to have the constraints on the property pages only. If you look at the constraints we have, you can see many examples that are specific to Wikidata and certainly not a general thing about the concept (take the "allowed values" for "sex or gender"). We really want to keep editorial helpers (constraints) distinct from sourced information (statements about items).
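To illustrate keeping constraints on the property side only, here is a rough sketch of an "allowed values" check. The data layout, the function name and the values "value_a", "value_b" and "value_z" are hypothetical placeholders, not the real constraint for "sex or gender".

```python
# Editorial helpers: constraints keyed by property, stored once.
PROPERTY_CONSTRAINTS = {
    "sex or gender": {"allowed_values": {"value_a", "value_b"}},
}

# Sourced information: what item pages say (toy data).
ITEM_STATEMENTS = {
    "item_1": {"sex or gender": "value_a"},
    "item_2": {"sex or gender": "value_z"},
}


def violations(items, constraints):
    """List (item, property, value) triples that break an allowed-values rule."""
    found = []
    for item, statements in sorted(items.items()):
        for prop, value in sorted(statements.items()):
            allowed = constraints.get(prop, {}).get("allowed_values")
            if allowed is not None and value not in allowed:
                found.append((item, prop, value))
    return found
```

Calling `violations(ITEM_STATEMENTS, PROPERTY_CONSTRAINTS)` reports only "item_2". Nothing about the constraint needs to be repeated on any item page.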


My recommendation is to rely mainly on the main taxonomy instead of
creating a parallel property taxonomy, and then think of ways to extract
information from the main taxonomy to convert it automatically into
constraints.
All the maintenance takes effort, so the more it can be automated, the
more efficient volunteers will be. And if we can simplify the
maintenance of properties, we will be able to simplify the creation of
properties too, especially when we face the next surge which will come
with the datatype "number with units".

I agree with the general goals, but I don't think that things become any easier if we confuse information about properties with information about items. We can still re-use information we have about items (like the class hierarchy that we already use in constraints) to avoid duplication, but some things are clearly not part of the item taxonomy.

Cheers,

Markus




On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:

    David,

    Regarding the question of how to classify properties and how to
    relate them to items:

    * "same as" (in the sense of owl:sameAs) is not the right concept
    here. In fact, it has often been discouraged to use this on the Web,
    since it has very strong implications: it means that in all uses of
    the one identifier, one could just as well use the other identifier,
    and that it is indistinguishable if something has been said about
    the one or the other. That seems too strong here, at least for most
    cases.

    * In the world of OWL DL, sameAs specifically refers to individuals,
    not to classes or properties. Saying "P sameAs Q" does not imply
    that P and Q have the same extension as properties. For the latter,
    OWL has the relationship owl:equivalentProperty. This distinction
    of instance level and schema level is similar to the distinction we
    have between "instance of" and "subclass of".

    * Therefore, I would suggest using a property called "subproperty
    of" as one way of relating properties (analogously to "subclass
    of"). It has to be checked if this actually occurs in Wikidata (do
    we have any properties that would be in this relation, or do we make
    it a modelling principle to have only the most specific properties
    in Wikidata?).

    * The relationship from properties to items could be modelled with
    the existing property "subject of" (P805).

    * It might be useful to also have a taxonomic classification of
    properties. For example, we already group properties into properties
    for "people", "organisations", etc. Such information could also be
    added with a specific property (this would be a bit more like a
    "category" system on property pages). On the other hand, some of
    this might coincide with constraint information that could be
    expressed as claims. For instance, person properties might be those
    with "Type" (i.e., "rdfs:domain") constraint human. By the way, our
    constraint system could use some systematisation -- there are many
    overlaps in what you can do with one constraint or another.

    Cheers,

    Markus


    On 28/05/14 12:14, David Cuenca wrote:

        Markus,
        The explanation about the implications of renaming/deleting makes
        most sense, and that alone already justifies the separation in two.
        It is equally true that when we create a property, we might have
        "cleaned" the original concept so much that it might differ (even
        slightly) from the understood concept that the item represents.
        However, even after that process, the "new" concept is still an
        item...

        The process of imbuing a concept with permanent characteristics
        (adding a datatype), together with the practical approach, also
        seems to recommend keeping items and properties separate.
        Thanks for showing me that reasoning :)

        I am still wondering about how we are going to classify properties.
        Maybe it will require a broader discussion, but if they are the
        same (or mostly the same) as items, then we can just link them as
        "same as", and build the classification structure just for the
        items. OTOH, if they are different, then we will need to mirror
        that classification for properties, which seems quite redundant.
        Plus adding a new datatype, "property".

        All in all, my conclusion about this is that properties are just
        concepts with special qualities that justify the separation in the
        software (even if in real life there is no separation).

        many thanks for your detailed answer, and sorry if I'm bringing up
        already discussed topics. It is just that when you stare long into
        wikidata, wikidata stares back into you ;)

        Cheers,
        Micru


        On Wed, May 28, 2014 at 11:39 AM, Markus Krötzsch
        <mar...@semantic-mediawiki.org> wrote:

             Hi David,

             Interesting remark. Let's explore this idea a bit. I will give
             you two main reasons why we have properties separate, one
             practical and one conceptual.

             First the practical point. Certainly, everything that is used
             as a property needs to have a datatype, since otherwise the
             wiki would not know what kind of input UI to show. So you
             cannot use just any item as a property straight away -- it
             needs to have a datatype first. So, yes, you could abolish the
             namespace Property but you still would have a clear, crisp
             distinction between property items (those with datatype) and
             normal items (those without a datatype). Because of this, most
             of the other functions would work the same as before (for
             example, property autocompletion would still only show
             properties, not arbitrary items).

             A complication with this approach is that property datatypes
             cannot change in Wikibase. This design was picked since there
             is no way to convert existing data from one datatype to
             another in general. So changing the datatype would create
             problems by making a lot of data "invalid", and require
             special handling and special UI to handle this situation. With
             properties living in a separate namespace, this is not a real
             restriction: you can just create a new property and give it
             the same label (after naming the old one differently, e.g.,
             putting "DEPRECATED" in its name). Then you can migrate the
             data in some custom fashion. But if properties were items, we
             would have a problem here: the item is already linked to many
             Wikipedias and other projects, and it might be used in Lua
             scripts, queries, or even external applications like Denny's
             JavaScript translation library. You cannot change item ids
             easily. Also, many items would not have a datatype, so the
             first one that is (accidentally?) entered would be fixed. So
             we would definitely need to rethink the whole idea of
             unchangeable datatypes.

             My other important reason is conceptual. Properties are not
             considered part of the (encyclopaedic) data but rather part of
             the schema that the community has picked to organise that
             data. As in your example, "emissivity" (Q899670) is a notion
             in physics as described in a Wikipedia article. There are many
             things to say about this notion (for example, it has a
             history: somebody must have defined this first -- although
             Wikipedia does not say it in this case). As in all cases, some
             statements might be disputed while others are widely
             acknowledged to be "true".

             For the property "emissivity" (P1295), the situation is quite
             different. It was introduced as an element used to enter data,
             similar to a row in a database table or an infobox template in
             some Wikipedia. It does probably closely relate to the actual
             physical notion Q899670, but it still is a different thing.
             For example, it was first introduced by User:Jakec, who is
             probably not the person who introduced the physical concept
             ;-) Anything that we will say about P1295 in the future refers
             to the property -- a concept of our own making, that is not
             described in any external source (there are no publications
             discussing P1295).

             This is also the reason why properties are supposed to support
             *claims* not *statements*. That is, they will have
             property-value pairs and qualifiers, but no references or
             ranks. Indeed, anything we say about properties has the status
             of a definition. If we say it, it's true. There is no other
             authority on Wikidata properties. You could of course still
             have items and properties "share" a page and somehow define
             which statements/claims refer to which concept, but this does
             not seem to make things easier for users.

             These are, for me, the two main reasons why it makes sense to
             keep properties apart from items on a technical level. Besides
             this, it is also convenient to separate the 1000-something
             properties from the 15-million-something items for reasons of
             maintenance.

             Best regards,

             Markus



             On 28/05/14 09:25, David Cuenca wrote:

                 Since the very beginning I have kept myself busy with
                 properties, thinking about which ones fit, which ones are
                 missing to better describe reality, and how to integrate
                 them with the ones that we have. The thing is that the
                 more I work with them, the less difference I see with
                 normal items.... and if soon there will be statements
                 allowed in property pages, the difference will blur even
                 more.
                 I can understand that from the software development point
                 of view it might make sense to have a clear difference. Or
                 for the community to get a deeper understanding of the
                 underlying concepts represented by words.

                 But semantically I see no difference between:
                 cement (Q45190) <emissivity (P1295)> 0.54
                 and
                 cement (Q45190) <emissivity (Q899670)> 0.54

                 Am I missing something here? Are properties really needed
                 or are we adding unnecessary artificial constraints?

                 Cheers,
                 Micru


                 ___________________________________________________
                 Wikidata-l mailing list
                 Wikidata-l@lists.wikimedia.org
                 https://lists.wikimedia.org/mailman/listinfo/wikidata-l







        --
        Etiamsi omnes, ego non









--
Etiamsi omnes, ego non





