Since it appears that the creation of *subproperty of* went unnoticed by
many, I'd like to describe an important aspect of its proper use, and how
that relates to classification.

Please note that *instance of* (P31) and *subclass of* (P279) are not valid
values for *subproperty of* (P1647) claims, as described in the P1647
documentation [1].  For example, claims like "occupation *subproperty of*
instance of" are invalid.  The reasons for this are both technical and
architectural.

On the technical side, *instance of*, *subclass of* and *subproperty of* are
intended to be straightforwardly exportable as rdf:type, rdfs:subClassOf
and rdfs:subPropertyOf.  As described in *On the Properties of Metamodeling
in OWL* [2], claims that use OWL's built-in vocabulary (e.g. rdf:type) as
individuals make an ontology undecidable.  If an ontology is undecidable,
then queries are not guaranteed to terminate.  This is a big deal.
Decidability is a main goal of OWL 2 DL and a requirement in the more
specialized profiles OWL 2 EL, OWL 2 RL and OWL 2 QL.  Most Semantic Web
ontologies aim to be valid in at least OWL 2 DL.  So if Wikidata aims to be
easily interoperable with the rest of the Semantic Web, we should aim to be
valid in OWL 2 DL, and thus not make claims of the form "P *subproperty of*
instance of (P31)" or "P *subproperty of* subclass of (P279)".
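As a rough illustration of that rule, here is a minimal Python sketch that flags *subproperty of* claims whose value is *instance of* or *subclass of*.  The triple representation and the property IDs marked hypothetical are assumptions made for the example; this is not how Wikidata itself stores or validates claims.

```python
# Properties that are meant to export as OWL/RDFS built-in vocabulary.
RESERVED_TARGETS = {
    "P31": "rdf:type",          # instance of
    "P279": "rdfs:subClassOf",  # subclass of
}
SUBPROPERTY_OF = "P1647"


def invalid_subproperty_claims(claims):
    """Return the (subject, property, value) triples that say
    'X subproperty of P31' or 'X subproperty of P279'."""
    return [
        (subject, prop, value)
        for (subject, prop, value) in claims
        if prop == SUBPROPERTY_OF and value in RESERVED_TARGETS
    ]


claims = [
    # "occupation subproperty of instance of": the invalid example above.
    ("P106", "P1647", "P31"),
    # Hypothetical property IDs, used only to show a claim that passes.
    ("P_child", "P1647", "P_parent"),
]

for subject, _, value in invalid_subproperty_claims(claims):
    print(f"{subject} must not be a subproperty of {value} "
          f"({RESERVED_TARGETS[value]})")
```

A check like this only looks at direct claims; it does not follow chains of *subproperty of* statements.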

Avoiding such claims is also good design.  There should be one -- and
preferably only one -- obvious way to specify the type of an instance.
Having a multitude of domain-specific "type" subproperties would promote an
anti-pattern: using *instance of* as a catch-all property to make any
statement under the sun that makes sense when connected with the phrase "is
a".

Having a single "type" property for instances also fosters another best
practice in Wikidata: asserted monohierarchy [3].  In other words, there
should be only one explicit normal or preferred *instance of* or *subclass
of* claim per item.  Having an *instance of* claim and a *subclass of*
claim on an item isn't necessarily bad (it's called "punning"), but having
multiple *instance of* claims or multiple *subclass of* claims on an item
is a bad smell.  Items can typically satisfy a huge number of *instance of*
claims, but should generally have only one such claim made explicitly in
Wikidata.

For example, Coco Chanel (Q45661) can be said to be "*instance of* French
person", "*instance of* fashion designer", "*instance of* female", etc.
Instead of such catch-all use of *instance of*, Wikidata moves that
knowledge into properties like *country of citizenship* (P27), *occupation*
(P106) and *sex or gender* (P21).  Coco Chanel has one explicit *instance
of* value: human (Q5) -- a class that encapsulates essential features of
the subject.
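The contrast between the two modelings can be sketched in a few lines of Python.  The dictionary layout is a simplification invented for this example (it ignores ranks, qualifiers and references), and apart from human (Q5) the values are plain labels rather than real item IDs.

```python
INSTANCE_OF = "P31"
SUBCLASS_OF = "P279"


def classification_smells(item_claims):
    """Return the classification properties (P31, P279) that carry
    more than one explicit value on an item."""
    return [
        prop
        for prop in (INSTANCE_OF, SUBCLASS_OF)
        if len(item_claims.get(prop, [])) > 1
    ]


# Recommended modeling: one class, everything else in dedicated properties.
coco_chanel = {
    "P31": ["Q5"],                # instance of: human
    "P27": ["France"],            # country of citizenship (label only)
    "P106": ["fashion designer"], # occupation (label only)
    "P21": ["female"],            # sex or gender (label only)
}

# Catch-all modeling: every "is a" statement pushed into instance of.
catch_all = {
    "P31": ["French person", "fashion designer", "female"],
}

print(classification_smells(coco_chanel))  # []
print(classification_smells(catch_all))    # ['P31']
```

Note that an item with one *instance of* value and one *subclass of* value passes this check, in line with the remark on punning above.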

Most of Wikidata follows these general principles of classification.  But a
few domains of knowledge remain either somewhat of a mess, or organized but
idiosyncratic.  Items like the one for the German municipality of Aalen
[4], with 7 *instance of* values -- several of them redundant -- exemplify
the mess.  With the deletion of domain-specific "type" properties like *type
of administrative territorial entity* (P132) [5], we are on the right
track.  The solution is not to make such things subproperties of *instance
of*, but rather to delete them and use *instance of* for one preferred
class and put other values in other properties (note -- this may require
new properties!).

The same applies to *subclass of*.

I encourage anyone interested in stuff like *subproperty of* to join the
discussions ongoing at
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Property_metadata.
The Wikidata community is currently discussing how we want to handle things
like *domain* and *range* properties (e.g. should we use rdfs:domain or
schema:domainIncludes?) and whether we want to have an *inverse of*
property (or delete all inverse properties).  The outcome of these
discussions will shape the interface between Wikidata and the rest of the
Semantic Web.

Thanks,
Eric

https://www.wikidata.org/wiki/User:Emw


1.  https://www.wikidata.org/wiki/Property:P1647
2.  Boris Motik (2007).  On the Properties of Metamodeling in OWL.
https://www.cs.ox.ac.uk/boris.motik/pubs/motik07metamodeling-journal.pdf
3.  Barry Smith, Werner Ceusters (2011).  Ontological realism: A
methodology for coordinated evolution of scientific ontologies.  Section
1.8: Asserted monohierarchies.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3104413/#S9
4.  Aalen on Wikidata as of 2015-01-10.
https://www.wikidata.org/w/index.php?title=Q3951&oldid=184247296#P31
5.
https://www.wikidata.org/wiki/Wikidata:Requests_for_deletions/Archive/2014/Properties/1#type_of_administrative_territorial_entity_.28P132.29