mkroetzsch added a comment.
Hi,

Using the same value for "unknown" is a very bad idea and should not be considered. You already found out why. This highlights another general design principle: the RDF data should encode meaning directly in its structure. If two triples have the same RDF term as object, then they should represent relationships to the same thing, without any further conditions on the shape of that term. Otherwise, SPARQL does not work well. For example, the property paths you can write with * have no way of performing extra tests on the nodes you traverse, so if you want to use * in queries in a meaningful way, the meaning of a chain must not depend on the shape of the terms along it. This principle is also why we chose bnodes in the first place.

OWL also has a standard way of encoding the information that some property has an (unspecified) value, but that encoding looks more like what we now have for negation (no value). If we had used it, one would need completely different query patterns for finding people with an unspecified date of death and people with a specified date of death. In contrast, the current bnode encoding allows you to query for everybody with a date of death without having to know whether it is given explicitly or left unspecified (you don't even have to know that the latter is possible).

This should be kept in mind: the encoding is not just for "use cases" where you are interested in the special situation (e.g., someone having an unspecified date of death) but also for all other queries dealing with data of some kind. For this reason, the RDF structure for encoding unspecified values should look as much as possible like the cases where there are values. I am not aware of any other option for encoding "there is a value but we know nothing more about it" in RDF or OWL besides the two options I mentioned.
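To make the two points above concrete, here is a minimal sketch in plain Python rather than SPARQL. Everything in it is an assumption made for illustration: the `ex:` identifiers, the toy `BNode` class, and the helpers `reachable` (a rough analogue of a `*` property path) and `subjects_with` (a rough analogue of the pattern `?s prop ?o`). It is not how any RDF engine is implemented.

```python
class BNode:
    """Toy blank node: a fresh, unnamed term. Two BNode() objects are never equal."""


def reachable(start, prop, graph):
    """Rough analogue of the SPARQL path `start prop* ?x`: follow `prop`
    edges transitively, with no test on the nodes passed through."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node and p == prop and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def subjects_with(prop, graph):
    """Rough analogue of the pattern `?s prop ?o`: any object term matches."""
    return sorted({s for s, p, o in graph if p == prop})


# Hypothetical data: the fathers of Ann and Bob are both unknown.
shared_unknown = [
    ("ex:Ann", "ex:father", "ex:unknown"),
    ("ex:Bob", "ex:father", "ex:unknown"),
]
with_bnodes = [
    ("ex:Ann", "ex:father", BNode()),
    ("ex:Bob", "ex:father", BNode()),
]

# With one shared "unknown" term, the path query finds a common ancestor
# that does not exist; with bnodes it finds none.
common_shared = (reachable("ex:Ann", "ex:father", shared_unknown)
                 & reachable("ex:Bob", "ex:father", shared_unknown))
common_bnodes = (reachable("ex:Ann", "ex:father", with_bnodes)
                 & reachable("ex:Bob", "ex:father", with_bnodes))

# Hypothetical data: Ann has an explicit date of death, Bob an unspecified one.
deaths = [
    ("ex:Ann", "ex:dateOfDeath", "1990-05-01"),
    ("ex:Bob", "ex:dateOfDeath", BNode()),
]
# One pattern covers both cases, without knowing that bnodes are possible.
everybody_dead = subjects_with("ex:dateOfDeath", deaths)
```

In this sketch `common_shared` contains the shared term, `common_bnodes` is empty, and `everybody_dead` lists both Ann and Bob.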
The proposal to use a made-up IRI instead of a bnode gives identity to the unknown (even if that identity has no meaning in our data yet). It works in many unspecified-value use cases where bnodes work, but not in all. The three main confusions possible are:

1. confusing a placeholder "unspecified" IRI with a real IRI that is expected in normal cases (imagine using a FILTER on URL-type property values),
2. believing that the data changed when only the placeholder IRI has changed (imagine someone deleting and re-adding a quantifier with "unspecified" -- if it's a bnode, the outcome is the same in terms of RDF semantics, but if you use placeholder IRIs, you need to know their special meaning to compare the two RDF data sets correctly),
3. accidental or deliberate uses of placeholder IRIs in other places (imagine somebody puts your placeholders as value into a URL-type property).

Case 3 can probably be disallowed by the software (if one thinks of it).

Another technical issue with the approach is that you would also need to use placeholder IRIs with datatype properties that normally require RDF literals. RDF engines will tolerate this, and for SPARQL use cases it's not a huge difference from tolerating bnodes there. But it does put the data outside of OWL, which does not allow a property to take both literals and IRIs as values. Unfortunately, there is no equivalent of creating a placeholder IRI for things like xsd:int or xsd:string in RDF (in OWL, you can write this with a class expression, but it will be structurally different from other cases where this data is set).

For the encoding of OWL negation, I am not sure if switching this (internal, structural) bnode to a (generated, unique) IRI would make any difference. One would have to check the standard to see if this is allowed. I would imagine that it just works. In this case, sharing the same auxiliary IRI between all negative statements that refer to the same property should also work.
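Confusion 2 from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ex:` identifiers and placeholder IRIs are invented, `BNode` is a toy class, and `normalise` is a deliberately simplified comparison that is valid only when each blank node occurs in exactly one triple. It is not a general RDF graph isomorphism check.

```python
class BNode:
    """Toy blank node; the label is local and has no meaning outside its graph."""

    def __init__(self, label):
        self.label = label


def normalise(graph):
    """Replace every blank node by one anonymous marker and sort the triples.

    Simplifying assumption: each blank node occurs in exactly one triple,
    so its identity never matters for the comparison.
    """
    return sorted(
        (s, p, "_:" if isinstance(o, BNode) else o) for s, p, o in graph
    )


# Hypothetical edit: an "unspecified" value is deleted and re-added.
before_bnode = [("ex:Bob", "ex:dateOfDeath", BNode("b1"))]
after_bnode = [("ex:Bob", "ex:dateOfDeath", BNode("b2"))]

before_iri = [("ex:Bob", "ex:dateOfDeath", "ex:placeholder/1")]
after_iri = [("ex:Bob", "ex:dateOfDeath", "ex:placeholder/2")]

# Blank-node labels are irrelevant, so the two versions compare as equal.
same_with_bnodes = normalise(before_bnode) == normalise(after_bnode)

# Placeholder IRIs are ordinary names, so a generic comparison reports a
# change unless it is taught the special meaning of the placeholders.
same_with_iris = normalise(before_iri) == normalise(after_iri)
```

Here `same_with_bnodes` is true and `same_with_iris` is false, even though the intended meaning of both edits is the same.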
So: dropping in placeholder IRIs is the "second best thing" after bnodes, but it gives up several advantages and introduces some problems (and of course inevitably breaks existing queries). Before making such a change, there should be a clearer argument as to why it would help, and in which cases. The linked PDF that is posted here for motivation does not speak about updates, and indeed if you look at Aidan's work, he has done a lot of interesting analysis with bnodes that would not make any sense without them (e.g., on comparing RDF datasets, which relates to my point 2 above). I am not a big fan of bnodes either, but what we are trying to encode here is what they were genuinely invented for, and any alternative also has its issues.

TASK DETAIL https://phabricator.wikimedia.org/T244341