mkroetzsch added a comment.
Note that this discussion is no longer just about the wdt property values
(called "truthy" above). Simple values are now used on several levels in the
RDF encoding.
In general, the same argument as for coordinates applies: if we cannot do it
right, then better not do it at all (i.e., use a bnode until we have a format).
This might always be necessary in some cases (e.g., even if we convert units,
there might be cases where conversion is not possible).
I agree with the advantages and disadvantages listed for a custom datatype.
Without BlazeGraph support for it, one would not be able to do range queries
over such data, which would make it pretty useless. We might as well use
strings in that case.
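To illustrate the range query point, here is a minimal Python sketch. It is
not BlazeGraph code and says nothing about how BlazeGraph handles unknown
datatypes; it only shows the general problem that values compared as strings
are ordered lexicographically. The sample values and function names are made
up for illustration.

```python
# Hypothetical lengths, stored once as numbers and once as strings.
numeric_values = [9.0, 10.0, 150.0, 25.5]
string_values = [str(v) for v in numeric_values]


def in_numeric_range(values, low, high):
    """Range filter on real numbers, as a supported numeric type allows."""
    return [v for v in values if low <= v <= high]


def in_string_range(values, low, high):
    """The same filter on strings: comparison is lexicographic."""
    return [v for v in values if low <= v <= high]


numeric_hits = in_numeric_range(numeric_values, 9.0, 30.0)
string_hits = in_string_range(string_values, "9.0", "30.0")

print(numeric_hits)
print(string_hits)
```

The numeric filter finds the three values between 9 and 30, while the string
filter finds none, because "9.0" sorts after "30.0".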
Normalising units by converting them to a base unit would still leave
important problems. If there were a community-controlled way to define
conversions, the "main" unit that the RDF data is normalised to might change.
This would change the content and meaning of simple values even though the
actual property values have not changed. Declaring the unit in other triples
in the RDF dump would not solve this, since we expect many fixed (standing)
queries to be in use, and these would not be able to adapt automatically to a
new unit declaration. The normalisation scheme would also create problems for
incremental updates: a single change in the conversion definitions would
require changes to millions of simple values that are part of the export of
items that have not changed at all.
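The effect of changing the base unit can be sketched in a few lines of
Python. Everything here is hypothetical (the items, the conversion table, and
the function name); it is not how the export is implemented, only an
illustration of why one change to the conversion definitions touches every
normalised value.

```python
# Hypothetical statements: (item, amount, unit). None of these change below.
statements = [("Q1", 2.0, "m"), ("Q2", 30.0, "cm"), ("Q3", 12.0, "in")]

# Hypothetical conversion factors to metres.
TO_METRE = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def simple_values(statements, base_unit):
    """Normalise each amount to base_unit and return {item: simple value}."""
    base_factor = TO_METRE[base_unit]
    return {
        item: round(amount * TO_METRE[unit] / base_factor, 6)
        for item, amount, unit in statements
    }


before = simple_values(statements, "m")   # base unit is the metre
after = simple_values(statements, "cm")   # community switches to centimetre

# Items whose exported simple value differs although the statement did not.
changed = [item for item in before if before[item] != after[item]]
print(changed)
```

Every item ends up in the changed list, and a standing query written against
the old numbers (say, "value below 1") would silently return different
results.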
A possible workaround that needs neither a custom datatype nor conversion
support would be to create properties like "P1234inCm" and "P1234inInch".
They would have plain number values that work in range queries. This would
basically simulate the custom datatype, with a very similar effect on query
answering (users would need to adjust queries to specify the unit that is
queried for, but they would at least be sure that the data they query refers
to this unit). The downside is that you need a different property for each
unit, and that you therefore still have no good value to use for the simple
value properties. However, I think this is how other datasets are doing it
(has anybody checked DBpedia?).
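A minimal Python sketch of the per-unit property idea follows. The property
names are the ones from the example above; the items, amounts, and helper
functions are hypothetical, and triples are modelled as plain tuples rather
than real RDF.

```python
def unit_property(property_id, unit_suffix):
    """Build a per-unit property name such as 'P1234inCm'."""
    return f"{property_id}in{unit_suffix}"


# Hypothetical statements: (item, property, amount, unit suffix).
statements = [
    ("Q1", "P1234", 180.0, "Cm"),
    ("Q2", "P1234", 70.0, "Inch"),
    ("Q3", "P1234", 95.5, "Cm"),
]

# Each statement becomes one triple with a plain number as object.
# No unit conversion is performed.
triples = [
    (item, unit_property(prop, unit), amount)
    for item, prop, amount, unit in statements
]


def range_query(triples, predicate, low, high):
    """Return subjects whose value for predicate lies in [low, high]."""
    return [s for s, p, o in triples if p == predicate and low <= o <= high]


hits_cm = range_query(triples, "P1234inCm", 90.0, 200.0)
hits_inch = range_query(triples, "P1234inInch", 60.0, 80.0)
print(hits_cm, hits_inch)
```

The centimetre query is reliable about its unit but does not see Q2, which
was entered in inches; finding it requires a second query on the inch
property.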
TASK DETAIL
https://phabricator.wikimedia.org/T111770