mkroetzsch added a comment.

Note that this discussion is no longer just about the wdt property values (called "truthy" above). Simple values are now used on several levels in the RDF encoding.
In general, the same argument as for coordinates applies: if we cannot do it right, then it is better not to do it at all (i.e., use a bnode until we have a format). This might always be necessary in some cases (e.g., even if we convert units, there might be cases where conversion is not possible).

I agree with the advantages and disadvantages of using a custom datatype. Without BlazeGraph support for it, one would not be able to do range queries over such data, which would make it pretty useless. We could just as well use strings in this case.

Normalising units by converting them to a base unit would still leave important problems. If there were a community-controlled way to define conversions, the "main" unit that the RDF data is normalised to might change. This would change the content and meaning of simple values even though the actual property values have not changed. Declaring the unit in other triples in the RDF dump would not solve this, since we assume that many fixed (standing) queries will be in use, and these would not be able to adapt automatically to a new unit declaration. The normalisation scheme would also create problems for incremental updates: a single change in the conversion definitions would require changes in millions of simple values that are part of the export of items that have not changed at all.

A possible way to work around the absence of a datatype, even in the absence of conversion support, would be to create properties like "P1234inCm" and "P1234inInch". They would have plain number values that work in range queries. This would basically simulate the custom datatype, with a very similar effect on query answering (users would need to adjust queries to specify the unit that is queried for, but they would at least be sure that the data they query refers to this unit).
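To make the per-unit property idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `Quantity` record, the helper names `per_unit_triples` and `range_query`, and the item ids Q1 to Q3 are hypothetical, and the in-memory list of tuples stands in for the RDF store. Only the naming pattern "P1234inCm" / "P1234inInch" comes from the comment above.

```python
from typing import List, NamedTuple, Tuple


class Quantity(NamedTuple):
    """A quantity value as entered by an editor (hypothetical record)."""
    item: str      # subject, e.g. "Q1"
    prop: str      # property id, e.g. "P1234"
    amount: float  # the number exactly as entered, no conversion applied
    unit: str      # unit label used as property suffix, e.g. "Cm"


Triple = Tuple[str, str, float]


def per_unit_triples(quantities: List[Quantity]) -> List[Triple]:
    """Emit one plain-number triple per value, with the unit in the predicate."""
    return [(q.item, f"{q.prop}in{q.unit}", q.amount) for q in quantities]


def range_query(triples: List[Triple], predicate: str,
                low: float, high: float) -> List[str]:
    """Return subjects whose value for `predicate` lies in [low, high]."""
    return sorted(s for s, p, v in triples
                  if p == predicate and low <= v <= high)


# Hypothetical sample data: two values in centimetres, one in inches.
data = [
    Quantity("Q1", "P1234", 180.0, "Cm"),
    Quantity("Q2", "P1234", 70.0, "Inch"),
    Quantity("Q3", "P1234", 150.0, "Cm"),
]
triples = per_unit_triples(data)

# The range query works on plain numbers, but only for the unit it names.
print(range_query(triples, "P1234inCm", 160, 200))  # prints ['Q1']
```

Since no conversion is applied, the value of Q2 is only reachable through "P1234inInch", even though 70 inches would fall inside the queried range in centimetres. On the other hand, the stored numbers never change when a conversion definition changes, so the incremental update problem described above does not arise in this scheme.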
The downside is that you need a different property for each unit, and that therefore you still have no good value to use for the simple value properties. However, I think this is how other datasets are doing it (has anybody checked DBpedia?).

TASK DETAIL
https://phabricator.wikimedia.org/T111770