mkroetzsch added a comment.

Note that this discussion is no longer just about the wdt property values 
(called "truthy" above). Simple values are now used on several levels in the 
RDF encoding.

In general, the same argument as for coordinates applies: if we cannot do it 
right, it is better not to do it at all (i.e., use a bnode until we have a 
format). This might always be necessary in some cases (e.g., even if we convert 
units, there might be cases where conversion is not possible).

I agree with the advantages and disadvantages of using a custom datatype. 
Without BlazeGraph support for this, one would not be able to do range queries 
over such data, which would make it pretty useless. We might as well use 
strings in this case.

Normalising units by converting them to a base unit would still leave 
important problems. If there were a community-controlled way to define 
conversions, the "main" unit that the RDF data is normalised to might change. 
This would change the content and meaning of simple values even though the 
actual property values have not changed. Declaring the unit in other triples 
in the RDF dump would not solve this, since we expect many fixed (standing) 
queries to be in use, and these could not adapt automatically to a new unit 
declaration. The normalisation scheme would also create problems for 
incremental updates: a single change in the conversion definitions would 
require changes in millions of simple values that are part of the export of 
items that have not changed at all.
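
To make the concern concrete, here is a toy sketch in Python. Everything in it 
is made up for illustration (the item ids, the conversion tables, and the 
helper names are hypothetical, not part of any Wikibase code). It only shows 
that switching the base unit rewrites every normalised value and silently 
changes what a fixed range query matches.

```python
from decimal import Decimal

# Hypothetical statements: item id -> (amount, unit). No real data.
statements = {
    "Q1": (Decimal("2"), "inch"),
    "Q2": (Decimal("10"), "centimetre"),
}

# Two hypothetical conversion tables: the "main" unit is first centimetre,
# then (after a community edit) metre.
to_cm = {"centimetre": Decimal("1"), "inch": Decimal("2.54")}
to_m = {"centimetre": Decimal("0.01"), "inch": Decimal("0.0254")}


def normalise(stmts, factors):
    """Convert every amount to the base unit implied by `factors`."""
    return {item: amount * factors[unit] for item, (amount, unit) in stmts.items()}


def standing_query(values):
    """A fixed range query written when the base unit was centimetre."""
    return [item for item, value in values.items() if value > 5]


before = normalise(statements, to_cm)
after = normalise(statements, to_m)

# Items whose exported simple value changed although the item was not edited.
changed = [item for item in statements if before[item] != after[item]]

matches_before = standing_query(before)
matches_after = standing_query(after)
```

In this toy setup every item ends up in `changed`, and the same query returns 
different results before and after the switch, although no statement was 
edited.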

A possible way to work around the absence of a datatype, and even the absence 
of conversion support, would be to create properties like "P1234inCm" and 
"P1234inInch". They would have plain number values that work in range queries. 
This would basically simulate the custom datatype, with a very similar effect 
on query answering (users would need to adjust queries to specify the unit 
that is queried for, but they would at least be sure that the data they query 
refers to this unit). The downside is that you need a different property for 
each unit, and that you therefore still have no good value to use for the 
simple value properties. However, I think this is how other datasets are doing 
it (has anybody checked DBpedia?).
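
A rough sketch of what such per-unit properties could look like, again in 
Python. The namespace `http://example.org/prop/unit/`, the suffix table, and 
the function names are assumptions for illustration only; nothing like this 
exists in the current export. The point is just that the object is a plain 
xsd:decimal literal, so ordinary range filters would work on it.

```python
from decimal import Decimal

# Hypothetical namespace for per-unit properties (not a real Wikidata prefix).
UNIT_PROP_NS = "http://example.org/prop/unit/"
ENTITY_NS = "http://www.wikidata.org/entity/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical mapping from unit name to property suffix, following the
# "P1234inCm" / "P1234inInch" naming used in the comment.
UNIT_SUFFIX = {"centimetre": "Cm", "inch": "Inch"}


def unit_property(prop_id, unit):
    """Build the per-unit property name, e.g. P1234 + inch -> P1234inInch."""
    return f"{prop_id}in{UNIT_SUFFIX[unit]}"


def simple_value_triple(item_id, prop_id, amount, unit):
    """Return one N-Triples line whose object is a plain decimal literal."""
    value = Decimal(amount)  # the amount is kept as entered, no conversion
    return (
        f"<{ENTITY_NS}{item_id}> "
        f"<{UNIT_PROP_NS}{unit_property(prop_id, unit)}> "
        f'"{value}"^^<{XSD_DECIMAL}> .'
    )


triple = simple_value_triple("Q1", "P1234", "2.54", "inch")
```

A query for values in inches would then name `P1234inInch` explicitly and 
would never see values that were entered in centimetres.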


TASK DETAIL
  https://phabricator.wikimedia.org/T111770

To: mkroetzsch
Cc: Denny, mkroetzsch, Smalyshev, Aklapper, daniel, jkroll, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, JanZerebecki