[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping

2015-09-11 Thread mkroetzsch
mkroetzsch added a comment.

Note that this discussion is no longer just about the wdt property values 
(called "truthy" above). Simple values are now used on several levels in the 
RDF encoding.

In general, the same argument as for coordinates applies: if we cannot do it 
right, it is better not to do it at all (i.e., use a bnode until we have a 
format). A bnode might always be necessary in some cases (e.g., even if we 
convert units, there might be cases where conversion is not possible).

I agree with the advantages and disadvantages of using a custom datatype. 
Without BlazeGraph support for this, one would not be able to do range queries 
over such data, which would make it pretty useless. We might as well use 
strings in this case.
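
To illustrate the range-query point, here is a minimal sketch (plain Python, 
not BlazeGraph behaviour). It assumes a made-up lexical form "<number> <unit>" 
for the custom datatype. A store that does not understand the datatype can at 
best compare such literals as strings, and string order is not numeric order:

```python
# Hypothetical lexical forms of a custom "quantity" datatype: "<number> <unit>".
# A store without support for the datatype can only treat them as opaque strings.
heights = ["9 cm", "10 cm", "150 cm", "25 cm"]


def naive_range(values, low, high):
    """Range filter using plain string comparison (all an opaque literal allows)."""
    return [v for v in values if low <= v <= high]


def numeric_range(values, low, high):
    """Range filter that parses the number, as a datatype-aware store could."""
    return [v for v in values if low <= float(v.split()[0]) <= high]


# "Between 10 cm and 30 cm", asked both ways.
string_result = naive_range(heights, "10 cm", "30 cm")
numeric_result = numeric_range(heights, 10, 30)

print(string_result)   # wrongly includes "150 cm"
print(numeric_result)
```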

Normalising units by converting them to a base unit would still leave 
important problems. If there were a community-controlled way to define 
conversions, the "main" unit that the RDF data is normalised to might change. 
This would change the content and meaning of simple values even though the 
actual property values have not changed. Declaring the unit in other triples 
in the RDF dump would not solve this, since we expect many fixed (standing) 
queries to be in use, and these would not be able to adapt automatically to a 
new unit declaration. The normalisation scheme would also create problems for 
incremental updates: a single change in the conversion definitions would 
require changes in millions of simple values that are part of the export of 
items that have not changed at all.
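
A small sketch of this effect, under stated assumptions: the items, the 
property P1234, the conversion table and the choice of "main" unit are all 
invented for illustration and do not describe any actual export code. The 
statements stay fixed, only the main unit changes, and yet every exported 
simple value changes and a standing query silently returns something else:

```python
# Hypothetical statements: (item, property, amount, unit). They never change below.
statements = [
    ("Q1", "P1234", 180.0, "cm"),
    ("Q2", "P1234", 6.0, "ft"),
    ("Q3", "P1234", 1.75, "m"),
]

# Hypothetical community-editable conversion table: unit -> factor to metres.
TO_METRE = {"m": 1.0, "cm": 0.01, "ft": 0.3048}


def simple_values(statements, main_unit):
    """Export each statement's simple value, normalised to `main_unit`."""
    return {
        item: round(amount * TO_METRE[unit] / TO_METRE[main_unit], 6)
        for item, _prop, amount, unit in statements
    }


def standing_query(values):
    """A fixed query written against the old export: 'value greater than 1.78'."""
    return sorted(item for item, v in values.items() if v > 1.78)


before = simple_values(statements, "m")   # main unit is the metre
after = simple_values(statements, "cm")   # main unit is switched to the centimetre

# Items whose exported simple value changed although no statement was edited.
changed = [item for item in before if before[item] != after[item]]

print(changed)
print(standing_query(before), standing_query(after))
```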

A possible workaround for the absence of a datatype, which works even without 
conversion support, would be to create properties like "P1234inCm" and 
"P1234inInch". They would have plain number values that work in range 
queries. This would basically simulate the custom datatype, with a very 
similar effect on query answering (users would need to adjust queries to 
specify the unit that is queried for, but they would at least be sure that the 
data they query refers to this unit). The downside is that you need a 
different property for each unit, and that you therefore still have no good 
value to use for the simple value properties. However, I think this is how 
other datasets are doing it (has anybody checked DBpedia?).
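
As a rough sketch of how such per-unit properties would behave, the following 
uses an in-memory list of triples in place of a real triple store. The item 
IDs and values are made up, and the property names only follow the 
"P1234inCm" pattern suggested above:

```python
# Hypothetical triples (subject, predicate, plain number). The predicates follow
# the "P1234inCm" / "P1234inInch" naming pattern and are not real properties.
triples = [
    ("Q1", "P1234inCm", 180.0),
    ("Q2", "P1234inInch", 72.0),
    ("Q3", "P1234inCm", 175.0),
    ("Q4", "P1234inCm", 160.0),
]


def range_query(triples, predicate, low, high):
    """Subjects whose value for `predicate` lies in the closed range [low, high]."""
    return sorted(s for s, p, v in triples if p == predicate and low <= v <= high)


# The unit is chosen by choosing the property; the numbers compare as numbers.
in_cm = range_query(triples, "P1234inCm", 170, 185)
in_inch = range_query(triples, "P1234inInch", 60, 80)

print(in_cm)
print(in_inch)
```

Since there is no conversion in this sketch, Q2 (given in inches) is not found 
by the centimetre query; the query only ever sees values stated in the unit it 
asks for.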


TASK DETAIL
  https://phabricator.wikimedia.org/T111770

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: mkroetzsch
Cc: Denny, mkroetzsch, Smalyshev, Aklapper, daniel, jkroll, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping

2015-09-11 Thread mkroetzsch
mkroetzsch added a comment.

If we could distinguish quantity-type properties that require a unit from 
those that do not allow units, there would be another option. We could then 
use a compound value as the "simple" value for all properties with a unit, to 
simulate the missing datatype. On the query level, this would be fully 
equivalent to having a custom datatype, since one can specify the unit and the 
(ranged) number individually. (The P1234inCm properties, by contrast, support 
queries on the number only, not queries that refer to the unit.)
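
A minimal sketch of what querying such compound values could look like, again 
with an in-memory triple list. The node names ("value:a"), the predicates 
"amount" and "unit", and the unit identifiers are placeholders chosen for this 
example, not the vocabulary of the actual RDF export:

```python
# Hypothetical graph: the simple value of a with-unit property is a node that
# carries the number and the unit as separate triples. All names are made up.
triples = [
    ("Q1", "P1234", "value:a"),
    ("value:a", "amount", 180.0),
    ("value:a", "unit", "unit:cm"),
    ("Q2", "P1234", "value:b"),
    ("value:b", "amount", 72.0),
    ("value:b", "unit", "unit:inch"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def query(triples, prop, unit, low, high):
    """Subjects with a `prop` value given in `unit` and an amount in [low, high]."""
    hits = []
    for s, p, node in triples:
        if p != prop:
            continue
        has_unit = unit in objects(triples, node, "unit")
        in_range = any(low <= a <= high for a in objects(triples, node, "amount"))
        if has_unit and in_range:
            hits.append(s)
    return sorted(hits)


def units_used(triples, prop):
    """All units that occur in values of `prop` (a query about the unit itself)."""
    return sorted(
        {u for _s, p, node in triples if p == prop for u in objects(triples, node, "unit")}
    )


print(query(triples, "P1234", "unit:cm", 170, 190))
print(units_used(triples, "P1234"))
```

The second function is the kind of question that per-unit properties cannot 
express as a query over values, because there the unit is only encoded in the 
property name.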

Using a compound value as a simple value is fine. It is no worse than a bnode 
if you do not want to look into the inner structure, but it has additional 
features for those who do. The only problem is that you should not mix number 
literals with URIs that refer to compound values for the same property -- this 
is why one would need to fix in the property datatype whether units are 
required (always there) or forbidden (never there). Mixing the two would not 
work.
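
The "no mixing" condition can be stated as a simple check over the exported 
triples. This is only a sketch: it assumes number literals are represented as 
Python numbers and compound values as string identifiers, which is a 
convention of this example rather than anything defined by the export:

```python
def value_kinds(triples, prop):
    """Kinds of simple values ('number' or 'node') that occur for `prop`."""
    kinds = set()
    for _s, p, o in triples:
        if p == prop:
            kinds.add("number" if isinstance(o, (int, float)) else "node")
    return kinds


def is_mixed(triples, prop):
    """True if `prop` has both plain numbers and compound-value nodes as values."""
    return len(value_kinds(triples, prop)) > 1


# Hypothetical exports: one uses compound values throughout, one mixes both forms.
consistent = [("Q1", "P1234", "value:a"), ("Q2", "P1234", "value:b")]
mixed = [("Q1", "P1234", "value:a"), ("Q5", "P1234", 42.0)]

print(is_mixed(consistent, "P1234"))
print(is_mixed(mixed, "P1234"))
```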

