On 26.08.2013 12:41, Markus Krötzsch wrote:
> Hi Daniel,
>
> if I understand you correctly, you are in favour of equating datavalue types and
> property types. This would indeed solve the problems at hand.
>
> The reason why both kinds of types are distinct in SMW and also in Wikidata is
> that property types are naturally more extensible than datavalue types.
> CommonsMedia is a good example of this: all you need is a custom UI and you can
> handle "new" data without changing the underlying data model. This makes it easy
> for contributors to add new types without far-reaching ramifications in the
> backend (think of numbers, which could be decimal, natural, positive,
> range-restricted, etc., but would still be treated as a "number" in the backend).
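A minimal Python sketch of that split, assuming a property type can be modelled as "a datavalue type plus an extra validity check". The property type ids (`decimal`, `natural`, `positive`), the `PropertyType` class, the placeholder CommonsMedia check and the `store()` helper are all hypothetical; the point is only that the backend never sees anything but the datavalue type:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Tuple


@dataclass(frozen=True)
class PropertyType:
    """Hypothetical model: a backend datavalue type plus a UI-level validity check."""
    datavalue_type: str
    is_valid: Callable[[object], bool]


def _is_number(v: object) -> bool:
    return isinstance(v, Decimal)


# Illustrative registry; ids and checks are assumptions, not Wikibase's real list.
PROPERTY_TYPES = {
    "decimal": PropertyType("number", _is_number),
    "natural": PropertyType(
        "number", lambda v: _is_number(v) and v >= 0 and v == v.to_integral_value()
    ),
    "positive": PropertyType("number", lambda v: _is_number(v) and v > 0),
    # Placeholder check: a real Commons file name check would be stricter.
    "commonsMedia": PropertyType("string", lambda v: isinstance(v, str) and v != ""),
}


def store(property_type_id: str, value: object) -> Tuple[str, object]:
    """Validate against the property type, then pass only (datavalue type, value) on."""
    ptype = PROPERTY_TYPES[property_type_id]
    if not ptype.is_valid(value):
        raise ValueError(f"{value!r} is not a valid {property_type_id!r} value")
    return ptype.datavalue_type, value
```

In this sketch, adding another number-like property type is one more registry entry; the backend still only deals with `number` and `string`.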

This could be solved using polymorphism: CommonsMedia, IRI, etc. could simply derive from StringValue. Similarly, Percentage could derive from NumberValue, and so on.

This is largely academic, though; I don't see a good way to transition from the current system to what I have in mind.
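A rough Python sketch of the polymorphism idea, assuming a class hierarchy in which each value has exactly one concrete class while generic code relies only on the base class. The class names follow the ones above, but `DataValue`, the constructors and `length_of()` are hypothetical, not the actual Wikibase DataValues API:

```python
from decimal import Decimal


class DataValue:
    """Hypothetical common base class for all datavalues."""


class StringValue(DataValue):
    def __init__(self, value: str) -> None:
        self.value = value


class CommonsMediaValue(StringValue):
    """A Commons file name: a string with extra meaning."""


class IriValue(StringValue):
    """An IRI: specialised, but still usable wherever a string is expected."""


class NumberValue(DataValue):
    def __init__(self, value: Decimal) -> None:
        self.value = value


class PercentageValue(NumberValue):
    """A number with extra meaning; generic number code still handles it."""


def length_of(v: StringValue) -> int:
    # Generic string code: works for every StringValue subclass, present or future.
    return len(v.value)
```

Each value then exposes a single type (`type(v)`), and code that only cares about strings can ignore the specialisation.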

> Using fewer datavalue types also improves interoperability. E.g., you want to
> compare two numbers, even if one is a natural number and the other is a
> decimal.

Indeed, which is why I'm reluctant to add more, such as the IRI type.
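A small Python sketch of that interoperability argument, assuming each value records both its property type and its datavalue type. `PropertyValue`, `less_than()` and the property type ids are hypothetical; the sketch only shows that comparability is decided by the datavalue type, not the property type:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PropertyValue:
    property_type: str   # hypothetical id, e.g. "natural" or "decimal"
    datavalue_type: str  # what the backend stores, e.g. "number"
    value: object


def less_than(a: PropertyValue, b: PropertyValue) -> bool:
    """Compare two values; property types are ignored, datavalue types must match."""
    if a.datavalue_type != b.datavalue_type:
        raise TypeError(f"cannot compare {a.datavalue_type} with {b.datavalue_type}")
    return a.value < b.value


# A natural number and a decimal share the "number" datavalue type.
three = PropertyValue("natural", "number", Decimal("3"))
three_and_a_half = PropertyValue("decimal", "number", Decimal("3.5"))
```

In this sketch, every additional datavalue type adds more pairs for which `less_than()` must either refuse or define a comparison.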

> There is no simple rule for deciding how many datavalue types there should be.
> The general guideline is to decide on datavalue types based on use cases. I am
> arguing for distinguishing IRIs from strings since there are many contexts and
> applications where this is a crucial difference. Conversely, I don't know of any
> application where it makes sense to treat the two alike (this would have to be
> something where we compare strings and IRIs on a data level, e.g., if you were
> looking for all websites with URLs that are alphabetically greater than the
> postcode of a city in England :-p).

Currently, my primary concerns are validators and simple renderers to be used, e.g., in diffs. For validation against a maximum length as well as against regular expressions, it would be useful to be able to treat URLs as strings. The same is true for basic rendering in diffs.
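A minimal Python sketch of such a validator and diff renderer, assuming URLs are represented by a hypothetical `IriValue` subclass of `StringValue`. `validate_string()` and `render_for_diff()` are illustrative names, and any length limit or pattern passed to them is a placeholder, not an actual Wikibase constraint:

```python
import re
from typing import List, Optional


class StringValue:
    def __init__(self, value: str) -> None:
        self.value = value


class IriValue(StringValue):
    """Hypothetical: an IRI that validators and renderers may treat as a string."""


def validate_string(
    v: StringValue, max_length: int, pattern: Optional[str] = None
) -> List[str]:
    """Return error messages; an empty list means the value passed both checks."""
    errors = []
    if len(v.value) > max_length:
        errors.append(f"longer than {max_length} characters")
    if pattern is not None and re.fullmatch(pattern, v.value) is None:
        errors.append(f"does not match {pattern!r}")
    return errors


def render_for_diff(v: StringValue) -> str:
    # Plain text is enough for a diff, whichever StringValue subclass v is.
    return v.value
```

For example, `validate_string(IriValue("http://example.org/"), 400, r"https?://\S+")` returns an empty list, and the same function works unchanged for any other string-like value.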

> As for the possible confusion, I think some naming discipline would clarify
> this. In SMW, there is a stronger difference between both kinds of types, and a
> fixed schema for property type ids that makes it easy to recognise them.

I try to use "data value type" vs. "property type", but whenever "data type" is used, it's unclear what is meant.

> In any case, using string for IRIs does not seem to solve any problem. It does
> not simplify the type system in general and it does not help with the use cases
> that I mentioned.

Well, for my use cases mentioned above, URLs should be strings :)

> What I do not agree with are your arguments about all of this
> being "internal". We would not have this discussion if it were. The data model
> of Wikidata is the primary conceptual model that specifies what Wikidata stores.
> You might still be right that some of the implementation is internal, but the
> arguments we both exchange are not really on the implementation level ;-).

I do not see why it is useful for a property value to expose two types. That's the situation we currently have, and it's confusing. For a canonical representation, there should be only one type, namely the one needed to fully interpret the given value. Whether a URL can be treated as a string depends on the use case and should be determined by the respective code. It seems a bad idea to me to try to provide an arbitrary set of base types with an arbitrary mapping to concrete/semantic types. If anything, a type hierarchy would make sense.
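A rough Python sketch of that alternative, assuming a serialized value carries a single `type` field and the hierarchy lives in a lookup table that consuming code can consult. The type ids, `SUPERTYPES`, `is_a()` and `as_plain_string()` are hypothetical:

```python
from typing import Dict, Optional

# Hypothetical type hierarchy: each type id maps to its direct supertype (or None).
SUPERTYPES: Dict[str, Optional[str]] = {
    "string": None,
    "iri": "string",
    "commonsMedia": "string",
    "number": None,
    "percentage": "number",
}


def is_a(type_id: str, ancestor: str) -> bool:
    """True if type_id is ancestor or a (transitive) specialisation of it."""
    current: Optional[str] = type_id
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERTYPES[current]
    return False


def as_plain_string(value: dict) -> Optional[str]:
    # The consuming code decides whether a plain-string view is good enough here.
    if is_a(value["type"], "string"):
        return value["value"]
    return None
```

A diff renderer could call `as_plain_string({"type": "iri", "value": "http://example.org/"})` and get the URL back, while a percentage value yields `None` and would need its own renderer.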

-- daniel

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
