Hi,

On 24/08/13 19:45, Jeroen De Dauw wrote:
Hey,

    The situation with commonsMedia is a bit bad because it should be a
    URL rather than a string. What I do in wda is effectively a type
    conversion from string to URI in this particular case. Maybe we can
    fix this somehow in the future when URIs are supported as a value
    datatype.


Ok, this makes me somewhat concerned. We do have an IriValue DV [0],
which we've had for nearly a year. It is indeed not used for
commonsMedia, not sure why. What concerns me is that we are now
introducing a "url" data type, which will also just use the string DV,
rather than the IRI DV. I'm not very happy with this, though it is what
most of the team wants. If there is a problem with this approach, it
should be outlined _soon_, since this is something not far from
deployment if I understand it correctly.

If we have an IRI DV, considering that URLs are special IRIs, it seems clear that IRI would be the best way of storing them. For any Web-based format (esp. OWL and RDF), there is a big difference between "some arbitrary string" and an IRI. Similarly, many tools that use data will naturally treat URLs in a different way than other strings when displaying them to users. If this difference is not captured in the data, then applications have to look it up, use some kind of hard-coded handling for certain properties, or apply heuristics to decide which strings are supposed to be URLs. Using IRI DVs would solve the problem in a cleaner way with less effort.
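
To make the point concrete, here is a minimal Python sketch of what a data consumer has to do in each case. The class names StringValue and IriValue, and the helpers looks_like_url, render_untyped and render_typed, are made up for this illustration; they are not the actual Wikibase DataValues classes or any existing API. The heuristic is only one plausible guess at what an application might do.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class StringValue:
    """Hypothetical stand-in for a plain string data value."""
    text: str


@dataclass(frozen=True)
class IriValue:
    """Hypothetical stand-in for an IRI data value (not the real Wikibase class)."""
    iri: str


def looks_like_url(text: str) -> bool:
    """Heuristic a consumer needs when URLs are stored as plain strings."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def render_untyped(value: StringValue) -> tuple:
    """Decide how to display a string value by guessing at its meaning."""
    if looks_like_url(value.text):
        return ("link", value.text)
    return ("text", value.text)


def render_typed(value) -> tuple:
    """Decide how to display a value from its declared type; no guessing."""
    if isinstance(value, IriValue):
        return ("link", value.iri)
    return ("text", value.text)
```

With typed values, an IRI such as "mailto:someone@example.org" is displayed as a link because the data says so. With plain strings, the same value falls through this particular heuristic and is shown as ordinary text, and every application has to maintain its own variant of looks_like_url.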

Of course, you could just use "string" for all types of datavalue without losing datavalue information. However, this would make the Wikidata data model inadequate for some important uses. The exported RDF will fix this in a sense, so people using this will get the important information from there. However, RDF has other problems that make it difficult to use as a primary data dump format (esp. heavy normalisation), and it is not available from Wikibase yet. Therefore, I think it would be problematic if the Wikidata data model is simplified to such an extent that practically important information is no longer easy to get for external users.

I appreciate that there might be split opinions about this among the developers (who see the immediate technical consequences, esp. for their piece of work). However, this decision has important long-term consequences beyond current engineering aspects. Luckily, Wikidata has a recognized expert in Web data technologies as its technical director ;-) -- the team should trust his judgement here.

Cheers,

Markus


