On 27/09/2017 17:14, Yuri Astrakhan wrote:
* Problem #1:  In my analysis of OSM data, wikipedia tags quickly go stale because they use Wikipedia page titles, and titles are constantly renamed, deleted, and what's worse - old names are reused for new meanings.  This is a fundamental problem with all Wikipedia tags, such as wikipedia, brand:wikipedia, operator:wikipedia, etc, that needs solving. The solution does not need to be perfect, it just needs to be better than what we have.

* Problem #2: the *meaning* of the "wikipedia" tag is ambiguous, and therefore cannot be processed easily. The top three meanings I have seen are:
  a) This WP article is about this OSM feature (a so-called 1:1 match, e.g. city, famous building, ...)
  b) This WP article is about some aspect of this OSM feature, like its brand, tree species, or subject of the sculpture
  c) Only a part of this WP article is about this OSM feature, e.g. a WP list of museums in the area contains a description of this museum.

* Problem #3: data consumers need cleaner, more machine-processable data. The text label is much more error-prone than an ID:  McDonalds vs mcdonalds vs McDonald's vs ..., so having "brand=mcdonalds" results in many errors. Note that even though the default OSM map style may handle some of them correctly, each data consumer has to re-implement that logic, so the more ambiguous something is, the more likely it will result in errors and data omissions.

The brand:wikidata discussion is about #1, #2b, and #3.

Are we in agreement that these are problems, or do you think none of them need solving?

1)  Not a problem as such.  If something has changed on the Wikipedia side then something may need checking on the OSM side.  It might be as simple as "someone's just renamed the Wikipedia page", in which case, fine, just fix the link, but it needs a human to check it. What might have happened, of course, is that the object has changed in the real world (been renamed, moved, or changed in some other way) and the object in OSM needs a resurvey, or perhaps can be changed based on existing knowledge, but either way it still needs checking.

2b) If someone's added a Wikipedia link to an OSM object that represents a tree to point to the Wikipedia page of that type of tree, then that's not helpful.  There's no need for the link, since the tree type is already tagged in OSM.

3) This depends on the data consumer.  If you're simply trying to impress people with the volume of data that you have access to then you might indeed want a large number of unmaintainable extra links of dubious provenance.  Realistically, though, in my experience (as I've written elsewhere in this thread) data consumers do care about the quality of the data that they're processing, and the fact that the person adding the object spelt "McDonald's" differently is something that they may well have a view about.

In a different context I've written elsewhere about the work that went into creating the list at https://github.com/SomeoneElseOSM/SomeoneElse-style/blob/master/style.lua#L1401 which involved looking at how people tagged certain sorts of features in OSM.  Free tagging is both a strength and a weakness of OSM: without it the data wouldn't get captured at all, but with it people do have to look at the data that's been added. That's what data consumers do already.  You could argue that a "brand:wikidata" key makes their job easier, but if they want to do a proper job it probably doesn't make a lot of difference.
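For illustration, here is a minimal Python sketch of the kind of spelling clean-up a data consumer ends up doing for free-text brand values. The normalization rules and function names are hypothetical, chosen only for this example; they are not the rules in the style.lua list linked above.

```python
import unicodedata
from collections import defaultdict


def normalize_brand(label):
    """Collapse spelling variants of a free-text brand value into one key.

    Hypothetical rules for illustration: Unicode NFKC normalization, case
    folding, then dropping every character that is not a letter or digit
    (so apostrophes, curly quotes and spaces disappear).
    """
    text = unicodedata.normalize("NFKC", label).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def group_brand_variants(values):
    """Map each normalized key to the set of raw spellings seen for it."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize_brand(value)].add(value)
    return dict(groups)


print(group_brand_variants(["McDonalds", "mcdonalds", "McDonald's", "Aldi"]))
```

Rules like these handle the easy spelling cases, but they cannot tell apart two different brands that share a name (the Aldi case mentioned below); that still needs other tags or geographic location.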

Another example: I recently looked at the usage of "natural=fell" in OSM with a view to rendering it.  It surprised me that this query http://overpass-turbo.eu/s/s2q showed at least 3 different types of objects with the same OSM tag.  A data consumer can't assume that what they thought something meant (perhaps after reading the OSM wiki) is what mappers actually do; they'll need to filter the data they're consuming based on actual OSM usage.  In the case of "brand:wikidata" they may want to filter out obviously bot-added values, because there was no local knowledge of that data, and go back to what other tags the mappers added (in the case of the Aldis discussed elsewhere I suspect that there will always be enough info to say which is which in other tags or using geographic location).
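As a rough illustration of filtering based on actual OSM usage, here is a minimal Python sketch that tallies which other keys appear alongside a given tag. It assumes the features have already been downloaded (for instance the "tags" of each element in an Overpass JSON export) into a list of plain dictionaries; the function name and sample data are hypothetical.

```python
from collections import Counter


def cooccurring_keys(features, key, value):
    """Count the other keys that appear on features tagged key=value.

    `features` is assumed to be a list of tag dictionaries (hypothetical
    input format, e.g. the "tags" of each element in an Overpass JSON export).
    """
    counts = Counter()
    for tags in features:
        if tags.get(key) != value:
            continue  # not one of the features being surveyed
        # Every other key on the feature hints at what the mapper meant.
        counts.update(k for k in tags if k != key)
    return counts


# Hypothetical sample data, not real OSM objects.
features = [
    {"natural": "fell", "name": "Example Fell", "ele": "500"},
    {"natural": "fell", "name": "Another Fell"},
    {"natural": "fell", "landuse": "meadow"},
    {"amenity": "cafe"},
]

for k, n in cooccurring_keys(features, "natural", "fell").most_common():
    print(k, n)
```

Seeing which keys cluster together (named features with an elevation in some places, a landuse tag in others) is one way to spot that a single tag is being used for several different kinds of object before deciding how to render or filter it.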

Best Regards,

Andy

_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
