On 26/09/2017 18:08, Yuri Astrakhan wrote:


      When data consumers want to get a link to corresponding wikipedia
    article, doing that with wikipedia[:xx] tags is straightforward. Doing
    the same with wikidata requires additional pointless and time
    consuming abrakadabra.


no, you clearly haven't worked with any data consumers recently. Data consumers want Wikidata, much more than wikipedia tags - please talk to them.

That would be me in a former job, I think.

One of the things I used to spend a lot of time doing was finding ways to encode data so that knowledge could be shared by e.g. field engineers, and then analysing those results so that we could find out what was related to what, what caused what, and how much store you could set by a particular result or prediction.  There are a few points worth sharing from that experience:

1) The first point to make about human-contributed data is that it's variable.  Some people will say something is probably an X, some people probably a Y.  The reality is that they're actually both right some of the time.  You might think (in the context of e.g. shop brands) "hang on - surely a shop can be only one brand?  It must be _either_ X or Y!" but you'd be wrong.  There are _always_ exceptions, and there will always be "errors" - you just don't know which way is right and which wrong.

2) The second point that's relevant here is that codes such as CODE1, CODE2 etc. are to be avoided at all costs, since they don't allow any natural visualisation of what's been captured.  You have already said "but surely every system that displays data can look up the description", but anyone familiar with the real world knows that simply won't happen.  This means that there's no way for an ordinary mapper to verify whether the magic code on an OSM item is correct or not.  Verifiability is one of the key concepts of OSM (see https://wiki.openstreetmap.org/wiki/Verifiability et al) and anything that moves away from it means that data isn't going to be maintained, because people simply won't understand what it means.  I suspect that a key part of the success of OSM has been the reliance on natural language-based keys and values, and a loose tagging scheme that allows easy expansion.

3) The third point is that a database that has been "cleaned" so that there are no "errors" in it is worth far less than one that hasn't, when you're trying to understand the complex relationships between objects.  This goes against normal data-processing instincts, because you'd obviously try to ensure that data has full referential integrity - but where there are edge cases (and, as per (1) above, there are always edge cases) different consumers will very likely want to treat those edge cases differently, which they can't do if someone has "helpfully" merged all the edge cases into more popular categories.


To be blunt, if I was trying to process OSM data and had a need to get into the wikidata/wikipedia world based on it (for example because I wanted the municipal coat of arms - something not in OSM) I'd take a wikipedia link over a wikidata one every time, because all mappers will have been able to see the text of the wikipedia link rather than just something like Q123456.  You've made the point that things change in wikipedia regularly (articles get renamed etc.), but it's important to remember that things change in the real world all the time as well - and a link that's suddenly pointing at something different in wikipedia is immediately apparent, whereas if Q123456 was no longer relevant (because the real-world thing has changed) that wouldn't be apparent at all.
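To give a concrete (if rough) example of what that consumer-side plumbing looks like - and this is just a sketch from memory, so the property number P94 and the exact API parameters are assumptions worth double-checking - following a wikipedia=de:Karlsruhe style tag through to a coat of arms is something like:

  import json
  import urllib.parse
  import urllib.request

  def fetch_json(url):
      # Bare stdlib fetch - a real consumer would add error handling,
      # retries and a proper User-Agent header.
      with urllib.request.urlopen(url) as resp:
          return json.load(resp)

  def coat_of_arms_from_wikipedia_tag(tag):
      # tag is the value of an OSM wikipedia=* tag, e.g. "de:Karlsruhe"
      lang, _, title = tag.partition(":")
      # Step 1: ask that language's wikipedia which wikidata item the
      # article belongs to (prop=pageprops / wikibase_item).
      api = ("https://%s.wikipedia.org/w/api.php?action=query&prop=pageprops"
             "&ppprop=wikibase_item&redirects=1&format=json&titles=%s"
             % (lang, urllib.parse.quote(title)))
      pages = fetch_json(api)["query"]["pages"]
      qid = next(iter(pages.values()))["pageprops"]["wikibase_item"]
      # Step 2: fetch the item and read P94 (coat of arms image -
      # property number from memory, so check it).
      entity = fetch_json(
          "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % qid)
      claims = entity["entities"][qid]["claims"]
      filename = claims["P94"][0]["mainsnak"]["datavalue"]["value"]
      # Step 3: turn the Commons filename into a fetchable URL.
      return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
              + urllib.parse.quote(filename))

  print(coat_of_arms_from_wikipedia_tag("de:Karlsruhe"))

Note that starting from a wikipedia tag costs exactly one extra API call compared with starting from a wikidata tag, which is part of why I don't see the Q-number buying the consumer very much.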

All that said, I don't see wikidata as a key component (or even a very useful component) of OSM - but we all map things that are of interest to us - some people map in great detail the style of British telephone boxes or the "Royal Cipher" on postboxes, which I see absolutely no point in, but if it's verifiable, why not - I'm sure I'm mapping stuff that is irrelevant to them.  A problem with wikidata (as noted above) is that I'm not sure that it _is_ verifiable data - I suspect it'll go stale after being added and never be maintained, simply because people will never notice that it's wrong.

(and on an unrelated comment in the same message)


Sure, it can be done via dump parsing, but it is much more complicated than querying.  Would you rather use Overpass turbo to do a quick search for some weird thing that you noticed, or download and parse the dump?  Most people would rather do the former.

It depends - if you want to do a "quick search for something" then an equivalent to Overpass turbo might be an option, but in the real world what you'd _actually_ want to do is a local database query.  Unfortunately that side of things seems to be completely missing (or at least very well-hidden) - wikidata seems to be quite immature in that respect.  Where's the "switch2osm" for wikidata?  Where's the "osm2pgsql" or "osmosis"?  Sure I can download 20Gb of gzipped JSON from https://dumps.wikimedia.org/wikidatawiki/entities/20170925/ and try to write some sort of parser based on https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON , but this seems very much like going back to banging the rocks together (and no, a third-party query interface that depends on an external network connection such as https://query.wikidata.org/ or anything else isn't a better option).
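For what it's worth, the dump does at least seem to be one entity per line inside a single big JSON array, so a bare-bones "banging the rocks together" reader is only a few lines of Python - something like the sketch below (untested, the dump filename is assumed from that directory listing, and a real import would want to push this into PostgreSQL rather than just print things):

  import gzip
  import json

  # Stream the ~20Gb wikidata-<date>-all.json.gz dump without loading it
  # all into memory.  The file is one big JSON array with (as far as I can
  # tell) one entity per line, each line ending in a comma.
  def entities(path):
      with gzip.open(path, "rt", encoding="utf-8") as f:
          for line in f:
              line = line.strip().rstrip(",")
              if line in ("[", "]", ""):
                  continue
              yield json.loads(line)

  for e in entities("wikidata-20170925-all.json.gz"):
      # Pull out the bits a map-oriented consumer might care about:
      # the Q-number, an English label and the English wikipedia article.
      qid = e.get("id")
      label = e.get("labels", {}).get("en", {}).get("value")
      article = e.get("sitelinks", {}).get("enwiki", {}).get("title")
      if article:
          print(qid, label, article)

But that only gets you a flat Q-number to label/article mapping - it's still a very long way from the osm2pgsql / osmosis level of tooling that you'd actually want.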

Regards,
Andy
