On 26/09/2017 18:08, Yuri Astrakhan wrote:


      When data consumers want to get a link to corresponding wikipedia
    article, doing that with wikipedia[:xx] tags is straightforward. Doing
    the same with wikidata requires additional pointless and time
    consuming abrakadabra.


no, you clearly haven't worked with any data consumers recently. Data consumers want Wikidata, much more than wikipedia tags - please talk to them.

That would be me in a former job, I think.

One of the things I used to spend a lot of time doing was finding ways to encode data so that knowledge could be shared by e.g. field engineers, and then analysing those results so that we could find out what was related to what, what caused what, and how much store you could set by a particular result or prediction.  There are a few points worth sharing from that experience:

1) The first point to make about human-contributed data is that it's variable.  Some people will say something is probably an X, some people probably a Y.  The reality is that they're actually both right some of the time.  You might think (in the context of e.g. shop brands) "hang on - surely a shop can be only one brand?  It must be _either_ X or Y!" but you'd be wrong.  There are _always_ exceptions, and there will always be "errors" - you just don't know which way is right and which wrong.

2) The second point that's relevant here is that codes such as CODE1, CODE2 etc. are to be avoided at all costs, since they don't allow any natural visualisation of what's been captured.  You have already said "but surely every system that displays data can look up the description", but anyone familiar with the real world knows that simply won't happen.  This means that there's no way for an ordinary mapper to verify whether the magic code on an OSM item is correct or not.  Verifiability is one of the key concepts of OSM (see https://wiki.openstreetmap.org/wiki/Verifiability et al) and anything that moves away from it means that data isn't going to be maintained, because people simply won't understand what it means.  I suspect that a key part of the success of OSM has been the reliance on natural language-based keys and values, and a loose tagging scheme that allows easy expansion.

3) The third point is that a database that has been "cleaned" so that there are no "errors" in it is worth far less than one that hasn't, when you're trying to understand the complex relationships between objects.  This goes against normal data-processing instincts, because you'd obviously try to ensure that data has full referential integrity - but where there are edge cases (and, as per (1) above, there are always edge cases) different consumers will very likely want to treat those edge cases differently, which they can't do if someone has "helpfully" merged all the edge cases into more popular categories.


To be blunt, if I was trying to process OSM data and had a need to get into the wikidata/wikipedia world based on it (for example because I wanted the municipal coat of arms - something not in OSM) I'd take a wikipedia link over a wikidata one every time, because all mappers will have been able to see the text of the wikipedia link rather than just something like Q123456.  You've made the point that things change in wikipedia regularly (articles get renamed etc.), but it's important to remember that things change in the real world all the time as well - and a link that's suddenly pointing at something different in wikipedia is immediately apparent, whereas if Q123456 was no longer relevant (because the real-world thing has changed) that wouldn't be apparent at all.
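To give a concrete (if rough) example of what that consumer-side plumbing looks like - and this is just a sketch from memory, so the property number P94 and the exact API parameters are assumptions worth double-checking - following a wikipedia=de:Karlsruhe style tag through to a coat of arms is something like:

  import json
  import urllib.parse
  import urllib.request

  def fetch_json(url):
      # Bare stdlib fetch - a real consumer would add error handling,
      # retries and a proper User-Agent header.
      with urllib.request.urlopen(url) as resp:
          return json.load(resp)

  def coat_of_arms_from_wikipedia_tag(tag):
      # tag is the value of an OSM wikipedia=* tag, e.g. "de:Karlsruhe"
      lang, _, title = tag.partition(":")
      # Step 1: ask that language's wikipedia which wikidata item the
      # article belongs to (prop=pageprops / wikibase_item).
      api = ("https://%s.wikipedia.org/w/api.php?action=query&prop=pageprops"
             "&ppprop=wikibase_item&redirects=1&format=json&titles=%s"
             % (lang, urllib.parse.quote(title)))
      pages = fetch_json(api)["query"]["pages"]
      qid = next(iter(pages.values()))["pageprops"]["wikibase_item"]
      # Step 2: fetch the item and read P94 (coat of arms image -
      # property number from memory, so check it).
      entity = fetch_json(
          "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % qid)
      claims = entity["entities"][qid]["claims"]
      filename = claims["P94"][0]["mainsnak"]["datavalue"]["value"]
      # Step 3: turn the Commons filename into a fetchable URL.
      return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
              + urllib.parse.quote(filename))

  print(coat_of_arms_from_wikipedia_tag("de:Karlsruhe"))

Note that starting from a wikipedia tag costs exactly one extra API call compared with starting from a wikidata tag, which is part of why I don't see the Q-number buying the consumer very much.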

All that said, I don't see wikidata as a key component (or even a very useful component) of OSM - but we all map things that are of interest to us - some people map in great detail the style of British telephone boxes or the "Royal Cipher" on postboxes, which I see absolutely no point in, but if it's verifiable, why not - I'm sure I'm mapping stuff that is irrelevant to them.  A problem with wikidata (as noted above) is that I'm not sure that it _is_ verifiable data - I suspect it'll go stale after being added and never be maintained, simply because people will never notice that it's wrong.

(and on an unrelated comment in the same message)


Sure, it can be done via dump parsing, but it is much more complicated than querying.  Would you rather use Overpass turbo to do a quick search for some weird thing that you noticed, or download and parse the dump?  Most people would rather do the former.

It depends - if you want to do a "quick search for something" then an equivalent to Overpass turbo might be an option, but in the real world what you'd _actually_ want to do is a local database query.  Unfortunately that side of things seems to be completely missing (or at least very well-hidden) - wikidata seems to be quite immature in that respect.  Where's the "switch2osm" for wikidata?  Where's the "osm2pgsql" or "osmosis"?  Sure I can download 20Gb of gzipped JSON from https://dumps.wikimedia.org/wikidatawiki/entities/20170925/ and try to write some sort of parser based on https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON , but this seems very much like going back to banging the rocks together (and no, a third-party query interface that depends on an external network connection such as https://query.wikidata.org/ or anything else isn't a better option).
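For what it's worth, the dump does at least seem to be one entity per line inside a single big JSON array, so a bare-bones "banging the rocks together" reader is only a few lines of Python - something like the sketch below (untested, the dump filename is assumed from that directory listing, and a real import would want to push this into PostgreSQL rather than just print things):

  import gzip
  import json

  # Stream the ~20Gb wikidata-<date>-all.json.gz dump without loading it
  # all into memory.  The file is one big JSON array with (as far as I can
  # tell) one entity per line, each line ending in a comma.
  def entities(path):
      with gzip.open(path, "rt", encoding="utf-8") as f:
          for line in f:
              line = line.strip().rstrip(",")
              if line in ("[", "]", ""):
                  continue
              yield json.loads(line)

  for e in entities("wikidata-20170925-all.json.gz"):
      # Pull out the bits a map-oriented consumer might care about:
      # the Q-number, an English label and the English wikipedia article.
      qid = e.get("id")
      label = e.get("labels", {}).get("en", {}).get("value")
      article = e.get("sitelinks", {}).get("enwiki", {}).get("title")
      if article:
          print(qid, label, article)

But that only gets you a flat Q-number to label/article mapping - it's still a very long way from the osm2pgsql / osmosis level of tooling that you'd actually want.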

Regards,
Andy
