On Sun, Oct 1, 2017 at 3:45 PM, Tomas Straupis <tomasstrau...@gmail.com>
wrote:

> > Tomas, you claimed that "It adds NO value."  This is demonstrably wrong.
> > You are right that the same fixing was done for years. But until wikidata
> > tag, there was no easy way to FIND them.
>
>   There always was.
>   You simply take wikipedia provided geo-tags dump like
> https://dumps.wikimedia.org/ltwiki/latest/ltwiki-latest-geo_tags.sql.gz
>
>   This gives you a very simple table with: lat/lon/page_title.
>   No parsing or anything else involved.
>   You then take data from OSM - lat/lon/wikipedia.tag
>   So you have two tables of same structure. Voila. You can compare
> anything (title, coordinates), in any direction with some
> approximation if needed etc. No OSM wikidata involved at all.
>

Tomas, this will not work. Matching Wikidata and OSM by coordinates is
useless because the coordinates differ too much -- see the hard data in the
previous email.  The only way to make any useful calculation is to analyze
the entire Wikidata graph, merge it with OSM objects, and expose the result
to other users so that they can figure out what is right or broken.  That's
exactly what my Wikidata+OSM service lets users do.
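
To make the objection concrete, here is a minimal sketch of the kind of
title/coordinate matching proposed above (the sample rows and the 200 m
cutoff are hypothetical, and both inputs are assumed to be already reduced
to plain title/lat/lon rows). As the numbers in the previous email show,
the two coordinate sets are often too far apart for any fixed threshold,
so this join either misses valid pairs or matches the wrong objects:

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# Hypothetical rows: Wikipedia geo-tags (from the geo_tags dump) and OSM
# objects carrying a wikipedia=* tag, both reduced to (title, lat, lon).
wiki_rows = [("Vilniaus katedra", 54.6858, 25.2877)]
osm_rows = [("Vilniaus katedra", 54.6857, 25.2878)]

THRESHOLD_M = 200  # arbitrary cutoff -- and that is the whole problem

for w_title, w_lat, w_lon in wiki_rows:
    # Pick the nearest OSM object and flag anything too far away or misnamed.
    dist, o_title = min(
        (haversine_m(w_lat, w_lon, o_lat, o_lon), o_title)
        for o_title, o_lat, o_lon in osm_rows
    )
    if dist > THRESHOLD_M or o_title != w_title:
        print(f"mismatch: {w_title!r} vs {o_title!r} ({dist:.0f} m apart)")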


>   If wikipedia page moves - title is gone from this dump and the new
> one appears on the same coordinates. You can map them very quickly.
> Theoretically you can update OSM data automatically, but usually if
> wikipedia title has changed, it means that something has changed in
> the object on the ground, so maybe something else has to be changed in
> OSM data as well (for example name).
>

Again, not possible, because coordinate matching is mostly useless.
Also, no - Wikipedia titles usually change not because something changed on
the ground, but because of a conflict with a similarly named place somewhere
else. People rename the original page to a more specific name and create a
disambiguation page in its place. This is what breaks titles most often. We
now have about 800 broken links left (after thousands already fixed), plus
potentially thousands more on objects that have not been tagged with a
wikidata tag yet.

>
> I'm just saying the same could be done without wikidata tags.
>

As I explained in one of the first emails, and as Andy and a few others did
too, it cannot be done **as easily**. You can build a complex system if you
have enough disk space (~1TB), do a local resolve of wikipedia -> wikidata,
and build a complex service on top of it.  Or you can simply add a single
tag that has already been added in 90% of cases, use an off-the-shelf query
engine to merge the data, and let everyone use it.
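
For illustration, this is the wikipedia -> wikidata resolve that every
consumer would otherwise have to do themselves - either against a local
dump, or, as in this minimal sketch, one title at a time through the
MediaWiki API (the example title is only an illustration). Storing the
resolved ID as a wikidata tag makes this step unnecessary for everyone
downstream:

import requests

def resolve_wikidata_id(lang, title):
    """Look up the Wikidata item (Q-id) behind a Wikipedia article title."""
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "titles": title,
            "redirects": 1,
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    for page in resp.json()["query"]["pages"].values():
        return page.get("pageprops", {}).get("wikibase_item")

# Example: the value of an OSM wikipedia=lt:Vilnius tag resolves to "Q216".
print(resolve_wikidata_id("lt", "Vilnius"))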

>
>   See above. What are practical advantages of your method?
>   Because theoretically you are taking a set A, creating a new set B
> from this A, and then you're trying to fix A according to B. This is
> logical nonsense :-) There is no point of putting this B into OSM.
> This is a temporary data which could be stored in your local "error
> checking" database.
>

Strawman argument :)   For each object that has a wikipedia tag, I use JOSM
to get the corresponding wikidata tag and upload it to OSM.  The moment it
is uploaded, other systems, such as my Wikidata+OSM service, pick it up.
Then the community, without my involvement, can analyze the data with many
different queries and fix all the errors they find.  If I hadn't uploaded
the data to OSM, only I would be able to see it, and only I would be able
to fix it.  I don't know all the different ways the community may query the
data (I'm already getting hundreds of thousands of queries). It's a tool
that helps the community.
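
As one concrete example of such a check (a per-object sketch with a
hypothetical tag pair; the actual service works through bulk queries over
the full dataset rather than one API call at a time): once an object
carries both tags, anyone can verify that its wikipedia tag still agrees
with the Wikidata item's current sitelink - exactly the broken-link check
discussed in this thread.

import requests

def sitelink_title(qid, lang):
    """Fetch the Wikipedia title currently linked from a Wikidata item."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": qid,
                "props": "sitelinks", "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    link = resp.json()["entities"][qid].get("sitelinks", {}).get(f"{lang}wiki")
    return link["title"] if link else None

# Hypothetical OSM object carrying both tags.
osm_tags = {"wikidata": "Q216", "wikipedia": "lt:Vilnius"}

lang, _, title = osm_tags["wikipedia"].partition(":")
current = sitelink_title(osm_tags["wikidata"], lang)
if current != title:
    print(f"stale wikipedia tag: OSM has {title!r}, Wikidata points to {current!r}")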

>
>   550 objects globally... Well... :-) You should see from here, that
> the problem is finding people who want to FIX, not finding problems...
>

750 is the number NOW. It used to be many thousands, and all of it was
fixed by volunteers - for just the most obvious of queries.  There are many
more fixes that need to happen - see the wikipedia link cleanup project on
the OSM wiki.  So once the problems are identified, they get solved.
Finding them is the problem.


> I'm arguing against idea that wikipedia tag is outdated or in any way
> worse.


But that is exactly what I have been showing with my data about broken
tags. Do you have any data showing that it is not worse?


> Yes, OSM would not be born
> without a geek idea, but it would not have reached what it is now if
> it would not be easy to understand for non geeks. Wikidata tag is
> totally non-understandable to non-geeks.
>

Wikidata does not need to be understood by geeks or non-geeks. It's an ID,
everyone understands that concept, and most people don't touch tags they
don't understand - just like a Mapillary ID, or tons of other local
government IDs.  The tools we have, like the iD editor, can easily work
with these IDs without "non-geeks", as you call them, understanding them.
The query system also doesn't need to be understood to be used: you simply
share the link to a query result, and voila - anyone can see the problems
and fix them.

>
> We are not using wikidata in any way. We are
> fixing wikipedia links, OSM objects, wikipedia articles manually using
> automated checks described above to pinpoint the problems.
>

I already covered this point: there is too much discrepancy between OSM and
Wikidata to meaningfully compare them with two dumps. It also requires a
highly sophisticated geek to process all that data - not a scalable
approach. Hence most wikipedia tags go stale, and hence the problem I am
trying to solve.
_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk