Hi Martin,

On the first pass, I am not checking individual Q-ID numbers, mostly
because the existing tooling is very poor for that, and the rate of error
is very low. JOSM simply looks up the ID and adds it. BUT, once it is
added, I do a query (OT) for the tags, and match them with the Wikidata
query results, and check that the names and other tags match (in a
spreadsheet), allowing me to quickly catch the very few non-matching or
broken items.  This method has shown much more value than simply letting
people copy/paste IDs, as humans tend to make quite a few mistakes - I saw
incorrect language codes for Wikipedia links (probably typed by hand), or
simply stale or non-existent WP links.  Neither approach is perfect,
but I hope mine will result in much higher quality and better completeness.
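
A rough sketch of that second pass, in case it helps. This is only an
illustration and not the actual spreadsheet workflow: the record layout, the
sample rows, the placeholder Q-IDs and the names `normalize` and
`find_mismatches` are all made up for the example, standing in for an export
of OSM tags and a Wikidata query result.

```python
# Hypothetical sample data. In practice these rows would come from an
# export of OSM tags and from a Wikidata query result; the names and
# Q-IDs below are placeholders, not real lookups.
osm_objects = [
    {"osm_id": "r1", "name": "Springfield", "wikidata": "Q100"},
    {"osm_id": "r2", "name": "Shelbyville", "wikidata": "Q200"},
    {"osm_id": "r3", "name": "Ogdenville", "wikidata": "Q999"},
]

# Q-ID -> set of labels known for that item (any language).
wikidata_labels = {
    "Q100": {"Springfield"},
    "Q200": {"Capital City"},
}


def normalize(name):
    """Lower-case the name and collapse whitespace before comparing."""
    return " ".join(name.casefold().split())


def find_mismatches(osm_objects, wikidata_labels):
    """Return (osm_id, qid, reason) for every object that needs a human look."""
    problems = []
    for obj in osm_objects:
        qid = obj["wikidata"]
        labels = wikidata_labels.get(qid)
        if labels is None:
            # The Q-ID did not come back from the Wikidata query at all.
            problems.append((obj["osm_id"], qid, "missing item"))
        elif normalize(obj["name"]) not in {normalize(l) for l in labels}:
            # The item exists, but none of its labels match the OSM name.
            problems.append((obj["osm_id"], qid, "name mismatch"))
    return problems


mismatches = find_mismatches(osm_objects, wikidata_labels)
for row in mismatches:
    print(row)
```

Only the flagged rows then need manual review; a name mismatch is a hint to
look closer, not proof that the tag is wrong.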

You are correct that Wikidata sometimes interlinks unrelated articles
(usually this gets fixed right away), or articles with a different scope
(somewhat more common and more permanent). In rare cases, that would mean
the Wikidata ID is wrong or not specific enough, but that is very easy to
catch and correct on the second pass, when the actual data is compared.
The more common case is the one I mentioned before - when the Wikipedia
articles (in all of the linked languages) are about multiple concepts (e.g.
an administrative and a ceremonial district together), but there exists
another Wikidata ID, not linked to any articles, just for the admin
district. That means the linked one is more for the ceremonial district,
and should be fixed (usually after some additional searching).

So yes, blindly adding tags would work fine for >99% of cases, and would
not be good for the other <1% (I am guessing at the numbers). Yet it would
still be good to have that 1% tagged, because the IDs allow much better
further validation and correction, whereas a Wikipedia link is just a
string of text that is much harder to work with when cross-verifying
against other sources.  And BTW, Wikidata is far from perfect either -
the admin tree structure for most of England is frequently incorrect and
should also be fixed - something that this work will also help with -
win-win for everyone :)
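
To illustrate the "string of text" point: a hand-typed Wikipedia link needs
shape and language-code checks before anything else can be done with it,
while a Wikidata ID has a single fixed form. The sketch below is a minimal
example under assumptions of my own: the tag value is taken to look like
"<language code>:<title>", `KNOWN_LANGS` is a tiny illustrative subset, and
the function names are invented for the example. It checks form only, not
whether the article or item actually exists.

```python
import re

# Illustrative subset only; a real check would need the full list of
# Wikipedia language codes.
KNOWN_LANGS = {"en", "de", "it", "fr", "ru"}

# Assumed tag shape: lowercase language code, a colon, a non-empty title.
TAG_SHAPE = re.compile(r"([a-z][a-z-]*):(\S.*)")
# A Wikidata item ID: "Q" followed by a positive integer, no leading zero.
QID_SHAPE = re.compile(r"Q[1-9][0-9]*")


def check_wikipedia_tag(value, known_langs=KNOWN_LANGS):
    """Classify a wikipedia tag value by its form only."""
    m = TAG_SHAPE.fullmatch(value.strip())
    if m is None:
        return "malformed"
    if m.group(1) not in known_langs:
        return "unknown language code"
    return "ok"


def is_qid(value):
    """True if the value has the form of a Wikidata item ID."""
    return QID_SHAPE.fullmatch(value) is not None


for tag in ["en:Berlin", "ger:Berlin", "Berlin", "en:"]:
    print(repr(tag), "->", check_wikipedia_tag(tag))
```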

On Fri, Nov 25, 2016 at 6:24 PM Martin Koppenhoefer <dieterdre...@gmail.com>
wrote:




> On 25 Nov 2016, at 22:55, Yuri Astrakhan <yuriastrak...@gmail.com> wrote:
>
> .  I am simply converting existing Wikipedia tag into the Wikidata tags,
> because there is always a 1 to 1 matching between them,


Are you checking individually and critically whether the osm objects fit
the wikidata object definitions, or are you just adding wikidata tags for
wikipedia articles that are already linked from osm?

Afaik many wikidata objects are linked to several wikipedia articles
(because of wp articles being written in different languages). Using
wikipedia quite a bit in 3 languages I have found that inconsistencies
aren't that rare ("wrong" articles interlinked). Partly this is because wp
articles in different languages are mostly not translations but are
articles that have varying coverage and levels of detail and focus (i.e. a
wikidata object that fits onto an English article does not necessarily fit
on the German article that is linked to the English article). Some linked
articles are also simply wrong.

One example: In the field of geographic places and settlements it can occur
that socio-geographic places and political territorial entities are either
mixed in the same article or are split over different articles, and it
might also differ between languages (some languages might have 1 article
dealing with both, others might have 2 and more). Wikidata seems to have a
preference for administrative entities (not sure, it is just a first
impression) and related statements in all cases I have seen so far (even
when there's a different object that also deals with the administrative
entity).

Misguided wikipedia tags are not very frequent in osm, but they do occur of
course. Blindly adding corresponding wikidata tags might make it look more
consistent even if the tag is wrong, because both tags seem to confirm each
other.

cheers,
Martin
_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
