I'd like to add a few words on the topic.

First, the state of OSM and the data sources in Poland:
1. The data sources and OSM use different standards for naming streets,
and the names cannot always be matched automatically (a rough
illustration follows after this list).
2. Addresses change - some move between cities, some change streets,
some gain a street, change house numbers and so on.
3. Some of the sources contain quite a lot of wrong addresses
(counted in hundreds per municipality).
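
To make point 1 concrete, here is a minimal, hypothetical sketch (not
the code from merger.py) of why purely automatic street-name matching
falls short. The example names and the normalize() helper are
illustrative only:

import re

def normalize(name):
    """Lowercase, strip a leading 'ul.'/'ulica' prefix, collapse whitespace."""
    name = name.strip().lower()
    name = re.sub(r'^(ul\.|ulica)\s+', '', name)
    return re.sub(r'\s+', ' ', name)

source_name = "UL. JANA III SOBIESKIEGO"   # typical register spelling
osm_name = "Jana III Sobieskiego"          # typical OSM spelling
print(normalize(source_name) == normalize(osm_name))  # True for this pair

Simple rules like these catch the easy cases, but abbreviations
("J. Sobieskiego"), reordered patron names or renamed streets still
need a human decision.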

The scripts I've written currently use heuristics to merge the data
(you can review them here:
https://github.com/wiktorn/osm-addr-tools/blob/master/merger.py). But
during imports there are some cases the heuristics will not solve:
1. Addresses that exist in the source but are mistakes. We do report
them to the source, but the response time is not satisfactory. I've
seen more than a handful of buildings that were assigned an address by
more than one municipality (and each one gave a different address).
This can be resolved by a local mapper, but I'd like to give that
mapper a tool so that future updates won't ask them to check the same
point again (see the sketch after this list).
2. Sometimes address points are misplaced, and a local mapper can move
them to the correct place. This breaks the heuristics, as there is no
way for the import script to know whether the point was mislocated by
a previous import, mislocated by a mapper, or moved intentionally.
3. Addresses change. Currently there is no way to isolate the case
where a point in OSM needs a change of street name, city name or
housenumber because that name has changed in the source. I can't flag
every point whose name differs, because 80% of the time the name in
OSM is the correct one, but I don't want to lose the other 20%.
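
To make point 1 (and partly point 3) more concrete, here is a rough
sketch of the idea of remembering a mapper's decision so a point is
not flagged again on the next import. The tag name "import:verified"
and the needs_review() helper are made up for illustration; they are
not part of merger.py or any agreed tagging scheme:

def needs_review(osm_tags, source_addr):
    """Flag a point only if it differs from the source AND the mapper
    has not already confirmed it against this version of the source."""
    differs = (osm_tags.get('addr:street') != source_addr['street']
               or osm_tags.get('addr:housenumber') != source_addr['housenumber'])
    confirmed = osm_tags.get('import:verified') == source_addr['version']
    return differs and not confirmed

# The mapper confirmed this point against source version "2016-05",
# so the next import of that version stays silent about it.
osm_point = {'addr:street': 'Jana III Sobieskiego',
             'addr:housenumber': '12A',
             'import:verified': '2016-05'}
source_point = {'street': 'Sobieskiego', 'housenumber': '12',
                'version': '2016-05'}
print(needs_review(osm_point, source_point))  # False - do not ask again

A single short tag like this is roughly where the "50 bytes per tag"
estimate below comes from.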

The net result is that when a source has accumulated a lot of
corrections in OSM, you have to review those corrections on every
import. For a medium-sized city I checked recently, there were ~1000
points flagged for verification while only about 100 actually needed
to be added or updated; that discourages users from doing the updates.
Storing the additional data (~350MB uncompressed: 50 bytes per tag
times 7M address points) seems to me a small cost compared with the
labour cost of reviews over the coming years.


Cheers,

Wiktor
