I'd like to add a few words on the topic. First, the state of OSM and the data sources in Poland:

1. The data sources and OSM use different street-naming standards, and the names cannot always be mapped automatically.
2. Addresses change: some addresses change cities, some change streets, some gain a street name, some change numbers, and so on.
3. Some of the sources contain quite a lot of wrong addresses (hundreds per municipality).
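To illustrate point 1, here is a minimal Python sketch of why automatic name mapping only goes so far. It is not the code from merger.py; the prefix list, the function names and the example street names are all assumptions made up for illustration.

```python
# Hypothetical list of street-type prefixes; a real list would be longer.
PREFIXES = ("ulica ", "ul. ", "aleja ", "al. ", "plac ", "pl. ")


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace and drop one leading street-type prefix."""
    result = " ".join(name.lower().split())
    for prefix in PREFIXES:
        if result.startswith(prefix):
            result = result[len(prefix):]
            break
    return result


def same_street(source_name: str, osm_name: str) -> bool:
    """Strict match: identical after normalization."""
    return normalize(source_name) == normalize(osm_name)


def same_words(source_name: str, osm_name: str) -> bool:
    """Looser match: the same words in any order."""
    return sorted(normalize(source_name).split()) == sorted(normalize(osm_name).split())


# A prefix difference is easy to map automatically.
print(same_street("ul. Długa", "Długa"))
# A word-order difference needs the looser comparison.
print(same_street("Kościuszki Tadeusza", "Tadeusza Kościuszki"))
print(same_words("Kościuszki Tadeusza", "Tadeusza Kościuszki"))
# A missing first name cannot be resolved by either rule.
print(same_words("Kościuszki", "Tadeusza Kościuszki"))
```

The last case is the kind that would still need a lookup table or a human decision, since loosening the comparison further starts to match streets that really are different.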
The scripts I've written currently use heuristics to merge the data (you can review them here: https://github.com/wiktorn/osm-addr-tools/blob/master/merger.py). But during imports there are some cases that the heuristics cannot solve:

1. Addresses that exist in the source but are mistakes. We do report them to the source, but the response time is not satisfactory. I've seen more than a handful of buildings that were given an address by more than one municipality (each one a different address). A local mapper can resolve this, but I'd like to give that mapper a tool so that future updates do not ask them to check the same point again.
2. Sometimes address points are misplaced, and a local mapper can move them to the correct place. This breaks the heuristics, as the import script has no way of knowing whether a point was mislocated by a previous import, mislocated by a mapper, or moved on purpose.
3. Addresses change. Currently there is no way to isolate the case where a point in OSM needs a new street name, city name or housenumber because the name has changed in the source. I can't flag every point whose name differs, because 80% of the time OSM has the proper name, but I don't want to lose the other 20%.

The end result is that when a source has a lot of corrections in OSM, you need to review those corrections on every import. For a medium-sized city I checked recently, there were ~1000 points to verify when only 100 actually needed to be added or updated. That is quite a lot, and it discourages users from doing the updates.

I find storing the additional data (~350 MB uncompressed: 50 bytes per tag, 7M address points) a small cost compared with the labour cost of updates over the coming years.

Cheers,
Wiktor
_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk