Frederik Ramm <frede...@remote.org> writes:

> There are roughly 7.1 million nodes in Massachusetts, and 270k of them
> share the same location as another node. This is just an analysis based
> on location, not on tags, but it can be assumed that most of those 270k
> nodes are not intentionally duplicate. It is possible that there are
> duplicate ways as well. But it is not a big problem, it is something
> that could be fixed in a day.
>
> I could help with this but I would need very clear instructions what to
> look for, and what to do. Merging nodes may lead to duplicate ways that
> share exactly the same nodes (these can probably be removed
> automatically), but there might also be situations where you have one
> way that uses the nodes A, B1, C1 and one way that uses B2, C2, D (with
> B1 being at the same location as B2, and C1 at the same location as C2),
> so after merging nodes you'd then end up with the non-identical ways
> A,B,C and B,C,D... all this should be considered beforehand.

I have looked a bit more and have a proposal for an automated edit. I am
trying to keep this as narrow as possible while still making progress.
My proposal below is intended to join up roads that were cleaved at town
borders.

(There is another source of duplicate nodes, which is the open-space
database polygons. These duplicate nodes are not problematic, partly
because one doesn't route on open-space polygons, and partly because
each node is in its own way, and those ways merely happen to touch. So
maybe they should be merged at some point, but it's far less important.)

First, an example:

  http://www.openstreetmap.org/?node=70786569

At this location is also node 66355413. This is the border of Stow and
Maynard. In this case the road name and width change between the two
ways (which matches reality). Each way just ends at the town border, and
there is a pair of coincident nodes. Each node of this duplicated pair
is the last node in a way, and is in only one way.

----------------------------------------
Find the set of duplicated nodes D, where each element d is a set of
nodes at the same location.

foreach d in D  (CONTINUE starts back on the next d, even if nested)

  if the number of nodes in d > 2
    CONTINUE

  foreach n in d
    if n has tags other than "attribution" or "source"
      CONTINUE
    if n.attribution does not match
        "Office of Geographic and Environmental Information (MassGIS)"
      CONTINUE
    if n.source does not regexp-match "^massgis_import_v0.1_[0-9]*"
      CONTINUE
    if n is not in exactly one way
      CONTINUE
    if n is not the end node in the way
      CONTINUE

  MERGE the two nodes in d, picking the value of source from the
  lower-numbered node.
----------------------------------------

I am quite confident this won't do anything harmful, and it would be
very interesting to see how many of the 270k duplicate nodes (presumably
135k or fewer locations) go away from this.

Comments/analysis very welcome. You can easily find these cases by
opening up part of Massachusetts in JOSM and selecting long ways. They
typically end at town boundaries.
Another node pair is 62178385 and 73732046.
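
For anyone who wants to check the logic, here is a rough Python sketch of the selection rule in the pseudocode above. It is only an illustration under some assumptions, not a tested bot: the Node class, the in-memory "ways" dictionary, and the function names are made up for this sketch, "match" on attribution is read as exact equality, the dot in the source regexp is escaped, and coordinates are assumed to compare exactly equal as stored. The sketch only reports which pairs would be merged (lower-numbered node first, whose source value is the one to keep); it does not touch the ways or the database.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

ATTRIBUTION = "Office of Geographic and Environmental Information (MassGIS)"
# Dot escaped here; the pseudocode's pattern leaves it unescaped.
SOURCE_RE = re.compile(r"^massgis_import_v0\.1_[0-9]*")


@dataclass
class Node:
    """Hypothetical in-memory node record used only for this sketch."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


def eligible(node, ways):
    """Apply the per-node checks. `ways` maps way id -> list of node ids."""
    if set(node.tags) - {"attribution", "source"}:
        return False  # has tags other than attribution/source
    if node.tags.get("attribution") != ATTRIBUTION:
        return False
    if not SOURCE_RE.match(node.tags.get("source", "")):
        return False
    containing = [w for w in ways.values() if node.id in w]
    if len(containing) != 1:
        return False  # must be in exactly one way
    way = containing[0]
    return node.id in (way[0], way[-1])  # must be an end node of that way


def find_merges(nodes, ways):
    """Return (lower_id, higher_id) pairs that the rule would merge."""
    by_location = defaultdict(list)
    for n in nodes:
        by_location[(n.lat, n.lon)].append(n)  # exact coordinate match assumed

    merges = []
    for group in by_location.values():
        if len(group) != 2:
            continue  # not duplicated, or more than two nodes at this spot
        if all(eligible(n, ways) for n in group):
            low, high = sorted(group, key=lambda n: n.id)
            merges.append((low.id, high.id))
    return merges
```

Called as find_merges(nodes, ways), with both nodes of the Stow/Maynard example tagged as imported and each at the end of its own way, this would report the pair (66355413, 70786569). A dry run like this over the state extract would also give the count of affected locations before anything is changed.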
_______________________________________________ Talk-us mailing list Talk-us@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk-us