Re: [OSM-talk] [Imports] Keeping imported data updated with source changes

Jo Sat, 10 Jan 2015 08:48:34 -0800

If you go with adding a ref, I'd use ref:xyz, where xyz is something
that identifies whose foreign keys you are using.


For the ones that are wrong in the external source but have been
double-checked, you could add

source=survey

or

note=resurveyed

Jo



2015-01-10 16:44 GMT+01:00 Jason Remillard <remillard.ja...@gmail.com>:

>  Hi Wiktor,
>
> I don't think an address tag is needed or desirable.
>
> The best way of doing this is to compare versions of the official data
> (perhaps every 6 months), making a list of things that have changed so
> that they can be examined in OSM.
>
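A minimal sketch of that comparison, assuming each version of the official data can be loaded as a dict keyed by the source's own record identifier. The ids and field names below are made up for illustration.

```python
def diff_snapshots(old, new):
    """Compare two snapshots of the official data.

    Both arguments are dicts mapping a source record id to a dict of
    fields. Returns the ids that were added, removed, or changed.
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken six months apart.
january = {
    "id-1": {"street": "Lipowa", "number": "1"},
    "id-2": {"street": "Lipowa", "number": "3"},
    "id-3": {"street": "Polna", "number": "7"},
}
july = {
    "id-1": {"street": "Lipowa", "number": "1"},
    "id-2": {"street": "Lipowa", "number": "3A"},
    "id-4": {"street": "Polna", "number": "9"},
}

report = diff_snapshots(january, july)
```

The "changed" and "removed" lists are the records a mapper would then need to find and examine in OSM.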
> Of course the big issue is that the matching is not trivial. First
> devise a matching score combining the distance to the address and the
> edit distance in the address name and number. These scores are the weights.
> Then use one of the weighted bipartite graph matching algorithms
> (augmenting path) that works well on sparse data. If you keep the
> search radius down, the graph will be very sparse, so it should be
> manageable. Using the match, you can get a list of nodes that have
> been moved, deleted, and edited in the official data set.
>
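One way the scoring and matching could look, as a rough sketch rather than a tuned implementation. The search radius, the 25 m penalty per edited character, and the cost of leaving a node unmatched are arbitrary assumptions, and the record layout (pos, addr) is hypothetical. The assignment step is a dense Hungarian-style augmenting-path solver, which should be fine for one municipality; a sparse variant would be needed for anything much larger.

```python
import math

RADIUS = 100.0     # assumed search radius in metres
UNMATCHED = 150.0  # assumed cost of leaving an OSM node unmatched
FORBIDDEN = 1e9    # cost for pairs outside the search radius


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def metres(p, q):
    """Approximate ground distance between two (lat, lon) points."""
    dlat = math.radians(q[0] - p[0])
    dlon = math.radians(q[1] - p[1]) * math.cos(math.radians((p[0] + q[0]) / 2))
    return 6371000 * math.hypot(dlat, dlon)


def score(osm, src):
    """Lower is better: distance plus a penalty per edited character."""
    d = metres(osm["pos"], src["pos"])
    if d > RADIUS:
        return FORBIDDEN
    return d + 25.0 * edit_distance(osm["addr"], src["addr"])


def assign(cost):
    """Minimum-cost assignment for n rows and m >= n columns.

    Returns, for each row, the index of the column assigned to it.
    """
    n, m = len(cost), len(cost[0])
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the assignments along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def match(osm_nodes, source_rows):
    """Return {osm index: source index} for the accepted pairs."""
    if not osm_nodes:
        return {}
    n, m = len(osm_nodes), len(source_rows)
    # n dummy columns let any OSM node stay unmatched at a fixed cost.
    cost = [[score(o, s) for s in source_rows] + [UNMATCHED] * n for o in osm_nodes]
    cols = assign(cost)
    return {i: j for i, j in enumerate(cols) if j < m and cost[i][j] < UNMATCHED}
```

OSM nodes missing from the result are candidates for "deleted in the source", and source rows that no node was matched to are candidates for new addresses. Matched pairs with a non-zero score are the moved or edited ones.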
> Jason
>
> On Sat, Jan 10, 2015 at 4:59 AM, Wiktor Niesiobedzki <o...@vink.pl> wrote:
> > Hi,
> >
> > In Poland we have had quite a few addresses imported from government
> > sources for quite a long time, but as time goes on, changes are made to
> > the source databases, and local communities don't have any viable
> > tools to track what has changed in the source. In the city of
> > Skarżysko-Kamienna, a local mapper tried hard to track all the changes
> > in the source (as well as check them on site), but still missed a lot of
> > changes, and as it stands there is no tooling to help such users.
> >
> > What I'd like to do is prepare a service that will generate
> > changes for OSM containing the differences for each municipality, so a
> > local mapper can load, review and decide what to import.
> >
> > But this tool, to be efficient, needs additional information to be
> > stored in OSM: the identifier of the object in the source database, for
> > which I propose the tag ref:addr.
> >
> > This tag is used for identifying what was already imported. I'd also
> > like to create a protocol: if there is some "wrong"
> > data in the import source, we would leave a point in OSM containing:
> > ref:addr
> > source:addr
> >
> > So we can instruct further imports to skip this point, unless there
> > is some change in the source data.
> >
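A sketch of how an import tool might apply that protocol, under several assumptions: OSM nodes are given as plain tag dicts, the source is a dict keyed by the same identifier stored in ref:addr, and the set of records that changed in the source since the last import (changed_refs) is computed elsewhere. The function names and the "gov" value are hypothetical.

```python
def is_marker(tags):
    """A marker node carries only the two bookkeeping tags."""
    return set(tags) == {"ref:addr", "source:addr"}


def plan_import(osm_nodes, source, changed_refs):
    """Sort source records into actions for a local mapper to review."""
    by_ref = {n["ref:addr"]: n for n in osm_nodes if "ref:addr" in n}
    plan = {"add": [], "update": [], "skip": [], "review": []}
    for ref, record in source.items():
        node = by_ref.get(ref)
        if node is None:
            plan["add"].append(ref)        # not in OSM yet
        elif is_marker(node):
            # Known-wrong source data: skip until the source changes.
            plan["review" if ref in changed_refs else "skip"].append(ref)
        elif any(node.get(k) != v for k, v in record.items()):
            plan["update"].append(ref)     # OSM and source disagree
    # Refs present in OSM that the source no longer contains.
    plan["missing_in_source"] = sorted(r for r in by_ref if r not in source)
    return plan


osm_nodes = [
    {"ref:addr": "a1", "source:addr": "gov",
     "addr:street": "Lipowa", "addr:housenumber": "1"},
    {"ref:addr": "a2", "source:addr": "gov"},   # marker, source unchanged
    {"ref:addr": "a3", "source:addr": "gov"},   # marker, source changed
    {"ref:addr": "a9", "source:addr": "gov", "addr:housenumber": "9"},
]
source = {
    "a1": {"addr:street": "Lipowa", "addr:housenumber": "1A"},
    "a2": {"addr:street": "Lipowa", "addr:housenumber": "3"},
    "a3": {"addr:street": "Polna", "addr:housenumber": "7"},
    "a4": {"addr:street": "Polna", "addr:housenumber": "9"},
}

plan = plan_import(osm_nodes, source, changed_refs={"a3"})
```

Records that match OSM exactly appear in none of the lists, so the mapper only sees what needs a decision.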
> > I find this solution the most robust, as it gives a great signal-to-noise
> > ratio for local mappers when they are identifying what needs to be
> > updated, and it also gives us resilience when someone accidentally
> > deletes some address.
> >
> > In Poland there are thousands of people employed by the government to
> > keep this data in good quality, and using the OSM community to duplicate
> > their work is, in my opinion, wasteful. Using this method, we can use their
> > work, and use the OSM community to improve the data that the government is
> > sourcing. And this is something we should consider for all of the
> > imports.
> >
> > We have already had some discussion about this in the Polish community,
> > but as it seems it might be a philosophical change for this project, I'd
> > like to raise this issue at the global level.
> >
> > Apart from addresses I plan to start importing national heritage
> > objects, for which I see exactly the same problem.
> >
> > The other solution that we discussed in our community is to keep track
> > of the import source state in a separate database, and use this to see
> > what has changed in the source and to generate files for local mappers,
> > but I see the following disadvantages of such a solution:
> > - it doesn't take into account the current state of objects in
> > OSM, which may generate duplicates or miss data that were accidentally
> > deleted
> > - it makes it harder to fork the OSM project, as you need to fork two
> > databases, know about them, and the license for such a database should
> > be open
> > - it still needs some "protocol" for this database, to mark that an
> > import was done (and to what extent); it would require additional tooling
> > and might be an additional problem for casual mappers, and would probably
> > render the tool unusable
> > - it gives no tools for integrity with OSM databases
> > - it needs additional support
> >
> >
> > The disadvantages of my solution that I found most concerning were:
> > - nodes containing only ref:addr and source:addr might be hard for
> > newcomers to understand, especially since ref:addr doesn't contain any
> > human-understandable data
> > - ref:addr might get clobbered during a merge of nodes
> >
> > But I hope that with an extensive description on the Wiki we can handle
> > these problems.
> >
> > Cheers,
> >
> > Wiktor Niesiobędzki
> >
> > _______________________________________________
> > talk mailing list
> > talk@openstreetmap.org
> > https://lists.openstreetmap.org/listinfo/talk
>
> _______________________________________________
> Imports mailing list
> impo...@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/imports
>

