Re: [Talk-us] [Imports] City of Seattle imports

Paul Norman Thu, 06 Dec 2012 05:06:30 -0800

Apologies for the length, but there are quite a few points to "address," 
some of a specific nature and others more general. Because I'm replying 
to points across several messages and the formatting in this thread has 
become screwed up, I'll be reformatting messages and re-ordering them so 
that they make more sense. Imports, as you can probably tell, are an 
area I care about.


Full disclosure: I maintain ogr2osm, the software used for converting 
geometries in the proposed import. See https://github.com/pnorman/ogr2osm 
for more information.

While I support address imports in principle, it is important that they 
are done right. Back in 2010 I did an address import and, although on 
the whole it was successful, I learned some lessons.

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> From this page: http://wiki.osm.org/wiki/Import/Guidelines#A_checklist, 
> can you advise as to the checklist steps we are not following, or which
> steps should be added to this checklist?

http://wiki.osm.org/wiki/Import/Guidelines#Discuss_import_with_community 
requires that consultation be done with imports@ and the appropriate local 
communities. This certainly includes talk-us@.

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001598.html
> The current plan is to focus on addresses and building outlines.
> Sources are currently suggested to be tagged as:
> source:addr=data.seattle.gov
> source:path=data.seattle.gov
> so as to accommodate other sourcing information for those points and ways,
> as appropriate.

The value of source=* tags on objects is debatable. Source tags on 
changesets are a very good idea, but it's not clear whether they're a 
good idea for objects; there are arguments either way. I am working on 
an address import that I have not yet proposed, and the initial version 
of that proposal will not add source tags to objects. If after 
consideration you do decide that source tags are worthwhile, I would 
recommend just source=data.seattle.gov. This corresponds with the 
general practice for source tags. Remember that source tags exist for 
the convenience of mappers, and if they become inconvenient they are 
not worthwhile. Source tags that are long and cryptic do not help 
mappers. Some imports have used source tags so long and complicated 
that it was necessary to look up the exact value.

In my own import I quickly found that it wasn't clear what to do with 
the source tags when editing the addresses, primarily when merging them 
with buildings or POIs. If it wasn't clear to me, the importer, I'm 
pretty sure it was unclear to other editors.

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> > How will you handle object conflation?
> Manually and methodically. 

Although it is not a trivial problem, there is work underway on code 
that will handle address-address conflation
(https://github.com/pnorman/addressmerge). 
Address-POI and address-building conflation remain purely manual jobs. 

It shouldn't be too hard to merge addresses with the buildings they are 
within when the building has only one address within it and the building 
does not itself have an address. Addresses placed by building doors 
outside the building itself add complications, but I expect they are 
solvable. Not too hard does not mean trivial, though. 
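
To make the simple case above concrete, here is a rough Python sketch. 
It is not ogr2osm or addressmerge code; the data layout and the names 
(point_in_ring, has_address, merge_addresses) are made up for 
illustration. It assumes buildings are simple closed rings without inner 
rings, and it deliberately leaves addresses outside any building (the 
door case) untouched.

```python
def point_in_ring(pt, ring):
    """Ray-casting test: True if pt (x, y) lies inside the ring of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def has_address(tags):
    """True if the tag dict already carries any addr:* key."""
    return any(key.startswith("addr:") for key in tags)


def merge_addresses(buildings, addresses):
    """Copy address tags onto buildings in the unambiguous case only.

    buildings: list of {"ring": [(x, y), ...], "tags": {...}} (hypothetical layout)
    addresses: list of {"pt": (x, y), "tags": {...}} (hypothetical layout)
    Building tags are updated in place; the addresses that were NOT
    merged are returned so they can be handled manually.
    """
    inside = {b_idx: [] for b_idx in range(len(buildings))}
    for a_idx, addr in enumerate(addresses):
        hits = [b_idx for b_idx, b in enumerate(buildings)
                if point_in_ring(addr["pt"], b["ring"])]
        # Skip addresses that fall in overlapping buildings; those are ambiguous.
        if len(hits) == 1:
            inside[hits[0]].append(a_idx)

    merged = set()
    for b_idx, a_idxs in inside.items():
        tags = buildings[b_idx]["tags"]
        # Merge only when the building has exactly one address and none of its own.
        if len(a_idxs) == 1 and not has_address(tags):
            tags.update(addresses[a_idxs[0]]["tags"])
            merged.add(a_idxs[0])

    return [a for i, a in enumerate(addresses) if i not in merged]
```

Anything the function returns still needs a human to look at it, which 
is where the real work is.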

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> > Where is the source of your transformation scripts?
> From the email above: "some translation instructions Cliff has put
> together." 
> We will include the specific translation code either at a github page or
> on the http://wiki.osm.org/wiki/Seattle page.
> > Where are the specific data files you're transforming?
> They are at data.seattle.gov & we will provide links to the sources. 
> We will also consider posting separate snapshots of these source 
> datafiles if we can figure out where to host them.

To be able to sensibly comment on the tagging and data quality we really 
need a sample .osm showing what the data is like. One option for hosting 
it is an account on the dev server. See
http://wiki.osm.org/wiki/Dev_Server_Account 
for more information on this. If this is a problem you could email me a 
file and I could host it on one of my servers.

Because I maintain ogr2osm I'm comfortable reading very complex 
translations and determining the results without actually running the 
code. Most people aren't, and a .osm file is a good way for people to 
assess the tagging and data quality. Another option for documenting 
tagging is an appropriate page on the wiki. When I was working on the 
Alaska county import I documented the tagging I would be using at 
http://wiki.osm.org/wiki/Alaska/TIGER_Counties#Tagging before I had 
written the translation file. The correct tagging for the county data 
was pretty obvious. It also gave me a chance to document the changeset
tagging.

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001606.html
> We identified these files and developed preliminary scripts a couple days
> ago. 
> Here are a couple - these may not be the only sources considered for
> import:
> https://data.seattle.gov/dataset/Master-Address-File/3vsa-a788
> https://data.seattle.gov/dataset/Street-Network-Database/afip-2mzr
> https://data.seattle.gov/dataset/2009-Building-Outlines/y7u8-vad7

I would recommend holding off on the streets data and restricting this to 
addresses and building outlines. Virtually every aspect of an import 
involving street data is significantly harder than in an import 
involving only buildings and addresses. I strongly recommend you start 
with the buildings and addresses so you gain some experience first. Even 
writing a good translation for streets data is generally harder.

When you've finished with the addresses and buildings you could then 
propose a new import for the streets. We are also likely to have better 
tools for dealing with the streets data in the medium term.

Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001606.html
> I'm not sure why you are asking if I'll be doing this myself. This 
> email thread contains several references to people who will be 
> assisting. Cliff Snow and others attending the meeting to discuss this 
> import. We hope to have a team of local mappers focus on imports where 
> they have local expertise.

CanVec, French cadastre and other imports where the data is made available 
to multiple people (e.g. through a website) show that without a QA process 
the quality of imports varies drastically from person to person, and that 
you'll see some really bad imports. That's not to say that it can't work, 
and in many ways it's a preferable process, but one bad importer can easily 
create a mess in minutes that takes others ages to clean up. One way to 
mitigate this is to do as much post-processing as possible before releasing 
the files. In the context of addresses this would mean removing the 
addresses that duplicate existing OSM data, or for buildings, removing 
buildings that intersect existing buildings in OSM.
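
As a rough illustration of the address half of that post-processing, a 
Python sketch follows. The data layout, the function names and the 
100 m threshold are all assumptions made for the example, and this is 
not how addressmerge is implemented. It drops a candidate address only 
when OSM already has an address with the same house number and street 
nearby.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def addr_key(tags):
    """Case-insensitive (housenumber, street) key used to compare addresses."""
    return (tags.get("addr:housenumber", "").strip().lower(),
            tags.get("addr:street", "").strip().lower())


def drop_existing(candidates, existing, max_dist_m=100.0):
    """Return the candidates that do not duplicate an existing OSM address.

    Both lists hold {"lat": ..., "lon": ..., "tags": {...}} dicts
    (a hypothetical layout). max_dist_m is an arbitrary threshold
    chosen for this example.
    """
    by_key = {}
    for e in existing:
        by_key.setdefault(addr_key(e["tags"]), []).append(e)

    kept = []
    for c in candidates:
        matches = by_key.get(addr_key(c["tags"]), [])
        # Same number and street, and close by: treat as already mapped.
        if any(haversine_m(c["lat"], c["lon"], e["lat"], e["lon"]) <= max_dist_m
               for e in matches):
            continue
        kept.append(c)
    return kept
```

The distance check is there so that the same house number on a street 
with the same name elsewhere in the city is not thrown away by mistake.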

As the word count is telling me this message is over 1000 words I will 
leave aside some concerns I have with updating and conflation for another 
message later.


_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
