Hi,

On Thu, Mar 29, 2012 at 8:21 AM, Ian Dees <ian.d...@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 9:10 AM, Josh Doe <j...@joshdoe.com> wrote:
>
>> On Thu, Mar 29, 2012 at 9:45 AM, Ian Dees <ian.d...@gmail.com> wrote:
>> > After loading Cook County TIGER road features and OSM linear features into
>> > PostGIS, I ran a simple query to find how well the roads matched:
>> >
>> > SELECT a.name, b.fullname, ST_HausdorffDistance(a.geom, b.geom) as dist
>> > FROM cook_tiger a, cook_osm b
>> > WHERE (a.geom && b.geom) AND ST_HausdorffDistance(a.geom, b.geom) < 0.0005
>> > LIMIT 50
>> >
>> > This returned results that made sense (the names matched in all 50 results).
>> >
>> > I removed the LIMIT clause and let it run before going to work to see how
>> > many of the TIGER records match existing OSM features.
>> >
>> > Next up is building a table of TIGER -> OSM matches and using that to find
>> > TIGER rows that don't have a corresponding OSM feature.
>> >
>> > If anyone has any ideas for speeding this up I'd love to hear it. It took
>> > well over a couple hours to run one county. There are a lot of counties in
>> > the US.
>>
>> Very cool! To speed this up perhaps try limiting the number of times
>> ST_HausdorffDistance is executed. First only run it for ways which are
>> "close", such as falling inside a buffer, or even faster inside a
>> bounding box. For a trivial speedup generate a table with distances
>> first, then use the WHERE clause. However I have no idea how to form
>> such queries!
>
>
> The bounds overlap check (a.geom && b.geom) speeds things up drastically,
> but because Cook County contains Chicago (which is very road-dense), I
> imagine there are tons of HausdorffDistance calls that don't need to
> happen. If I thought I was going to run this tons of times I could generate
> a table of all possible hausdorff distances, but there would be a lot of
> rows (if I remember my high school stats, it would be len(cook_tiger) *
> len(cook_osm) rows).
>
> I may try switching to one of PostGIS's "overlap" or "touching" calls to
> limit the number of calls even more, but I think I'd miss lots of possible
> matches that way (if the roads are offset enough to not ever touch).
>
> _______________________________________________
> Talk-us mailing list
> Talk-us@openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk-us
>
>

I'm going to look at this same problem for Salt Lake County just to see if
any different issues arise for a different geography, and hope to provide
some more input soon.

Martijn

--
martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103
801-550-5815
http://oegeo.wordpress.com
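
For reference, the two ideas quoted above (only run ST_HausdorffDistance
on ways that are "close", then build a TIGER -> OSM match table and look
for TIGER rows without a match) could be sketched roughly as below. This
is only a sketch under assumptions, not a tested recipe: the table names,
geom column and 0.0005 threshold come from the quoted query, but the
tiger_id and osm_id key columns, the index names and the
tiger_osm_matches table are hypothetical. It relies on the fact that a
Hausdorff distance below the threshold implies the two geometries come
within that threshold of each other, so ST_DWithin can act as an
index-assisted prefilter.

```sql
-- Spatial indexes so the prefilter can use an index scan
-- (skip these if the tables are already indexed; index names are made up).
CREATE INDEX cook_tiger_geom_idx ON cook_tiger USING GIST (geom);
CREATE INDEX cook_osm_geom_idx ON cook_osm USING GIST (geom);
ANALYZE cook_tiger;
ANALYZE cook_osm;

-- Build the TIGER -> OSM match table once.
-- tiger_id and osm_id are assumed primary-key columns; substitute the
-- real key columns of your tables.
CREATE TABLE tiger_osm_matches AS
SELECT a.tiger_id,
       b.osm_id,
       ST_HausdorffDistance(a.geom, b.geom) AS dist
FROM cook_tiger a
JOIN cook_osm b
  -- Prefilter: keep only pairs within the threshold of each other.
  ON ST_DWithin(a.geom, b.geom, 0.0005)
-- Exact test, evaluated only for pairs that pass the prefilter.
WHERE ST_HausdorffDistance(a.geom, b.geom) < 0.0005;

-- TIGER rows with no corresponding OSM feature (anti-join).
SELECT a.*
FROM cook_tiger a
LEFT JOIN tiger_osm_matches m ON m.tiger_id = a.tiger_id
WHERE m.tiger_id IS NULL;
```

One possible reason to prefer ST_DWithin over a bare (a.geom && b.geom)
here: two nearly parallel roads offset by less than the threshold can
have bounding boxes that do not overlap at all, so the plain overlap
check might drop some genuine matches. Whether this is actually faster
on a county as dense as Cook would need to be measured.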