[Talk-ca] Merging and Bounding Box Splitting Yan's NLFLOW Data

Adam Dunn Fri, 06 Aug 2010 18:38:42 -0700

The Players:
Okay, I admit it, I'm not much of a bash scripting person. Actually, I'm not
much of a shell scripting person. Every time I need to use sed or awk, I
spend a lot of time Googling to remember correct syntax. So I've written a
script, but it doesn't quite work, and I'm hoping lazyweb (read second
paragraph of http://en.wikipedia.org/wiki/Lazyweb if you've never heard of
lazyweb) can help out.


Yan Morin has done a wonderful job of taking the NHN data and converting
NLFLOW and WATERBODY over to OSM format [
http://osm.progysm.com/rncan/geobase/nhn_rhn]. Unfortunately, the tool he
used "bins" ways seemingly at random instead of geographically (the purpose
of the binning is to write out multiple files that are smaller). Those
people who have done importing of nhn flow will know what I'm talking about.
So I wrote a script that will take some RWeait sed-foo and combine it with
Osmosis to merge all the nhn files and then split/bin them back out using
geographical bounding boxes.

The Set-Up:
Download Osmosis 0.35 from [http://wiki.openstreetmap.org/wiki/Osmosis].
I've written the script to use 0.35 because that's the last release that had
official support for API 0.5 files. You could use Osmosis 0.35.1, but things
are less guaranteed, I guess. Unzip Osmosis into your home directory, so
that the osmosis executable can be found at ~/osmosis-0.35/bin/osmosis.
Create a new directory somewhere else in your home folder (perhaps
~/osmfiles/nhn/08HE0X1 or maybe ~/nhnfiles/Tahsis) to store all the files
for a single water basin from Yan's server. Download *all* the files for a
single water basin into that directory. Download the script (attached) and
put it anywhere on your computer, and make it executable (chmod u+x
nlflowrebox.bash).

The Hook:
While in the directory that contains all (and only) the files for one basin,
run the following command:
~/path/to/nlflowrebox.bash
You'll get a bunch of files with names like Split_49.58_-126.00.osm, where
the numbers are the lat/long of the bottom-right corner of the bounding box
for that file. Start importing using JOSM. You'll also get a _COMPLETE.osm
file, and if the area is really small, or you have lots of RAM, you could
just use this instead of the bounded files.
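
To make the naming convention concrete, here is a rough sketch of how one bounding box and its file name relate. The helper name split_cell and the 0.20-degree cell size are assumptions made up for this illustration, not taken from nlflowrebox.bash. The left/right/top/bottom names match the options of Osmosis's --bounding-box task.

```bash
#!/usr/bin/env bash
# Sketch only: split_cell and the 0.20-degree default cell size are assumptions.

# Print the output file name and the four box edges for one cell, given the
# latitude of its bottom edge and the longitude of its right edge.
split_cell() {
    local bottom=$1 right=$2 step=${3:-0.20}
    # awk handles the decimal arithmetic; LC_ALL=C keeps "." as the decimal point.
    LC_ALL=C awk -v b="$bottom" -v r="$right" -v s="$step" 'BEGIN {
        printf "Split_%.2f_%.2f.osm left=%.2f right=%.2f top=%.2f bottom=%.2f\n",
               b, r, r - s, r, b + s, b
    }'
}

split_cell 49.58 -126.00
```

With the assumed cell size, the box for Split_49.58_-126.00.osm runs from 49.58 up to 49.78 in latitude and from -126.20 across to -126.00 in longitude.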

The Shut-Out:
Watch it fail. At least on my computer. If it works on your computer, you're
very lucky. For some reason, I keep getting errors about how it can't find
the osmosis executable, or how osmosis can't find the file ("But it's right
there!"). If anyone knows how to fix the problem, or how to improve the
script in other ways, I'm open to suggestions or patch files (preferably
patch files).
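
One possible cause, offered as a guess since the script itself isn't reproduced in this message: if a command is built up in a string variable and then run from that variable, bash does not re-expand the "~" or re-parse the quotes inside it. The program would then be looked up at a path literally starting with "~", and the file name would be passed with its quote characters attached. The sketch below uses a stand-in function, show_args, instead of osmosis, and made-up paths, to show the difference between a string and an array.

```bash
#!/usr/bin/env bash
# Illustration only: show_args stands in for osmosis and prints each argument
# it receives in brackets. The paths and file names are made up.
show_args() { printf '[%s]\n' "$@"; }

# Command kept in a string: when expanded, it is only split on whitespace.
# The "~" and the inner double quotes are passed through literally.
cmd_string='~/bin/tool file="out.osm"'
string_result=$(show_args $cmd_string)

# Command kept in an array: each element is exactly one argument, and
# $HOME is expanded when the array is assigned.
cmd_array=("$HOME/bin/tool" "file=out.osm")
array_result=$(show_args "${cmd_array[@]}")

echo "From string:"; echo "$string_result"
echo "From array:";  echo "$array_result"
```

If this is what's happening, it would also explain why pasting the printed command into the terminal works: the interactive shell parses the "~" and the quotes properly.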

The Wire:
There's a way around the problem, though. I've set up the script to print out
the two commands that it's attempting to run. So all you have to do is run
the script once, then copy the line that says "Merge command is
~/osmosis-0.35/bin/osmosis --read-xml-0.5 ........ --write-xml
file="NHN_NLFLOW_COMPLETE.osm"" (don't copy the "Merge command is" part, just
everything from "~/osmosis..." onward), paste it into the command line, and
run it. This will create the fully merged "complete" file. Then run
nlflowrebox.bash again and copy the line that says "Split Command is
~/osmosis-0.35/bin/osmosis --read-xml .....--write-xml
file=Split_49.98_-126.80.osm" (again, leave out "Split Command is" and copy
just the stuff after it), paste that into your command line, and run it.
You'll get all the split files for your importing pleasure.

The Sting:
There are still some minor problems, such as dealing with duplicate nodes.
You'll find that duplicate nodes are still there wherever two or more streams
intersect. Either fix these using the JOSM validator or send me a .patch. Also,
any way that crosses a bounding box boundary will appear in both bounding
boxes, so be careful when importing along file edges.

The Tale:
Here's what I really would've liked to see for a splitting/binning system
such as this: topologically aware binning. Hydro flow data is almost an
acyclic n-ary tree graph (not quite acyclic, but close). It would be nice if
there was a binning system that could perform a "reverse depth-first search"
and add ways to a file starting from a leaf way and slowly adding siblings
and parents until a max number of ways is reached. Then it continues by
adding into a second file, and a third file, etc. This would be better at
binning things like rivers/creeks. I don't think such a system exists for
OSM, and I certainly don't have the GIS skills to pull it off.
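
Just to pin down the idea, here is a toy sketch of that kind of binning. It makes big simplifying assumptions: the network is a strict tree, each way is only an ID, and the input is a list of "upstream downstream" ID pairs rather than real OSM data. The function names are invented for the example, and it needs bash 4 or later for associative arrays. It walks the tree so that leaves come out first, then their siblings, then the way they flow into, and cuts that sequence into bins of a fixed size.

```bash
#!/usr/bin/env bash
# Toy sketch (bash 4+): bin ways of a tree-shaped stream network so that
# upstream ways land next to the ways they flow into. Not real OSM handling.

declare -A children   # children[downstream_id]="upstream_id upstream_id ..."
order=()              # way IDs in leaf-first (post-order) sequence

# Visit everything upstream of a way before recording the way itself.
visit() {
    local way=$1 child
    for child in ${children[$way]:-}; do
        visit "$child"
    done
    order+=("$way")
}

# bin_ways MAX ROOT: read "upstream downstream" pairs from stdin, then print
# "binN way_id" lines with at most MAX ways per bin, starting from the leaves.
bin_ways() {
    local max=$1 root=$2 up down i
    while read -r up down; do
        children[$down]+="$up "
    done
    visit "$root"
    for i in "${!order[@]}"; do
        echo "bin$(( i / max )) ${order[$i]}"
    done
}

# Example: creeks a and b flow into c; c and d flow into the outlet e.
printf 'a c\nb c\nc e\nd e\n' | bin_ways 2 e
```

A real version would have to deal with the loops that make the data "not quite acyclic", and with keeping each way's nodes in the same file as the way.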

Chapter titles are courtesy of a Newman/Redford/Shaw movie (though slightly
reorganized).

Adam

nlflowrebox.bash
Description: Binary data

_______________________________________________
Talk-ca mailing list
Talk-ca@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-ca
