Re: [Neo4j] More spatial questions

Craig Taverner Sun, 19 Jun 2011 14:47:55 -0700

Hi Nolan,

I think I can answer a few of your questions. Firstly, some background. The
graph model of the OSM data is based largely on the XML-formatted OSM
documents, and there you will find 'nodes', 'ways', 'relations' and 'tags',
each as its own XML tag, and as a consequence each will also have its
own neo4j-node in the graph. Another point is that a geometry can be based
on one or more nodes or ways, so we always create a separate node for the
geometry and link it to the OSM node, way or relation that represents that
geometry.


What all this boils down to is that you cannot find the tags on the geometry
node itself. You cannot even find the location on that node. If you want to
use the graph model in a direct way, as you have been trying, you really do
need to know how the OSM data is modeled. For example, for a LineString
geometry, you would need to traverse from the geometry node to the way node
and finally to the tags node (to get the tags). Getting to the locations is
even more complex. Rather than do that, I would suggest that you work with
the OSM API we provided with the OSMLayer, OSMDataset and OSMGeometryEncoder
classes. Then you do not need to know the graph model at all.

For example, OSMDataset has a method for getting a Way object from a node,
and the returned object can be queried for its nodes, geometry, etc.
Currently we provide methods for returning neo4j-nodes as well as objects
that make spatial sense. One minor issue here is the ambiguity inherent in
the fact that both neo4j and OSM make use of the term 'node', but for
different things. We have various solutions to this, sometimes replacing
'node' with 'point' and sometimes prefixing with 'osm'. The unit tests in
TestsForDocs include some tests for the OSM API.

> My first goal is to find the nearest OSM node to a given lat, lon. My
> attempts seem to be made of fail thus far, however. Here's my code:
>

Most of the OSM dataset is converted into LineStrings, and what you really
want to do is find the closest vertex of the closest LineString. We have a
utility function 'findClosestEdges' in the SpatialTopologyUtils class for
that. The unit tests in TestSpatialUtils, and the testSnapping() method in
particular, show how to use it.
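To make the "closest vertex of the closest LineString" idea concrete, here is a small brute-force sketch in plain Java. It is not SpatialTopologyUtils (which works against the layer and its index); the class and method names are hypothetical, lines are plain arrays of {x, y} vertices, and distances are planar, which is only a rough approximation for lat/lon over a small area.

```java
import java.util.List;

// Hypothetical illustration only: a linear scan, no spatial index, planar distances.
class ClosestVertexSketch {
    // Squared planar distance from point p to the segment a-b.
    static double segmentDist2(double px, double py, double[] a, double[] b) {
        double vx = b[0] - a[0], vy = b[1] - a[1];
        double len2 = vx * vx + vy * vy;
        // Parameter of the projection of p onto the segment, clamped to [0, 1].
        double t = len2 == 0 ? 0 : ((px - a[0]) * vx + (py - a[1]) * vy) / len2;
        t = Math.max(0, Math.min(1, t));
        double dx = a[0] + t * vx - px, dy = a[1] + t * vy - py;
        return dx * dx + dy * dy;
    }

    // Returns {lineIndex, vertexIndex}, or null if there are no non-empty lines.
    static int[] closestVertex(List<double[][]> lines, double px, double py) {
        int bestLine = -1;
        double bestEdge = Double.POSITIVE_INFINITY;
        // Pass 1: pick the line whose nearest edge is closest to the point.
        for (int i = 0; i < lines.size(); i++) {
            double[][] pts = lines.get(i);
            if (pts.length == 0) continue;
            int edges = Math.max(1, pts.length - 1); // a single vertex is a degenerate edge
            for (int j = 0; j < edges; j++) {
                double d2 = segmentDist2(px, py, pts[j], pts[Math.min(j + 1, pts.length - 1)]);
                if (d2 < bestEdge) { bestEdge = d2; bestLine = i; }
            }
        }
        if (bestLine < 0) return null;
        // Pass 2: the closest vertex on that line.
        double[][] pts = lines.get(bestLine);
        int bestVertex = 0;
        double bestVertexDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < pts.length; j++) {
            double dx = pts[j][0] - px, dy = pts[j][1] - py;
            double d2 = dx * dx + dy * dy;
            if (d2 < bestVertexDist) { bestVertexDist = d2; bestVertex = j; }
        }
        return new int[] {bestLine, bestVertex};
    }
}
```

The two passes matter: the closest vertex overall may sit on a different line than the one whose edge passes nearest the query point.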

> My thinking is that nodes should be represented as points, so I can't
> see why this fails. When I run this in a REPL, I do get a node back. So
> far so good. Next, I want to get the node's tags. So I run:
>

The spatial search will return 'geometries', which are spatial objects. In
neo4j-spatial every geometry is represented by a unique node, but it is not
required that that node contain coordinates or tags. That is up to the
GeometryEncoder. In the case of the OSM model, this information is
elsewhere, because of the nature of the OSM graph, which is a highly
interconnected network of points, most of which do not represent Point
geometries, but are part of much more complex geometries (streets, regions,
buildings, etc.).

> n.getSingleRelationship(OSMRelation.TAGS, Direction.INCOMING)
>

The geometry node is not connected directly to the tags node. You need two
steps to get there. But again, rather than figure out the graph yourself,
use the API. In this case, instead of getting the geometry node from the
SpatialDatabaseRecord, just get the properties using getPropertyNames
and getProperty(String). This API works the same on all kinds of spatial
data, and in the case of OSM data will return the TAGS, since those are
interpreted as attributes of the geometries.
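The point of going through getPropertyNames and getProperty(String) is that caller code does not change with the data model. A minimal sketch of that idea, using a hypothetical Record interface that only mirrors those two method names (it is not the neo4j-spatial class): for simple data the attributes sit on the geometry itself, while the OSM-style record follows its way to that way's tags behind the same interface.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical types; only the two method names echo the API described above.
class UniformAttributesSketch {
    interface Record {
        Set<String> getPropertyNames();
        Object getProperty(String name);
    }

    // Simple data: attributes stored directly on the geometry.
    static final class PlainRecord implements Record {
        private final Map<String, Object> props;
        PlainRecord(Map<String, Object> props) { this.props = props; }
        public Set<String> getPropertyNames() { return new TreeSet<>(props.keySet()); }
        public Object getProperty(String name) { return props.get(name); }
    }

    // OSM-like data: the geometry carries no attributes of its own.
    static final class Way {
        final Map<String, Object> tags;
        Way(Map<String, Object> tags) { this.tags = tags; }
    }

    // The record hides the hop to the way's tags, so callers never see it.
    static final class OsmRecord implements Record {
        private final Way way;
        OsmRecord(Way way) { this.way = way; }
        public Set<String> getPropertyNames() { return new TreeSet<>(way.tags.keySet()); }
        public Object getProperty(String name) { return way.tags.get(name); }
    }

    // Caller code is identical for both kinds of data (keys in sorted order).
    static String describe(Record r) {
        StringBuilder sb = new StringBuilder();
        for (String key : r.getPropertyNames()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(key).append('=').append(r.getProperty(key));
        }
        return sb.toString();
    }
}
```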

> n.getSingleRelationship(OSMRelationship.GEOM,
> Direction.INCOMING).getOtherNode(n).getPropertyKeys
> I see what appears to be a series of tags (oneway, name, etc.) Why are
> these being returned for OSMRelation.GEOM rather than OSMRelation.TAGS?
>

These are not the tags. Now you have found the node representing an OSM
'Way'. This has a few properties on it that are relevant to the way, the
name, whether the street is oneway or not, etc. Sometimes these are based on
values in the tags, but they are not the tags themselves. This node is
connected to the geometry node and the tags node, so you were half-way there
(to the tags that is). You started at the geometry node, and stepped over to
the way node, and one more step (this time with the TAGS relationship) would
have got you to the tags.
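The two-step path can be modeled with a toy in-memory graph. This is a sketch, not the Neo4j API: the Node class, relate() and tagsOfGeometry() are hypothetical, each node keeps at most one relationship of each type per direction, and the direction of TAGS (way to tags) is an assumption. The GEOM direction (way to geometry) follows from the INCOMING lookup in the quoted code.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Toy graph, hypothetical names; one relationship per type per direction.
class OsmHopSketch {
    enum RelType { GEOM, TAGS }

    static final class Node {
        final Map<String, Object> props = new HashMap<>();
        final Map<RelType, Node> outgoing = new EnumMap<>(RelType.class);
        final Map<RelType, Node> incoming = new EnumMap<>(RelType.class);
    }

    static void relate(Node from, RelType type, Node to) {
        from.outgoing.put(type, to);
        to.incoming.put(type, from);
    }

    static Map<String, Object> tagsOfGeometry(Node geometry) {
        // Step 1: geometry -> way, following GEOM backwards (way -GEOM-> geometry).
        Node way = geometry.incoming.get(RelType.GEOM);
        if (way == null) return Collections.<String, Object>emptyMap();
        // Step 2: way -> tags, following TAGS forwards (assumed direction).
        Node tags = way.outgoing.get(RelType.TAGS);
        return tags == null ? Collections.<String, Object>emptyMap() : tags.props;
    }
}
```

Properties such as way_osm_id live on the way node, so they never show up among the tags returned by the second hop.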

But again, I advise against trying to explore the OSM graph by itself. As
you have already found, it is not completely trivial. What you should have
done is access the attributes directly from the search results.

> Additionally, I see the property way_osm_id, which clearly isn't a tag.
> It would also seem to indicate that this query returned a way rather
> than a node like I'd hoped. This conclusion is further borne out by the
> tag names. So clearly I'm not getting the search correct. But beyond
> that, the way being returned by this search isn't close to the lat,lon I
> provided. What am I missing?
>

The lat/long values are quite a bit deeper in the graph. In the case of
'ways', we have a chain of nodes that run from the first to the last node of
the way. Each of these nodes has a relationship to another node that
contains the location. The intermediate nodes exist because a location
node can be part of multiple ways. Needless to say, it is a bit
complex to traverse all of this by hand, as you are trying to do.
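A toy model of that chain shows why the intermediate nodes are needed: a shared location node cannot hold a single "next" pointer when it belongs to several ways, so each way gets its own per-vertex node that points at the location. All class names here are hypothetical and the coordinates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a way's vertex chain; not the actual graph classes.
class WayChainSketch {
    // Location node: holds lat/lon and may be shared by several ways.
    static final class Location {
        final double lat, lon;
        Location(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    // Per-way intermediate node: one per vertex, links to the next vertex of THIS way.
    static final class Proxy {
        final Location location;
        Proxy next;
        Proxy(Location location) { this.location = location; }
    }

    static final class Way {
        Proxy first;
    }

    static Way buildWay(Location... locs) {
        Way w = new Way();
        Proxy prev = null;
        for (Location l : locs) {
            Proxy p = new Proxy(l);
            if (prev == null) w.first = p; else prev.next = p;
            prev = p;
        }
        return w;
    }

    // Walk the chain from first to last vertex, collecting {lat, lon} pairs.
    static List<double[]> coordinates(Way way) {
        List<double[]> out = new ArrayList<>();
        for (Proxy p = way.first; p != null; p = p.next) {
            out.add(new double[] {p.location.lat, p.location.lon});
        }
        return out;
    }
}
```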

Another complication is that most points in the OSM model are not exposed as
Point Geometries in the spatial index. This is because most of them are
intended as parts of bigger geometries. For example, if someone created a
lake in OSM, made of 100 points in a polygon, those 100 points would not be
indexed in the spatial index, but the Polygon would be. So using the spatial
index to find points like this will not work. Only points that are tagged
individually will appear in the spatial index. We have some rules in the
importer to decide if points that are part of bigger geometries can also be
indexed by themselves. Usually the presence of user tags is enough for
this. In practice, very few points have their own tags, but the ways
they belong to will have tags.
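The indexing rule described above can be sketched as a simple filter. This is an illustration under assumptions, not the importer's actual code: the rule here is just "index a point by itself only if it has its own tags", and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical rule and names; the real importer may apply more criteria.
class PointIndexRuleSketch {
    static final class OsmPoint {
        final long id;
        final Map<String, String> tags; // the point's own tags, usually empty
        OsmPoint(long id, Map<String, String> tags) { this.id = id; this.tags = tags; }
    }

    // Index a point as a Point geometry only if it carries its own tags.
    static boolean indexAsPoint(OsmPoint p) {
        return !p.tags.isEmpty();
    }

    static List<Long> pointsToIndex(List<OsmPoint> points) {
        List<Long> ids = new ArrayList<>();
        for (OsmPoint p : points) {
            if (indexAsPoint(p)) ids.add(p.id);
        }
        return ids;
    }
}
```

For the lake example, 100 untagged polygon vertices plus one tagged point would yield a single indexed point; the polygon itself would be indexed as its own geometry, which this sketch does not model.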

So if you are interested only in points of interest, then using the index to
search for points will work. If you are interested in all OSM nodes,
including those that are only vertices of ways, then use the
SpatialTopologyUtils.findClosestEdges(Point,Layer) method.

> As an aside, in looking at the latest OSM import testcase, it seems like
> the batch inserter may now be optional. Is this true, and what
> benefits/disadvantages are there to its use? I tried importing the Texas
> OSM data on my fairly powerful laptop, but gave up after 12 hours and
> 170000 way imports (I think there are over a million in that dataset.)
> Other geospatial formats seem to do the import in a matter of hours, but
> this import seemed like it'd go on for days if I let it.
>

There is a problem with the lucene index. Well, not specifically lucene, but
a problem with using a general purpose index to map between OSM-ways and
OSM-nodes. This problem really affects the scalability of the importer. It
certainly slows down a lot for larger datasets. We have investigated various
other solutions. Peter has tried BDB and other external indexes, and I
started writing a HashMap/Array based index. I did not finish that, but have
seen that Chris Gioran has now started a similar (and apparently more
complete / better) solution at https://github.com/digitalstain/BigDataImport.
I would like to try it out when I get a chance. The issue we have scaling
the OSM import is not unique, so a solution like this will probably help
many people.
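As a rough sketch of the general idea behind an array-based id index (not the code from BigDataImport, nor from the unfinished index mentioned above): two parallel long arrays map OSM ids to graph node ids, with lookups by binary search. It assumes OSM ids arrive in ascending order during import, which is common in OSM XML exports but is an assumption here; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical OSM-id -> node-id index; assumes ids are inserted in ascending order.
class OsmIdIndexSketch {
    private long[] osmIds = new long[16];
    private long[] nodeIds = new long[16];
    private int size = 0;

    void put(long osmId, long nodeId) {
        if (size > 0 && osmId <= osmIds[size - 1]) {
            throw new IllegalArgumentException("OSM ids must arrive in ascending order");
        }
        if (size == osmIds.length) { // grow both arrays by doubling
            osmIds = Arrays.copyOf(osmIds, size * 2);
            nodeIds = Arrays.copyOf(nodeIds, size * 2);
        }
        osmIds[size] = osmId;
        nodeIds[size] = nodeId;
        size++;
    }

    // Returns the node id for an OSM id, or -1 if it was never inserted.
    long get(long osmId) {
        int i = Arrays.binarySearch(osmIds, 0, size, osmId);
        return i >= 0 ? nodeIds[i] : -1;
    }
}
```

The appeal over a general-purpose index is that each entry costs two primitive longs with no per-entry objects or hashing, and each lookup is O(log n) over contiguous memory.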
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
