Hi David,

Great to hear you're interested in using neo4j for the OSM model. I also think it is a great match. However, you are right to assume there are a few missing pieces. Two shortcomings that are relevant to your questions below are:

- *Scalability*. We only recently tried to load very large OSM files, starting with Germany. There are currently two problems: the graph model takes up too much disk space, and the load performance degrades. The underlying issue is that the OSM file, despite being XML, is to some extent just a sequential dump of a number of PostGIS tables: first the point nodes, and only later the ways, with foreign key references to the nodes. So we need an independent way to look up the node ids when loading the ways, and we currently use a Lucene index for this. The index works, but like all tree structures it degrades in performance as the total index size increases. Peter has been investigating this, and is in the process of evaluating two options:
  - Switching off the batch-inserter. I refactored the OSMImporter to allow importing with the normal GraphDatabaseService instead of the batch inserter, and Peter is trying this out to see how it performs for larger loads and incremental loads.
  - Using an index other than Lucene. Peter is currently evaluating the BDB database for its exact match index, which might perform better than Lucene for the node-id lookup.
- *Changesets*. We do not yet properly support changesets. In fact, the current code loads the OSM XML into a structure that still bears some residual resemblance to the XML. For example, we store the user, uid and changeset as properties of the nodes and ways they were attributes of in the XML. I have started refactoring this, and the plan is to make a two phase improvement:
  - Firstly, structure users and changesets as a tree, with nodes and ways related to the changeset in the tree structure. This allows for analysis of the graph from the perspective of users and changesets. It also reduces the total disk space used, because the user, uid and changeset id are not duplicated in properties as they are today. I have already done part of this work on my computer, but not pushed it.
    I see database size reductions to nearly 60% of the previous size, but I have not completed the new tree, so the size will go up again somewhat.
  - Secondly, once we have the changeset tree in place we can work on applying changes to the graph. As you requested in your email, we want to be able to apply the daily updates to an existing full OSM model.

So, we have definitely thought about your specific requirements, but due to other priorities we have not made much progress on them. I certainly welcome your feedback, and even help, in completing this work. I suggest we have a Skype call to discuss this further.

Regards, Craig

On Thu, Feb 17, 2011 at 4:28 AM, David Winslow <cdwins...@gmail.com> wrote:

> Hi all,
>
> My organization (OpenGeo) is investigating options for generating and
> hosting map tiles based on OpenStreetMap data on Amazon AWS. We are
> currently using OSM's osm2pgsql tool with a PostGIS database, GeoServer with
> SLD styles to render the data, and GeoWebCache to dice up the map into tiles
> and serve them from a filesystem cache. I'm interested in investigating
> neo4j-spatial as an alternative to Postgres since the graph model seems to
> fit OSM's data more cleanly than an RDBMS. To be clear, investigating neo4j
> is just a side project for me at present. I've played with neo4j-spatial
> before, and I plan on getting my hands a bit dirty this weekend, but for now
> I have a few questions about it.
>
> 1) Has anyone attempted a full OSM planet import using neo4j-spatial? Any
> tips on ensuring it goes smoothly (how much disk it is likely to require,
> whether the full planet dump will fit in a neo4j 1.2 database, etc)?
> 2) Is there any information available about neo4j performance on EC2?
> 3) The rendering process divides up the OSM data into several classes which
> are styled differently (roads/rivers/buildings/etc).
> I am aware that neo4j-spatial can index sublayers based on property filters,
> but when I last checked the filter syntax used wasn't as flexible as I need
> for the stylesheet I'm using. For my investigation this weekend I am thinking
> of replacing the existing filter system with one based on CQL[1] to serialize
> filters, does that seem like a bad idea?
> 4) Is there any support for applying OSM's daily or minutely patches? (From
> a look at the code, I think the answer is no, so if not - how tough would it
> be to add? Are there any design docs or notes written up about implementing
> that feature?)
>
> [1] CQL - http://docs.codehaus.org/display/GEOTOOLS/ECQL+Parser+Design
>
> Thanks in advance.
>
> --
> David Winslow
> OpenGeo - http://opengeo.org/
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user