I did some initial work on incremental imports back in 2010, but stopped
due to some complications:

   - We needed to mix Lucene reads and writes during the import (a read to
   check whether the node already exists, so we don't import it twice), and
   this performs very badly in the batch inserter. We decided to code a
   non-batch insert mode first, before re-starting the incremental import
   work. Peter and I did code a non-batch importer in early 2011, but we
   never went back to complete the incremental import.
   - We wanted to support both the case of importing multiple OSM files
   that could be stitched together by resolving overlaps, and the case of
   applying changesets to the existing OSM model. This increased the
   complexity of the work just enough to ensure it got dropped. In early 2011
   we also added support for changesets in the model (but only as a data
   structure, not in terms of importing changesets), so we are one step
   closer on this front too.

Since we now have non-batch importing and changeset data structures, the
opportunity to re-start work on incremental imports and on importing
changesets is there. It should not be too hard.

For incremental imports (stitching OSM files together), we can re-activate
the old code that tests the Lucene index before adding nodes and relations.
There might be some subtle edge cases to consider, but a set of tests with
overlapping and non-overlapping OSM files should flush them out.
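
To illustrate the check-before-insert idea, here is a rough Python sketch.
It is not the OSMImporter code: a plain dict stands in for the Lucene
index, and the names `import_nodes` and `index` are made up for this
example.

```python
def import_nodes(index, osm_nodes):
    """Add nodes whose OSM id is not yet indexed; return how many were added.

    `index` is a dict standing in for the Lucene index (an assumption for
    this sketch); `osm_nodes` is an iterable of (osm_id, properties) pairs.
    """
    added = 0
    for osm_id, props in osm_nodes:
        if osm_id in index:      # read: node was imported by an earlier file
            continue
        index[osm_id] = props    # write: create the node and index it
        added += 1
    return added


# Two hypothetical files that overlap on node 2.
index = {}
first = import_nodes(index, [(1, {"lat": 55.60, "lon": 13.00}),
                             (2, {"lat": 55.61, "lon": 13.01})])
second = import_nodes(index, [(2, {"lat": 55.61, "lon": 13.01}),
                              (3, {"lat": 55.62, "lon": 13.02})])
```

The second call only adds node 3, since node 2 is already in the index.
This interleaving of reads and writes is the pattern that performed so
badly in the batch inserter.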

For applying changesets, more thinking is still required. Do we want to
support history in the model, or only the latest version? Should we verify
that only newer changesets are applied and in the right order, or rely on
the user to get it right?
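
If we do decide to verify ordering, the check itself could be quite small.
The Python sketch below assumes each changeset carries some monotonically
increasing sequence number (whether OSM changeset ids are good enough for
that is an open question), and `ChangesetLog` is a hypothetical helper, not
something in the current model.

```python
class ChangesetLog:
    """Remembers the last applied sequence number and rejects older ones."""

    def __init__(self):
        self.last_applied = None

    def apply(self, sequence, apply_fn):
        # Refuse anything that is not strictly newer than what we have.
        if self.last_applied is not None and sequence <= self.last_applied:
            raise ValueError(
                "changeset %d is not newer than %d" % (sequence, self.last_applied)
            )
        apply_fn()                    # perform the actual model update
        self.last_applied = sequence  # record it only after success


log = ChangesetLog()
applied = []
log.apply(10, lambda: applied.append(10))
log.apply(11, lambda: applied.append(11))
try:
    log.apply(10, lambda: applied.append(10))  # out of order
    rejected = False
except ValueError:
    rejected = True
```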

I can say that we did some thinking this summer on the data structures
required to support a complete change history. This relies on the fact that
we already support multiple possible ways on the same nodes, so we can
also, in principle, support multiple possible 'versions' of ways on the
same nodes. More thinking is required, but I have a suspicion that we
should actually go ahead and do this properly with full history, because
that might be the only way to make sure the user never messes things up by
importing in the wrong order.
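
To make the idea of several 'versions' of a way sharing the same nodes a
bit more concrete, here is a small Python sketch. It is not the data
structure we discussed, only an illustration; `WayHistory` and its methods
are invented for the example. Because versions are kept sorted by version
number, the order in which they are imported does not matter.

```python
import bisect


class WayHistory:
    """All known versions of one way; each version is a list of node ids."""

    def __init__(self):
        self._versions = []   # sorted version numbers
        self._nodes = {}      # version number -> list of node ids

    def add_version(self, version, node_ids):
        if version not in self._nodes:
            bisect.insort(self._versions, version)
        self._nodes[version] = list(node_ids)

    def latest(self):
        return self._nodes[self._versions[-1]]

    def at(self, version):
        """Node ids of the newest stored version not newer than `version`."""
        i = bisect.bisect_right(self._versions, version)
        if i == 0:
            raise KeyError(version)
        return self._nodes[self._versions[i - 1]]


history = WayHistory()
history.add_version(2, [1, 2, 3, 4])   # imported first, although it is newer
history.add_version(1, [1, 2, 3])      # the older version arrives later
```

Here `history.latest()` still returns the version 2 node list, even though
version 1 was imported afterwards.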

On Tue, Nov 22, 2011 at 9:58 AM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> Gregory,
> incremental loads (and thus, restarts of OSM imports) are a feature we
> want to add later on, but it's not in there yet. This would also mean
> we could stitch in other areas on demand, and support submitting
> changesets back to OSM or at least capture them, so you as an OSM
> based app can contribute to OSM automagically.
>
> I know it's much to ask, but help here would be greatly appreciated. I
> hope to lab with Michael Hunger on import of data into OSM (and
> others) this Friday and hope to get somewhere :)
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org              - NOSQL for the Enterprise.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
>
>
>
> On Tue, Nov 22, 2011 at 7:15 AM, grimace <macegh...@gmail.com> wrote:
> > I've been playing with OSMImporter; tried batch and native java.  I've had
> > mixed success trying to import the planet, but since it's of considerable
> > size, the job usually blows up or grinds to a halt about half way.  I think
> > the most I've made it to is 651M nodes and that's not even the ways or
> > relations.   I just don't know enough about it and thought I would ask
> > before I try to dive in to it, but what would I have to do to so that I
> > could restart the job ( where it left off ) when it blows?
> >
> > --
> > View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3526941.html
> > Sent from the Neo4j Community Discussions mailing list archive at
> Nabble.com.
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
