Nolan,
will try to set it up and report back. We are thinking of moving away from
the batch inserter, since recent performance improvements in Neo4j may make
it unnecessary and the speedup it gives is not that great. The bottleneck is
mostly Lucene and the configuration around it, which could well be what you
are seeing here too.
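
For reference, outside of Lucene the usual knobs are the JVM heap (-Xmx) and the
memory-mapped buffers you can hand to the batch inserter. A minimal sketch,
assuming the two-argument BatchInserterImpl constructor, with made-up buffer
sizes and a hypothetical store path purely to illustrate:

import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl

// Illustrative sizes only; tune them to the dataset and available RAM.
val config = new java.util.HashMap[String, String]()
config.put("neostore.nodestore.db.mapped_memory", "200M")
config.put("neostore.relationshipstore.db.mapped_memory", "1G")
config.put("neostore.propertystore.db.mapped_memory", "500M")
config.put("neostore.propertystore.db.strings.mapped_memory", "200M")

val inserter = new BatchInserterImpl("path/to/store", config)  // hypothetical store dir
// ... run the import against this inserter ...
inserter.shutdown()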

Mattias, could there be an OOME when loading a lot of data into the
LuceneBatchInserter?
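
If it is the Lucene writer accumulating too much in memory before the final
optimize, the general pattern with the batch index API is to flush it
periodically. A rough sketch against BatchInserterIndex (not what OSMImporter
does internally; the index name and threshold are placeholders, and it reuses
the inserter from the sketch above):

import org.neo4j.helpers.collection.MapUtil
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider

val indexProvider = new LuceneBatchInserterIndexProvider(inserter)
val index = indexProvider.nodeIndex("nodes", MapUtil.stringMap("type", "exact"))

var sinceFlush = 0
def indexNode(id: Long, props: java.util.Map[String, AnyRef]) {
  index.add(id, props)
  sinceFlush += 1
  if (sinceFlush >= 100000) {  // push pending documents to disk before they pile up
    index.flush()
    sinceFlush = 0
  }
}

// ... call indexNode from the import loop, then:
indexProvider.shutdown()
inserter.shutdown()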

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


On Fri, May 13, 2011 at 8:25 AM, Nolan Darilek <no...@thewordnerd.info> wrote:

> I'm importing the dataset for Texas. My version is a few weeks old, but
> you can find the newest here:
>
>
> http://downloads.cloudmade.com/americas/northern_america/united_states/texas/texas.osm.bz2
>
> My import code, more or less, let me know if you need more
> implementation details:
>
> class Neo4jImport(filename:String, layer:String = "map") extends Import {
>
>   val importer = new OSMImporter(layer)
>
>   private var processed = 0
>
>   def processedEntities = processed
>
>   private def ds = dataset.asInstanceOf[Neo4JDataSet]
>   private def database = ds.database
>
>   class MyBatchInserter extends BatchInserterImpl(database.getStoreDir) {
>
>     override def createNode(properties:JMap[String, Object]) = {
>       processed += 1
>       super.createNode(properties)
>     }
>
>     override def createNode(id:Long, properties:JMap[String, Object]) {
>       super.createNode(id, properties)
>       processed += 1
>     }
>
>     override def createRelationship(n1:Long, n2:Long, rt:RelationshipType, properties:JMap[String, Object]) = {
>       processed += 1
>       super.createRelationship(n1, n2, rt, properties)
>     }
>
>   }
>
>   def performImport() {
>     database.shutdown()
>     val batchInserter = new MyBatchInserter
>     importer.importFile(batchInserter, filename)
>     batchInserter.shutdown()
>     ds.init(true)
>     importer.reIndex(database, 1000)
>   }
>
> }
>
> Console output:
>
> Fri May 13 10:22:20 CDT 2011: Saving node 6525309 (13713.904715468341 node/second)
> Fri May 13 10:22:21 CDT 2011: Saving node 6539916 (13703.333682556313 node/second)
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid13506.hprof ...
> Heap dump file created [1426787760 bytes in 30.001 secs]
> scala.actors.Actor$$anon$1@764e2837: caught java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3307)
>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2376)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2352)
>     at org.neo4j.index.impl.lucene.LuceneBatchInserterIndex.closeWriter(LuceneBatchInserterIndex.java:279)
>     at org.neo4j.index.impl.lucene.LuceneBatchInserterIndex.shutdown(LuceneBatchInserterIndex.java:354)
>     at org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider.shutdown(LuceneBatchInserterIndexProvider.java:145)
>     at org.neo4j.gis.spatial.osm.OSMImporter$OSMBatchWriter.finish(OSMImporter.java:1144)
>     at org.neo4j.gis.spatial.osm.OSMImporter.importFile(OSMImporter.java:1320)
>     at org.neo4j.gis.spatial.osm.OSMImporter.importFile(OSMImporter.java:1219)
>     at org.neo4j.gis.spatial.osm.OSMImporter.importFile(OSMImporter.java:1215)
>     at info.hermesnav.core.model.data.impl.neo4j.Neo4jImport.performImport(neo4j.scala:54)
>     at info.hermesnav.core.model.data.Import$$anonfun$start$1.apply$mcV$sp(data.scala:25)
>     at scala.actors.Actor$$anon$1.act(Actor.scala:135)
>     at scala.actors.Reactor$$anonfun$dostart$1.apply(Reactor.scala:222)
>     at scala.actors.Reactor$$anonfun$dostart$1.apply(Reactor.scala:222)
>     at scala.actors.ReactorTask.run(ReactorTask.scala:36)
>     at scala.concurrent.forkjoin.ForkJoinPool$AdaptedRunnable.exec(ForkJoinPool.java:611)
>     at scala.concurrent.forkjoin.ForkJoinTask.quietlyExec(ForkJoinTask.java:422)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.mainLoop(ForkJoinWorkerThread.java:340)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:325)
>
>
> On 05/13/2011 09:34 AM, Peter Neubauer wrote:
> > Nolan,
> > do you have the importing code and what dataset are you importing? Also, do
> > you have any console output? It could be very big transactions or other
> > database settings not adjusted to the size of your import ...
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:      neubauer.peter
> > Skype       peter.neubauer
> > Phone       +46 704 106975
> > LinkedIn   http://www.linkedin.com/in/neubauer
> > Twitter      http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org               - Your high performance graph database.
> > http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> > On Fri, May 13, 2011 at 4:13 AM, Nolan Darilek <no...@thewordnerd.info> wrote:
> >
> >> Picking up my slow port to Neo4j Spatial again, and I'm hitting an
> >> out-of-memory error when trying to import large datasets. Given that
> >> this code works fine if I use a different database and swap out the
> >> implementations, I suspect Neo4j is the issue. This is Neo4j
> >> 1.4-SNAPSHOT and Spatial 0.6-SNAPSHOT.
> >>
> >> Not sure if this is enough to diagnose the issue, but I have a heap
> dump:
> >>
> >> http://dl.dropbox.com/u/147071/java_pid7405.hprof.bz2
> >>
> >> It's currently uploading to Dropbox, so maybe grab it in an hour or two.
> >> It's something like 185M compressed, 1.5G uncompressed.
> >>
> >> Thanks, please let me know if I might provide any more details.
> >> The exact error was "this writer hit an OutOfMemoryError; cannot flush",
> >> at org.apache.lucene.index.IndexWriter.doFlush():3307. Not sure how relevant
> >> the stacktrace is, as my experience with OutOfMemoryErrors is that the
> >> code just fails wherever.
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
