OK, I found out what's taking the time. It's iterating over the result set of a 
traverser:

            // visit each Route node, and add it to the array
            Traverser routes = graphDb.getReferenceNode().traverse(
                    Traverser.Order.BREADTH_FIRST,
                    StopEvaluator.DEPTH_ONE,
                    ReturnableEvaluator.ALL_BUT_START_NODE,
                    Relationships.ROUTE, Direction.OUTGOING);

            for (Node node : routes)
            {
                 // do stuff
            }


The 'for' loop takes ages. The traverser is probably returning about 2 million 
nodes at the moment, and that's only a very small subset of the data I want 
to add to the database.
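
To check whether the time goes into the traversal itself or into the loop body, one option is to time a pass with an empty body first. Below is a minimal sketch using only the JDK; `IterationTimer`, `drain` and `reportEvery` are hypothetical names, not part of the Neo4j API, and the helper works on any Iterable:

```java
import java.util.function.Consumer;

/** Hypothetical helper (not part of Neo4j): times one pass over any Iterable. */
final class IterationTimer {

    /** Item count and elapsed wall-clock time for one full pass. */
    static final class Result {
        final long count;
        final long elapsedNanos;

        Result(long count, long elapsedNanos) {
            this.count = count;
            this.elapsedNanos = elapsedNanos;
        }
    }

    private IterationTimer() {
    }

    /**
     * Iterates over all items, applies body to each one, and prints a
     * progress line every reportEvery items.
     */
    static <T> Result drain(Iterable<T> items, long reportEvery, Consumer<? super T> body) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        long start = System.nanoTime();
        long count = 0;
        for (T item : items) {
            body.accept(item);
            count++;
            if (count % reportEvery == 0) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println(count + " items after " + elapsedMs + " ms");
            }
        }
        return new Result(count, System.nanoTime() - start);
    }
}
```

Since the traverser above is used in a for-each loop, it could be passed in directly, e.g. `IterationTimer.drain(routes, 100000, node -> { })`. If the pass is still slow with an empty body, the cost is in the traversal rather than in the "do stuff" part.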

Is there any way to tune the Neo4j configuration properties, or anything 
else, to improve performance here?

Thanks


----- Original Message ----
> From: Mattias Persson <matt...@neotechnology.com>
> To: Neo4j user discussions <user@lists.neo4j.org>
> Sent: Sat, July 24, 2010 10:23:02 PM
> Subject: Re: [Neo4j] Batch inserter shutdown taking forever
> 
> 2010/7/21 Tim Jones <bogol...@ymail.com>
> 
> >  Hi,
> >
> > I'm using a BatchInserter and a LuceneIndexBatchInserter to insert >5m
> > nodes and >5m relationships into a graph in one go. The insertion seems
> > to work, but shutting down takes forever - it's been 2 hours now.
> >
> > At first, the JVM gave me garbage collection exception, so I've set the
> > heap to 2gb.
> >
> > 'top'  tells me that the application is still running:
> >
> >  PID  USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM    TIME+  COMMAND
> >  9994 tim         17   0 2620m 2.3g 238m S 99.5 39.1 115:48.84 java
> >
> > but checking the filesystem by running 'ls -l' a few times doesn't
> > indicate that files are being updated.
> >
> > Is this normal? Is there a way to improve performance?
> >
> 
> No, it sounds quite weird. Any chance to have a look at your code?
> 
> 
> >
> > I'm loading all my data in one go to ease creating the db - it's simpler to
> > create it from scratch each time instead of updating an existing database -
> > so ideally I don't want to break this job down into multiple smaller jobs
> > (actually, this would be OK if performance was good, but I ran into
> > problems inserting data and retrieving existing nodes).
> >
> 
> What kind of problems? Could you supply code and a description of your
> problems?

Problems I ran into doing something similar in relational DBs. Also, the API 
documentation recommends optimising the batch search index before using it for 
lookups. I just decided not to take this approach.

> 
> 
> >
> > Thanks,
> >  Tim
> >
> >
> >
> >
> >
> >  _______________________________________________
> > Neo4j mailing  list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> 
> 
> 
> -- 
> Mattias Persson, [matt...@neotechnology.com]
> Hacker,  Neo Technology
> www.neotechnology.com


      
