----- Original Message ----
> From: Mattias Persson <matt...@neotechnology.com>
> To: Neo4j user discussions <user@lists.neo4j.org>
> Sent: Tue, July 27, 2010 8:27:24 PM
> Subject: Re: [Neo4j] Batch inserter shutdown taking forever
>
> Since you're doing a depth 1 "traversal" please use something like this
> instead:
>
> for ( Relationship rel : graphDb.getReferenceNode().getRelationships(
>         Relationships.ROUTE, Direction.OUTGOING ) )
> {
>     Node node = rel.getEndNode();
>     // Do stuff
> }
>
> A traverser keeps more memory than a simple call to getRelationships.
> Another thing: are you doing any write operations in that for-loop of yours?
> Also, do you shut down the batch inserter and start a new
> EmbeddedGraphDatabase to traverse on, or how do you get hold of the
> graphDb?
Yes, I shut down the batch inserter and instantiate a new EmbeddedGraphDatabase
to run these operations on. The only thing I do in the loop is update an
attribute on the nodes.

I've changed my approach a little now. All of the Route nodes were related to
the reference node, but also to Page nodes. I now use a lookup service to
retrieve all Page nodes, and then traverse to a depth of 1 from each returned
node. Performance is better: it now takes about an hour to update 3m nodes this
way. I think I'll stick with this because it'll scale better than the first
method I was using. (I'm basically removing duplicate nodes based on an
attribute, so I need to build an in-memory look-up table to recognise whether
I've seen a particular node before.) I'll change the traverser as you suggest
and see if this improves things.
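
For reference, the duplicate check described above can be sketched roughly as
follows. This uses plain Java collections only and makes no Neo4j calls. The
class name, the method name and the map of node id to attribute value are all
made up for illustration; they are not taken from my actual code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: names and data shapes are assumptions for illustration.
public class DuplicateFinder {

    /**
     * Walks the entries in iteration order and returns the ids of all nodes
     * whose attribute value has already been seen on an earlier node.
     * The first node carrying a given value is kept; later ones are reported.
     */
    public static List<Long> findDuplicates(Map<Long, String> attributeByNodeId) {
        // In-memory look-up table of attribute values seen so far
        Set<String> seen = new HashSet<String>();
        List<Long> duplicates = new ArrayList<Long>();

        for (Map.Entry<Long, String> entry : attributeByNodeId.entrySet()) {
            // Set.add returns false when the value was already present
            if (!seen.add(entry.getValue())) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }
}
```

In the real job the ids and values would presumably come from the nodes
themselves (for example node.getId() and node.getProperty() with whatever the
attribute key is) rather than from a prepared map, and the look-up table has to
fit in the heap alongside everything else.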
Thanks
>
> 2010/7/26 Tim Jones <bogol...@ymail.com>
>
> > OK, I found out what's taking the time. It's iterating over the result set
> > of a traverser:
> >
> > // visit each Route node, and add it to the array
> > Traverser routes = graphDb.getReferenceNode().traverse(
> >         Traverser.Order.BREADTH_FIRST,
> >         StopEvaluator.DEPTH_ONE,
> >         ReturnableEvaluator.ALL_BUT_START_NODE,
> >         Relationships.ROUTE, Direction.OUTGOING);
> >
> > for (Node node : routes)
> > {
> >     // do stuff
> > }
> >
> >
> > The 'for' loop takes ages. There are probably 2m nodes being returned by that
> > traverser at the moment, and that's only a very small subset of the data I
> > want to add to the database.
> >
> > Is there any way to tinker with the Neo4j properties or anything to improve
> > performance here?
> >
> > Thanks
> >
> >
> > ----- Original Message ----
> > > From: Mattias Persson <matt...@neotechnology.com>
> > > To: Neo4j user discussions <user@lists.neo4j.org>
> > > Sent: Sat, July 24, 2010 10:23:02 PM
> > > Subject: Re: [Neo4j] Batch inserter shutdown taking forever
> > >
> > > 2010/7/21 Tim Jones <bogol...@ymail.com>
> > >
> > > > Hi,
> > > >
> > > > I'm using a BatchInserter and a LuceneIndexBatchInserter to insert >5m
> > > > nodes and >5m relationships into a graph in one go. The insertion seems
> > > > to work, but shutting down takes forever - it's been 2 hours now.
> > > >
> > > > At first, the JVM gave me a garbage collection exception, so I've set
> > > > the heap to 2 GB.
> > > >
> > > > 'top' tells me that the application is still running:
> > > >
> > > >   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> > > >  9994 tim   17  0 2620m 2.3g 238m S 99.5 39.1 115:48.84 java
> > > >
> > > > but checking the filesystem by running 'ls -l' a few times doesn't
> > > > indicate that files are being updated.
> > > >
> > > > Is this normal? Is there a way to improve performance?
> > > >
> > >
> > > No, it sounds quite weird. Any chance to have a look at your code?
> > >
> > >
> > > >
> > > > I'm loading all my data in one go to ease creating the db - it's simpler
> > > > to create it from scratch each time instead of updating an existing
> > > > database - so ideally I don't want to break this job down into multiple
> > > > smaller jobs (actually, this would be OK if performance was good, but I
> > > > ran into problems inserting data and retrieving existing nodes).
> > > >
> > >
> > > What kind of problems? Could you supply code and a description of your
> > > problems?
> >
> > Problems doing something similar in relational dbs. Also, the API
> > recommends optimising the batch search index before using it for lookups. I
> > just decided not to take this approach.
> >
> > >
> > >
> > > >
> > > > Thanks,
> > > > Tim
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Neo4j mailing list
> > > > User@lists.neo4j.org
> > > > https://lists.neo4j.org/mailman/listinfo/user
> > > >
> > >
> > >
> > >
> > > --
> > > Mattias Persson, [matt...@neotechnology.com]
> > > Hacker, Neo Technology
> > > www.neotechnology.com
> > >
> >
> >
> >
> >
> >
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
>