The problem you're seeing is GC thrashing: most of the CPU time is spent running the garbage collector, since most of the heap is occupied by objects. A traverser keeps track of which nodes it has visited, so its memory use grows as it goes, and for a big traversal that can be a problem. A better solution for you here would be to call
startNode.getRelationships directly instead, since iterating over relationships like that doesn't keep that kind of state in memory. We also just created a new traversal framework which deals with this issue, among other things.

2010/4/29, Bhuvan <[email protected]>:
> Hello,
>
> We are trying to explore Neo4j for a huge number of graph nodes and
> relations.
> Let's say there are about 6 million users across the world and 6 million
> user address elements like postal-code/city/state/country etc.
> Now I am trying to get all users in a given country which has about 3
> million users. What I found is that traverser returned about 0.6 million
> nodes quickly and thereafter it slows down as shown below:
> --------------------------
> INFO [2010-04-28 20:15:13,082] [test.TraversalTest] - Starting...
> INFO [2010-04-28 20:15:39,030] [test.TraversalTest] - 100,000
> INFO [2010-04-28 20:15:41,734] [test.TraversalTest] - 200,000
> INFO [2010-04-28 20:15:44,022] [test.TraversalTest] - 300,000
> INFO [2010-04-28 20:15:51,353] [test.TraversalTest] - 400,000
> INFO [2010-04-28 20:15:53,433] [test.TraversalTest] - 500,000
> INFO [2010-04-28 20:15:55,721] [test.TraversalTest] - 600,000
>
> INFO [2010-04-28 20:20:54,433] [test.TraversalTest] - 700,000
> INFO [2010-04-28 20:25:32,407] [test.TraversalTest] - 800,000
> INFO [2010-04-28 20:30:33,274] [test.TraversalTest] - 900,000
> INFO [2010-04-28 20:35:26,405] [test.TraversalTest] - 1,000,000
> INFO [2010-04-28 20:39:17,099] [test.TraversalTest] - 1,100,000
> INFO [2010-04-28 20:42:52,856] [test.TraversalTest] - 1,200,000
> INFO [2010-04-28 20:46:57,318] [test.TraversalTest] - 1,300,000
> INFO [2010-04-28 20:50:58,397] [test.TraversalTest] - 1,400,000
> INFO [2010-04-28 20:54:53,570] [test.TraversalTest] - 1,500,000
> --------------------------
> The number in the last of line above shows the returned node count after
> every 100,000 nodes which is printed in the for-loop.
> I used following traverser:
>
> Traverser traverser = startNode.traverse(Traverser.Order.BREADTH_FIRST,
>         StopEvaluator.DEPTH_ONE,
>         ReturnableEvaluator.ALL_BUT_START_NODE,
>         TestRelationshipType.HAS_COUNTRY,
>         Direction.INCOMING);
>
> where startNode above is country node to which users are related with
> HAS_COUNTRY relation.
>
> My question is why it slows down in returning nodes after a while and if
> there is something which can be done to avoid it?
>
> Thanks
> Bhuvan
>
>
> _______________________________________________
> Neo mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>

-- 
Mattias Persson, [[email protected]]
Hacker, Neo Technology
www.neotechnology.com
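
For reference, here is a minimal sketch of the getRelationships approach suggested in the reply. It is not code from the thread. So that it compiles without Neo4j on the classpath, Node, Relationship, RelationshipType and Direction below are cut-down stand-ins shaped like the org.neo4j.graphdb types of the same names; MemNode, connect and forEachUser are hypothetical helpers made up for the illustration. The point is only the loop in forEachUser, which walks the country's incoming HAS_COUNTRY relationships and keeps no record of the nodes it has already seen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class DirectRelationshipScan {

    // Stand-in types (assumption): shaped like the org.neo4j.graphdb
    // interfaces of the same names, cut down to what this sketch uses.
    enum Direction { INCOMING, OUTGOING }

    interface RelationshipType { String name(); }

    enum TestRelationshipType implements RelationshipType { HAS_COUNTRY }

    interface Relationship { Node getStartNode(); }

    interface Node {
        Iterable<Relationship> getRelationships(RelationshipType type, Direction dir);
    }

    // Toy in-memory node (hypothetical). It only records incoming
    // HAS_COUNTRY relationships, which is all this example needs.
    static final class MemNode implements Node {
        final String name;
        final List<Relationship> incoming = new ArrayList<>();

        MemNode(String name) { this.name = name; }

        @Override
        public Iterable<Relationship> getRelationships(RelationshipType type, Direction dir) {
            boolean match = type == TestRelationshipType.HAS_COUNTRY
                    && dir == Direction.INCOMING;
            return match ? incoming : Collections.<Relationship>emptyList();
        }
    }

    // Hypothetical helper: adds a user -> country HAS_COUNTRY relationship.
    static void connect(MemNode user, MemNode country) {
        country.incoming.add(() -> user);
    }

    // Visits every user attached to the country by iterating the
    // relationships directly. The loop itself remembers nothing between
    // iterations, unlike a traverser that tracks visited nodes.
    static long forEachUser(Node country, Consumer<Node> action) {
        long count = 0;
        for (Relationship rel : country.getRelationships(
                TestRelationshipType.HAS_COUNTRY, Direction.INCOMING)) {
            action.accept(rel.getStartNode()); // start node is the user
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        MemNode country = new MemNode("country");
        for (int i = 0; i < 3; i++) {
            connect(new MemNode("user" + i), country);
        }
        long total = forEachUser(country,
                user -> System.out.println(((MemNode) user).name));
        System.out.println("users visited: " + total);
    }
}
```

Against a real database the same loop shape would be written against startNode itself, inside whatever transaction handling the application already uses. Because the users sit exactly one HAS_COUNTRY hop from the country node, no visited-node bookkeeping is needed to avoid returning a user twice.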

