Hi,

I understand the graph layout like this:

    (publisher)<--PUBLISHED_BY--(book)<--BORROWED_BY--(student)

There are other relationships between books and students (RETURNED_BY, RESERVED_BY), but the relationship count on a book node will still be low compared to the publisher node. Correct?

On a machine with 8 GB RAM and that graph, calculating "number of borrowed books" for a publisher or "number of books never borrowed" for a publisher should execute in under 100 ms given a traversal speed of 300+ hops/ms (in 30 ms at 1000 hops/ms). Normal cached traversal speed on any modern hardware should be 1000+ hops/ms.

The poor performance of the depth-two traversal that counts borrowed relationships for a publisher is indeed very strange. A possible reason could be that the traverser is traversing more than it should. For example, if you create a traverser that follows the PUBLISHED_BY and BORROWED_BY relationships in both directions, you would traverse (publisher)<-(book)<-(student)->(other book)->(other or same publisher)<-(other book)... and so on (and that would take time).

Also make sure you are not measuring the "borrowed books" traversal speed over all publishers instead of a single publisher (retrieving borrowed books from all publishers with a low borrowed ratio would result in a low "borrowed books/ms" figure).

Could you write a simple traverser using the core API that calculates the relationship count at depth two, given a publisher node? Something like:

    Node publisher = // get a publisher node
    int count = 0;
    for ( Relationship bookRel : publisher.getRelationships( RelTypes.PUBLISHED_BY ) )
    {
        count++;
        Node book = bookRel.getOtherNode( publisher );
        for ( Relationship studentRel : book.getRelationships( RelTypes.BORROWED_BY ) )
        {
            count++;
        }
    }

The average count and the time to perform that operation would be interesting.
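For the timing itself, a plain-Java wrapper along these lines could average the count and wall-clock time over a few runs. This is only a sketch: countAtDepthTwo is a hypothetical stand-in for the nested loop above (its dummy body just returns a constant), and you would replace it with the real core API traversal against your publisher node.

```java
public class TimeTraversal
{
    // Hypothetical stand-in for the depth-two count sketched above;
    // replace the body with the real Neo4j core API loop.
    static int countAtDepthTwo()
    {
        return 30000; // e.g. 15k PUBLISHED_BY hops + 15k BORROWED_BY hops
    }

    public static void main( String[] args )
    {
        int runs = 10;
        long totalNanos = 0;
        long totalCount = 0;
        for ( int i = 0; i < runs; i++ )
        {
            long start = System.nanoTime();
            int count = countAtDepthTwo();
            totalNanos += System.nanoTime() - start;
            totalCount += count;
        }
        System.out.println( "avg count: " + totalCount / runs );
        System.out.println( "avg time (ms): " + totalNanos / runs / 1e6 );
    }
}
```

With the real traversal plugged in, a ~30,000-hop case at 1000 hops/ms should average around 30 ms; a much higher figure would suggest the traversal is visiting more than expected.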
If the timing is not in the expected range (basically 15k [avg number of rels/publisher] x 2 [for the BORROWED_BY rels] hops divided by the traversal speed, e.g. 30,000 hops / 1000 hops/ms = 30 ms), try to get the total relationship count by replacing getRelationships( type ) with getRelationships() to get a worst-case figure for how much traversing needs to be done.

Regards,
-Johan

On Sun, May 16, 2010 at 4:08 AM, suryadev vasudev <suryadev.vasu...@gmail.com> wrote:
> Here is a rough design and volume
>
> ...
>
> When I first read about a traversal speed of 1000-3000/millisecond, I added
> some buffer and assumed 500/millisecond as a realistic speed. I am not
> giving up so easily after seeing 1/millisecond. I look forward to responses
> from other users.
>
> The real challenges will be around queries for a publisher. A publisher will
> have around 15,000 books, and a query like "Given a publisher ID, what
> percentage of his books were never borrowed?" will need full browsing. My
> hope was that I could browse through and get the answer in 30 milliseconds.
> But it looks like it will take a minimum of 15 seconds.
>
> Some publishers will have 50,000 books and I can't imagine a response time
> of 50 seconds.
>
> So, I have to achieve at least 500/millisecond if not the original 1000.
>
> Regards
> SDev

_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user