Hi,

I understand the graph layout as this:

(publisher)<--PUBLISHED_BY--(book)<--BORROWED_BY--(student)

There are other relationships between books and students
(RETURNED_BY,RESERVED_BY) but the relationship count on a book node
will still be low compared to the publisher node. Correct?

On a machine with 8GB RAM and that graph, calculating the "number of
borrowed books" for a publisher, or the "number of books never
borrowed" for a publisher, should execute in under 100ms given a
traversal speed of 300+ hops/ms (in about 30ms at 1000 hops/ms).
Normal cached traversal speed on any modern hardware should be 1000+
hops/ms.

The poor performance when performing the depth two traversal to get
the amount of borrowed relationships on a publisher is indeed very
strange.

A possible reason could be that the traverser is traversing more than
it should. For example, if you create a traverser that follows the
PUBLISHED_BY and BORROWED_BY relationships in both directions, you
would traverse (publisher)<-(book)<-(student)->(other book)->(other or
same publisher)<-(other book)... and so on, and that would take time.
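To illustrate the effect, here is a self-contained sketch in plain
Java (no Neo4j involved; the node names, edges, and counts are made
up). It counts how many hops a breadth-first expansion performs from a
publisher when direction is respected versus when both directions are
followed: the undirected version keeps bouncing back through students
into other books and publishers, so the hop count keeps growing.

```java
import java.util.*;

public class DirectionDemo
{
    // Tiny made-up graph. Edges are stored directed, the way they
    // would exist in the database: book -> publisher (PUBLISHED_BY)
    // and student -> book (BORROWED_BY).
    static Map<String, List<String>> in = new HashMap<>();
    static Map<String, List<String>> out = new HashMap<>();

    static void edge( String from, String to )
    {
        out.computeIfAbsent( from, k -> new ArrayList<>() ).add( to );
        in.computeIfAbsent( to, k -> new ArrayList<>() ).add( from );
    }

    static
    {
        edge( "book1", "pub1" );     // PUBLISHED_BY
        edge( "book2", "pub1" );
        edge( "book3", "pub2" );
        edge( "student1", "book1" ); // BORROWED_BY
        edge( "student1", "book3" );
        edge( "student2", "book2" );
    }

    // Count hops performed when expanding up to maxDepth levels.
    static int countHops( String start, int maxDepth, boolean directed )
    {
        int hops = 0;
        List<String> frontier = List.of( start );
        for ( int depth = 0; depth < maxDepth; depth++ )
        {
            List<String> next = new ArrayList<>();
            for ( String node : frontier )
            {
                // Follow incoming edges (publisher <- book <- student).
                next.addAll( in.getOrDefault( node, List.of() ) );
                if ( !directed )
                {
                    // Undirected also walks back out again.
                    next.addAll( out.getOrDefault( node, List.of() ) );
                }
            }
            hops += next.size();
            frontier = next;
        }
        return hops;
    }

    public static void main( String[] args )
    {
        int directed = countHops( "pub1", 3, true );
        int undirected = countHops( "pub1", 3, false );
        // prints directed=4 undirected=13
        System.out.println( "directed=" + directed
                + " undirected=" + undirected );
    }
}
```

Even on this six-edge graph the undirected expansion does roughly
three times the work; on a real graph the blow-up compounds with every
extra level of depth.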

Also make sure you are not measuring "borrowed books traversal speed"
over all publishers instead of a single publisher (retrieving borrowed
books from all publishers with a low borrowed ratio would result in a
low "borrowed books/ms" value).

Could you write a simple traverser using the core API that calculates
the relationship count at depth two, given a publisher node? Something
like:

    Node publisher = // get a publisher node
    int count = 0;
    // depth one: PUBLISHED_BY relationships on the publisher
    for ( Relationship bookRel : publisher.getRelationships(
            RelTypes.PUBLISHED_BY ) )
    {
        count++;
        Node book = bookRel.getOtherNode( publisher );
        // depth two: BORROWED_BY relationships on each book
        for ( Relationship studentRel : book.getRelationships(
                RelTypes.BORROWED_BY ) )
        {
            count++;
        }
    }

The average count and the time to perform that operation would be
interesting. If timings are not in the expected range (basically 15k
[avg number of rels/publisher] x 2 [adding the BORROWED_BY hops]
divided by the traversal speed in hops/ms), try to get the total
relationship count by replacing getRelationships( type ) with
getRelationships() to see the worst-case amount of traversing that
needs to be done.
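As a back-of-the-envelope check of that formula (the 15,000
rels/publisher figure comes from the thread below; the speeds are the
ones mentioned above):

```java
public class ExpectedTime
{
    // Expected traversal time in ms: total hops / traversal speed.
    static double expectedMs( int relsPerPublisher, double hopsPerMs )
    {
        // One PUBLISHED_BY hop per book plus one BORROWED_BY hop.
        int totalHops = relsPerPublisher * 2;
        return totalHops / hopsPerMs;
    }

    public static void main( String[] args )
    {
        System.out.println( expectedMs( 15000, 1000 ) ); // 30.0 (ms)
        System.out.println( expectedMs( 15000, 300 ) );  // 100.0 (ms)
    }
}
```

So at cached speeds the depth-two count should land in the 30-100ms
range, which is why the observed timings look off by two orders of
magnitude.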

Regards,
-Johan

On Sun, May 16, 2010 at 4:08 AM, suryadev vasudev
<suryadev.vasu...@gmail.com> wrote:
> Here is a rough design and volume
>
> ...
>
> When I first read about traversal speed of 1000-3000/millisecond,  I added
> some buffer and assumed 500/millisecond as a realistic speed. I am not
> giving up so easily after seeing 1/millisecond. I look forward to responses
> from other users.
>
> The real challenges will be around queries for a publisher. A publisher will
> have around 15,000 books and a query like "Given a publisher ID, what
> percentage of his books were never borrowed"  will need full browsing. My
> hope was that I could browse through and get the answer in 30 milliseconds.
> But it looks like it will take a minimum of 15 seconds.
>
> Some publishers will have 50,000 books and I can't imagine a response time
> of 50 seconds.
>
> So, I have to achieve at least 500/millisecond if not the original 1000.
>
> Regards
> SDev
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user