Re: [Neo] Traversal Speed is just 1 millisecond per node

Marko Rodriguez Sat, 15 May 2010 16:41:57 -0700

Hi,

Adding onto Craig's thoughts, I'd like to point you to some related work in 
this area:


1. Modeling a library as a graph.
        - slides: http://www.slideshare.net/slidarko/a-practical-ontology-for-the-largescale-modeling-of-scholarly-artifacts-and-their-usage-3879791
        - article: http://arxiv.org/abs/0708.1150

2. Doing 'slightly complex queries' as graph traversal over graph databases 
such as Neo4j:
        - software framework: http://pipes.tinkerpop.com
                - pipes give you fine-grained control over your walker with 
good speed: http://bit.ly/aa29MO
        - related article: http://arxiv.org/abs/0806.2274
        - related article: http://arxiv.org/abs/1004.1001

Take care,
Marko.

http://tinkerpop.com
http://markorodriguez.com

On May 15, 2010, at 4:05 PM, Craig Taverner wrote:

> My 2 cents, without knowing the structure of your data (which is needed to
> really answer the question).
> 
> I assume when you say 'slightly complex query' you are probably using a
> custom traverser that looks at properties of nodes and/or relationships to
> make the decision, or possibly even follows a relationship to make the
> decision. All of these options will slow things down. Your original
> traverser probably only considered relationship types and directions,
> loading from only the relationships table. The new one hits the properties
> tables, possibly for both nodes and relationships.
> 
> If this is the case, the fix is much the same as what you would do in a
> relational database, which is to index the data. However, indexing is
> different in a graph, and I think the best way to do that in your case is to
> build additional graph structures that allow the new traverser to only look
> at relationships. For example, you say that you are interested in books from
> a particular publisher that are currently lent out. Consider having the
> publisher not have direct relationships to its books (a publisher index),
> but instead have relationships to 'borrowed' and 'not borrowed' nodes, which
> are in turn related to the books (effectively a combined
> publisher-borrowing_status index). When a book is borrowed, move its
> relationship. Since borrowing a book happens only occasionally, over long
> periods (days or weeks), this database edit has effectively no performance
> cost, but makes the query you are looking for very fast. To add a time
> period to this situation, consider the TimeLineIndex.
> Alternatively extend the previous concept to have nodes representing books
> borrowed on certain days, for example.
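
[A minimal sketch of the combined publisher/borrowing-status structure
described above, plus the per-day variant. It is a plain in-memory Python
stand-in, not the Neo4j API: the class `Library` and all of its method and
field names are hypothetical, and each dictionary key plays the role of an
intermediate node that sits between a publisher and its books.]

```python
from collections import defaultdict
from datetime import date


class Library:
    """In-memory stand-in for the graph; this is not the Neo4j API."""

    def __init__(self):
        # (publisher, status) -> book ids. Each key stands for a
        # 'borrowed' or 'not borrowed' node attached to a publisher.
        self.status_index = defaultdict(set)
        # (publisher, day) -> book ids borrowed on that day.
        self.day_index = defaultdict(set)
        self.publisher_of = {}

    def add_book(self, book, publisher):
        self.publisher_of[book] = publisher
        self.status_index[(publisher, "not_borrowed")].add(book)

    def borrow(self, book, day):
        publisher = self.publisher_of[book]
        # Move the book's relationship from one status node to the other.
        self.status_index[(publisher, "not_borrowed")].discard(book)
        self.status_index[(publisher, "borrowed")].add(book)
        self.day_index[(publisher, day)].add(book)

    def give_back(self, book):
        publisher = self.publisher_of[book]
        self.status_index[(publisher, "borrowed")].discard(book)
        self.status_index[(publisher, "not_borrowed")].add(book)

    def lent_out(self, publisher):
        # Only the 'borrowed' bucket is read; no per-book property check.
        return set(self.status_index[(publisher, "borrowed")])

    def borrowed_between(self, publisher, start, end):
        # Visit one day bucket per day in the inclusive range.
        found = set()
        for ordinal in range(start.toordinal(), end.toordinal() + 1):
            day = date.fromordinal(ordinal)
            found |= self.day_index.get((publisher, day), set())
        return found


lib = Library()
lib.add_book("book-1", "publisher-1")
lib.add_book("book-2", "publisher-1")
lib.borrow("book-1", date(2010, 5, 3))
print(lib.lent_out("publisher-1"))
print(lib.borrowed_between("publisher-1", date(2010, 5, 1), date(2010, 5, 10)))
```

[Both queries touch only the buckets for one publisher, so their cost grows
with the number of matching books rather than with all of the publisher's
books. The price is a small extra write on every borrow and return.]
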
> 
> The real solution is really dependent on your data and the kinds of queries
> you plan to make. You probably already made the publisher-book relationships
> because you planned to make a query like that. The more complex the queries you
> wish to make, the more complex the structure you will probably devise. Neo4j is
> great in that you can keep optimizing by adding appropriate structure
> without removing previous capabilities.
> 
> On Sat, May 15, 2010 at 11:34 PM, suryadev vasudev <
> suryadev.vasu...@gmail.com> wrote:
> 
>> We are considering Neo4j for a decision-making application. The application
>> is analogous to a library with 15 million books. We have BOOKS, PUBLISHERS
>> and STUDENTS as nodes. Every book has a PUBLISHED_BY relationship to
>> one publisher. STUDENTS may borrow a book, reserve a book or return a
>> borrowed book. Each is a relationship type, namely BORROWED_BY, RESERVED_BY
>> and RETURNS, between BOOKS and STUDENTS.
>>
>> When we traverse starting from a publisher, the traversal speed is 200-1000
>> nodes per millisecond. This is pure traversal to get a book count by
>> publisher.
>>
>> Neo is failing us when we make a slightly complex query:
>>   - Starting with a publisher, retrieve all books that are currently lent out.
>>   - Starting with a publisher, retrieve all books that were borrowed between
>>     May 1 2010 and May 10 2010.
>>
>> The response time we got was 1-2 milliseconds per book.
>>
>> Before running the test, we created between 0 and 3 relationships for each
>> book. We seeded 15,000 students, 1,000 publishers and 15 million books.
>> The server is an 8 GB RAM machine.
>>
>> Why is the traversal so drastically slower?
>> Regards
>> SDev
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> 
