Hi,

Adding onto Craig's thoughts, I'd like to point you to some related work in this area:
1. Modeling a library as a graph.
   - slides: http://www.slideshare.net/slidarko/a-practical-ontology-for-the-largescale-modeling-of-scholarly-artifacts-and-their-usage-3879791
   - article: http://arxiv.org/abs/0708.1150
2. Doing 'slightly complex queries' as graph traversals over graph databases such as Neo4j:
   - software framework: http://pipes.tinkerpop.com
   - pipes give you fine-grained control over your walker with good speed: http://bit.ly/aa29MO
   - related article: http://arxiv.org/abs/0806.2274
   - related article: http://arxiv.org/abs/1004.1001

Take care,
Marko.

http://tinkerpop.com
http://markorodriguez.com

On May 15, 2010, at 4:05 PM, Craig Taverner wrote:

> My 2 cents, without knowing the structure of your data (which is needed to
> really answer the question).
>
> I assume when you say 'slightly complex query' you are probably using a
> custom traverser that looks at properties of nodes and/or relationships to
> make the decision, or possibly even follows a relationship to make the
> decision. All of these options will slow things down. Your original
> traverser probably only considered relationship types and directions,
> loading from only the relationships table. The new one hits the properties
> tables, possibly for both nodes and relationships.
>
> If this is the case, the improvement is much the same as you would make in a
> relational database, which is to index the data. However, indexing is
> different in a graph, and I think the best way to do that in your case is to
> build additional graph structures that allow the new traverser to look only
> at relationships. For example, you say that you are interested in books from
> a particular publisher that are currently lent out. Consider not giving the
> publisher direct relationships to its books (a publisher index), but instead
> relating it to 'borrowed' and 'not borrowed' nodes, which in turn are
> related to the books (effectively a combined publisher-borrowing_status
> index).
> When a book is borrowed, move its relationship. Since any given book is
> borrowed only occasionally, on a timescale of days or weeks, this database
> edit has negligible performance cost, but it makes the query you are looking
> for very fast. To add a time period to this situation, consider the
> TimeLineIndex. Alternatively, extend the previous concept to have nodes
> representing books borrowed on certain days, for example.
>
> The real solution depends on your data and the kinds of queries you plan
> to make. You probably already made the publisher-book relationships because
> you planned to make a query like that. The more complex the queries you wish
> to make, the more complex the structure you will probably devise. Neo4j is
> great in that you can keep optimizing by adding appropriate structure
> without removing previous capabilities.
>
> On Sat, May 15, 2010 at 11:34 PM, suryadev vasudev <
> suryadev.vasu...@gmail.com> wrote:
>
>> We are considering Neo4j for a decision-making application. The
>> application is analogous to a library holding 15 million books. We have
>> BOOKS, PUBLISHERS and STUDENTS as nodes. Every book has a PUBLISHED_BY
>> relationship to one publisher. STUDENTS may borrow a book, reserve a book
>> or return a borrowed book; each action is a relationship type
>> (BORROWED_BY, RESERVED_BY and RETURNS) between BOOKS and STUDENTS.
>> When we traverse starting from a publisher, the traversal speed is
>> 200-1000 nodes per millisecond. This is a pure traversal to get a book
>> count by publisher.
>> Neo is failing us when we make a slightly complex query:
>> Starting with a publisher, retrieve all books that are currently lent out.
>> Starting with a publisher, retrieve all books that were borrowed between
>> May 1, 2010 and May 10, 2010.
>> The response time we got was 1-2 milliseconds per book.
>> Before running the test, we created between 0 and 3 relationships for each
>> book.
>> We have seeded 15,000 students, 1,000 publishers and 15 million books,
>> and the server is a machine with 8 GB of RAM.
>> I wonder why the traversal is so drastically slower.
>> Regards
>> SDev
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
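Craig's combined publisher/borrowing-status index can be sketched in a few lines of plain Python. This is a toy in-memory model, not the Neo4j API: the `Graph` class, the helper functions and the relationship type names `LENT_OUT`, `ON_SHELF` and `INDEXES` are all hypothetical. What it tries to show is that the "currently lent out" query walks publisher -> borrowed node -> books using only relationship types and directions (no property reads), and that borrowing or returning a book just moves one relationship.

```python
from collections import defaultdict

# Hypothetical relationship type names; not taken from the thread or from Neo4j.
LENT_OUT = "LENT_OUT"  # publisher -> its 'borrowed' status node
ON_SHELF = "ON_SHELF"  # publisher -> its 'not borrowed' status node
INDEXES = "INDEXES"    # status node -> book


class Graph:
    """Toy in-memory graph with typed, directed relationships (not Neo4j)."""

    def __init__(self):
        # edges[node][rel_type] -> set of target node ids
        self.edges = defaultdict(lambda: defaultdict(set))

    def relate(self, src, rel_type, dst):
        self.edges[src][rel_type].add(dst)

    def unrelate(self, src, rel_type, dst):
        self.edges[src][rel_type].discard(dst)

    def targets(self, src, rel_type):
        # Only the relationship type and direction are consulted, no properties.
        return self.edges[src][rel_type]


def add_publisher(g, publisher):
    # Each publisher gets its own pair of status nodes.
    g.relate(publisher, LENT_OUT, publisher + "/borrowed")
    g.relate(publisher, ON_SHELF, publisher + "/not-borrowed")


def status_node(g, publisher, rel_type):
    # Follow the single LENT_OUT or ON_SHELF relationship from the publisher.
    (node,) = g.targets(publisher, rel_type)
    return node


def add_book(g, publisher, book):
    # New books start out on the shelf.
    g.relate(status_node(g, publisher, ON_SHELF), INDEXES, book)


def set_borrowed(g, publisher, book, borrowed):
    """Move the book's index relationship when it is borrowed or returned."""
    old, new = (ON_SHELF, LENT_OUT) if borrowed else (LENT_OUT, ON_SHELF)
    g.unrelate(status_node(g, publisher, old), INDEXES, book)
    g.relate(status_node(g, publisher, new), INDEXES, book)


def books_lent_out(g, publisher):
    """Two hops by relationship type only: publisher -> borrowed node -> books."""
    return set(g.targets(status_node(g, publisher, LENT_OUT), INDEXES))


if __name__ == "__main__":
    g = Graph()
    add_publisher(g, "acme")
    for book in ("b1", "b2", "b3"):
        add_book(g, "acme", book)
    set_borrowed(g, "acme", "b2", True)
    print(sorted(books_lent_out(g, "acme")))  # ['b2']
```

Because each publisher has its own pair of status nodes, the lent-out query touches roughly as many relationships as there are lent-out books for that publisher, rather than scanning the publisher's whole catalogue and checking a property on each book.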
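For the date-range question ("books borrowed between May 1, 2010 and May 10, 2010"), Craig's alternative of nodes representing books borrowed on certain days can be sketched the same way. This assumes a hypothetical `BorrowDayIndex` built from nested Python dicts; it is not the TimeLineIndex he mentions or any real Neo4j API. Each inner set plays the role of one day node whose relationships point at that publisher's books borrowed on that day.

```python
from collections import defaultdict
from datetime import date, timedelta


class BorrowDayIndex:
    """Toy 'publisher -> day node -> books' index (hypothetical, not Neo4j).

    Each inner set stands in for one day node whose relationships point at
    the books of that publisher borrowed on that day.
    """

    def __init__(self):
        # days[publisher][calendar date] -> set of book ids borrowed that day
        self.days = defaultdict(lambda: defaultdict(set))

    def record_borrow(self, publisher, book, day):
        # Meant to be called once per borrow event, e.g. alongside BORROWED_BY.
        self.days[publisher][day].add(book)

    def borrowed_between(self, publisher, start, end):
        """Books of `publisher` borrowed on any day from start to end, inclusive.

        Visits at most one day node per calendar day in the range, so the
        work depends on the range length and the number of matching books,
        not on how many books the publisher has in total.
        """
        day_nodes = self.days.get(publisher, {})  # .get does not create entries
        result = set()
        day = start
        while day <= end:
            result |= day_nodes.get(day, set())
            day += timedelta(days=1)
        return result


if __name__ == "__main__":
    idx = BorrowDayIndex()
    idx.record_borrow("acme", "b1", date(2010, 5, 1))
    idx.record_borrow("acme", "b2", date(2010, 5, 12))
    print(idx.borrowed_between("acme", date(2010, 5, 1), date(2010, 5, 10)))  # {'b1'}
```

Day granularity is an arbitrary choice here; coarser buckets (weeks, months) mean fewer nodes to visit for long ranges but more books to filter at the edges of the range.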