Hi Jürgen, I will try to answer some of your questions.
On Thu, Dec 11, 2008 at 10:17 PM, Jürgen Umbrich <juergen.umbr...@deri.org> wrote:
> first off all, I am very fascinated about neo and the neo4j library.
> I work as part of my PhD in the domain of web crawling and thus, I
> thought about using neo4j to store the crawl traverse- and link-graph.
> Assuming that we can fetch max ~150 docs/sec and extracting for each
> document in average 30 links (very conservative assumption) neo should
> be able to handle 4500 inserts.sec (avg 4,5 inserts/ms)!
>

First let me just point out that Neo is transactional meaning that the system can crash at any point in time and the system will be brought back to the correct consistent state (snapshot from last commit before crash). That will slow things down a bit.

> I tried to get some benchmark values from the neo4j homepages but I was
> not successful.

You are right about no public benchmarks. We should work on that after we've released 1.0 final. Please let us know (all of you) if you have any specific benchmark requests.

> Currently I ll run some benchmark tests with neo4j. (storing a rdf graph
> with SingleValueIndex and node- and relationship objects containing one
> property value the URI,BNode or Literal value).
> So far,I figured out that it takes around 1,5 ms for an insert of a
> link. (using transaction batches of size 1000 and 10000).
>

Single value index is currently pretty slow. If you just create 1 node, 1 relationship, 1 property for each "link" and no index you should get about 10-30 inserts/ms depending on hardware (would translate to about 300-900 documents/s in your case). If you add synchronous indexing to that it will drop by a factor of 10.

> Can anybody provide me with some benchmarks or a general comment/design
> ideas that it is possible to handle these amount of inserts/sec!?
>

To handle this amount (150 documents/s) during a longer time will be hard because of indexing.
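
As a rough check of the figures above, here is a small calculation in plain Java. The class and method names (`ThroughputCheck`, `requiredInsertsPerMs`, `sustainableDocsPerSec`) are made up for illustration, and nothing here touches Neo. It only restates the numbers from this thread: 150 docs/s times 30 links is 4.5 inserts/ms; 10-30 inserts/ms without an index works out to roughly 330-1000 docs/s (the same range as the rounded 300-900 above); and a factor of 10 slower with synchronous indexing gives 1-3 inserts/ms, or about 33-100 docs/s, which is below the 150 docs/s target.

```java
// Back-of-the-envelope check of the figures quoted in this thread.
// All names are hypothetical; this does not call any Neo API.
final class ThroughputCheck {

    // Inserts per millisecond needed to keep up with the crawler.
    static double requiredInsertsPerMs(double docsPerSec, double linksPerDoc) {
        return docsPerSec * linksPerDoc / 1000.0;
    }

    // Documents per second that a given insert rate can sustain.
    static double sustainableDocsPerSec(double insertsPerMs, double linksPerDoc) {
        return insertsPerMs * 1000.0 / linksPerDoc;
    }

    public static void main(String[] args) {
        double linksPerDoc = 30;

        // Crawler target: 150 docs/s with 30 links each.
        System.out.println("required inserts/ms: "
                + requiredInsertsPerMs(150, linksPerDoc));

        // Estimate without an index: 10-30 inserts/ms.
        System.out.println("no index, docs/s: "
                + sustainableDocsPerSec(10, linksPerDoc) + " to "
                + sustainableDocsPerSec(30, linksPerDoc));

        // Synchronous indexing, a factor of 10 slower: 1-3 inserts/ms.
        System.out.println("sync index, docs/s: "
                + sustainableDocsPerSec(1, linksPerDoc) + " to "
                + sustainableDocsPerSec(3, linksPerDoc));

        // Measured 1.5 ms per insert, i.e. 1/1.5 inserts per ms.
        System.out.println("measured, docs/s: "
                + sustainableDocsPerSec(1.0 / 1.5, linksPerDoc));
    }
}
```

The last line uses the measured 1.5 ms per link insert, which corresponds to roughly 22 docs/s.
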
If you can live with asynchronous indexing, however (keep the latest index in memory and write it to disk in a background thread/transaction), it is possible to handle that load for shorter periods.

> Also, it seems like neo is not checking if a relationship with a for a
> certain relationshiptype between two nodes exists already, so I need to
> check if I insert duplicate edges. Or did I missed something!?
>

Yes, you are right, and this is because it is valid in the model. As an example, let's say we are to model something like Twitter. We have users (nodes), and users can follow each other (relationships of type FOLLOWS). It would then be possible to have 3 combinations:

User A--FOLLOWS->User B
User B--FOLLOWS->User A
User A--FOLLOWS->User B--FOLLOWS->User A

Regards,
Johan
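
Since the check for duplicate edges is left to the application, here is a minimal sketch of a check-then-insert in plain Java. It is an in-memory stand-in, not Neo's API: `LinkGraph`, `addEdge`, `hasEdge` and `addEdgeIfAbsent` are hypothetical names chosen for illustration. The idea would presumably carry over to Neo by scanning the start node's outgoing relationships of the given type before creating a new one, but that part is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a graph that permits parallel relationships.
// All names are hypothetical; this is not Neo's API.
final class LinkGraph {

    // One outgoing, typed edge; the start node is the map key below.
    private static final class Edge {
        final String type;
        final long to;

        Edge(String type, long to) {
            this.type = type;
            this.to = to;
        }
    }

    private final Map<Long, List<Edge>> outgoing = new HashMap<>();
    private int edgeCount = 0;

    // Unconditional insert: calling it twice stores two parallel edges.
    void addEdge(long from, String type, long to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(type, to));
        edgeCount++;
    }

    // Scan the start node's outgoing edges for the same type and end node.
    boolean hasEdge(long from, String type, long to) {
        List<Edge> edges = outgoing.get(from);
        if (edges == null) {
            return false;
        }
        for (Edge e : edges) {
            if (e.to == to && e.type.equals(type)) {
                return true;
            }
        }
        return false;
    }

    // Check-then-insert; returns true only if a new edge was stored.
    boolean addEdgeIfAbsent(long from, String type, long to) {
        if (hasEdge(from, type, to)) {
            return false;
        }
        addEdge(from, type, to);
        return true;
    }

    int edgeCount() {
        return edgeCount;
    }
}
```

Direction matters in this sketch: A FOLLOWS B and B FOLLOWS A are stored as two distinct edges, while a second A FOLLOWS B is rejected by `addEdgeIfAbsent`. The scan is linear in the number of outgoing edges of the start node, so it adds to the cost of every insert.
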