Hi

> First let me just point out that Neo is transactional, meaning that the
> system can crash at any point in time and the system will be brought
> back to the correct consistent state (snapshot from last commit before
> crash). That will slow things down a bit.

Very nice feature :-).

>> I tried to get some benchmark values from the neo4j homepages but I was
>> not successful.
>
> You are right about no public benchmarks. We should work on that after
> we've released 1.0 final. Please let us know (all of you) if you have
> any specific benchmark requests.

Thanks. As soon as I have a better understanding of Neo and graph-based
data structures, their advantages and disadvantages, I will come back to
you with benchmark requirements ;-).
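Since every write in Neo has to happen inside a transaction, the batch
sizes discussed below amount to committing one transaction per N inserts.
A minimal self-contained sketch of that pattern (FakeTx is a hypothetical
stand-in for Neo's Transaction and its success()/finish() calls; real
code would use org.neo4j.api.core.Transaction and do the actual node and
relationship creation where the comment is):

```java
// FakeTx is a hypothetical stand-in for Neo's Transaction
// (tx.success() marks the tx as committable, tx.finish() commits it).
class FakeTx {
    static int commits = 0;
    void success() {}
    void finish() { commits++; }
}

public class BatchInsert {
    // Insert `links` items, committing one transaction per `batchSize` inserts.
    static int insertInBatches(int links, int batchSize) {
        int before = FakeTx.commits;
        FakeTx tx = new FakeTx();
        for (int i = 1; i <= links; i++) {
            // ... createNode(), createRelationshipTo(), setProperty() here ...
            if (i % batchSize == 0) { // batch full: commit and open a new tx
                tx.success();
                tx.finish();
                tx = new FakeTx();
            }
        }
        tx.success();
        tx.finish(); // commit the trailing (possibly empty) batch
        return FakeTx.commits - before;
    }
}
```

For example, insertInBatches(2500, 1000) commits three transactions
(1000 + 1000 + 500). Larger batches mean fewer commits, at the cost of
more memory held per transaction.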
>> Currently I'll run some benchmark tests with neo4j (storing an RDF
>> graph with SingleValueIndex, and node and relationship objects
>> containing one property value: the URI, BNode or Literal value).
>> So far, I figured out that it takes around 1.5 ms to insert one
>> link (using transaction batches of size 1000 and 10000).
>
> Single value index is currently pretty slow. If you just create 1
> node, 1 relationship, 1 property for each "link" and no index you
> should get about 10-30 inserts/ms depending on hardware (would
> translate to about 300-900 documents/s in your case). If you add
> synchronous indexing to that it will drop by a factor of 10.

Just to clarify this for me: if we use no index (SingleValueIndex,
MultiValueIndex, LuceneIndex, ...), then we should be able to achieve
10-30 "link" inserts/ms. But this also implies that I store the same
node multiple times!? If document A contains a link from node [A] to
node [B] and document B contains a link from [B] to [A], we will find
two nodes with the value "A" but different internal node IDs, and the
same for node B?

>> Can anybody provide me with some benchmarks or general comments/design
>> ideas on whether it is possible to handle this amount of inserts/sec!?
>
> To handle this amount (150 documents/s) during a longer time will be
> hard because of indexing. If you however can live with asynchronous
> indexing (keep latest index in memory and write in background
> thread/transaction to disk) it is possible to handle shorter times of
> that load.

Hmm, sorry again, just to understand this correctly: so far I am using
batches of 1K, 5K and 10K and then execute transaction.finish(). To me,
this seems exactly like what you suggested, or did I get it wrong?

I had a look into the code of the indexing classes. It seems like each
class does a single transaction commit per insert, get and lookup. Is
there a class which applies batch transaction management to the index
interface? E.g.
collect a batch of 10K nodes in a HashMap<String, Node> and, when the
map is full, insert the <"key", "node"> pairs into the index (BTreeMap,
Lucene, ...). Combined with an LRU cache this should speed up the whole
index lookup for my use case! I would be happy to get comments or
criticism of this idea.

>> Also, it seems like neo is not checking whether a relationship of a
>> certain relationship type between two nodes already exists, so I need
>> to check that I do not insert duplicate edges. Or did I miss
>> something!?
>
> Yes you are right and this is because it is valid in the model. As an
> example lets say we are to model something like twitter. We have users
> (nodes) then users can follow each other (relationship of type
> FOLLOWS). It would then be possible to have 3 combinations:
>
> User A--FOLLOWS->User B
>
> User B--FOLLOWS->User A
>
> User A--FOLLOWS->User B--FOLLOWS->User A

Hmm ok, but I still do not understand why it is necessary to allow
inserting the same relationship multiple times. I assume that use cases
exist where the domain model allows inserting the same relationship type
with a different meaning, e.g. relationship type LINK with different
properties between the same nodes:

A -- LINK (rel:friend) --> B
A -- LINK (rel:colleague) --> B
A -- LINK (rel:housemate) --> B

On the contrary, I guess that there exist a lot of use cases, especially
related to storing RDF, where you do not want to duplicate the same
information in the database (e.g. inserting a batch of RDF files, where
each RDF file contains the triple foaf:Person rdf:type owl:Class). I
hope I have explained my thoughts in an understandable way. My question
now is whether it is possible to efficiently integrate a duplication
check into the Neo core data structure (e.g. public Relationship
reuseOrCreateRelationshipTo(Node otherNode, RelationshipType type)), or
whether end users should take care of that problem themselves depending
on their domain model and use case.
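The batched-index idea above could be sketched roughly as follows.
UriIndex is a hypothetical interface standing in for whatever index
implementation actually sits behind it (SingleValueIndex, a Lucene
index, ...), and the LRU cache is a LinkedHashMap in access order; the
sizes are just the ones mentioned in this thread:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical index interface; a real implementation would wrap
// Neo's SingleValueIndex or a Lucene index.
interface UriIndex {
    void insertBatch(Map<String, Long> entries); // key -> node id
    Long lookup(String key);
}

// Buffer index writes in memory and flush them as one batch (i.e. one
// transaction in real code); keep an LRU cache of recent lookups.
class BatchedIndex {
    private static final int BATCH_SIZE = 10_000;
    private static final int CACHE_SIZE = 100_000;

    private final UriIndex backend;
    private final Map<String, Long> pending = new HashMap<>();
    private final Map<String, Long> lru =
        new LinkedHashMap<String, Long>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
                return size() > CACHE_SIZE; // evict least recently used
            }
        };

    BatchedIndex(UriIndex backend) { this.backend = backend; }

    void index(String key, long nodeId) {
        pending.put(key, nodeId);
        lru.put(key, nodeId);
        if (pending.size() >= BATCH_SIZE) flush();
    }

    Long lookup(String key) {
        Long id = lru.get(key);                // 1. recently used?
        if (id == null) id = pending.get(key); // 2. buffered, not yet flushed?
        if (id == null) {
            id = backend.lookup(key);          // 3. hit the real index
            if (id != null) lru.put(key, id);
        }
        return id;
    }

    void flush() {
        if (!pending.isEmpty()) {
            backend.insertBatch(new HashMap<>(pending));
            pending.clear();
        }
    }
}
```

Note that lookups must consult the in-memory buffer before the backend,
otherwise keys written since the last flush would appear missing; that
is essentially the asynchronous-indexing trade-off mentioned earlier.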
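The duplicate check asked about above could look like this sketch. Rel
and Node here are minimal in-memory stand-ins for Neo's Relationship and
Node types, used only to make the pattern concrete; a real implementation
would iterate node.getRelationships(type, Direction.OUTGOING) instead of
a plain list:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for Neo's Relationship.
class Rel {
    final Node start, end;
    final String type;
    Rel(Node s, Node e, String t) { start = s; end = e; type = t; }
}

// Minimal stand-in for Neo's Node.
class Node {
    final List<Rel> outgoing = new ArrayList<>();

    Rel createRelationshipTo(Node other, String type) {
        Rel r = new Rel(this, other, type);
        outgoing.add(r);
        return r;
    }

    // Scan this node's existing outgoing relationships of the given type
    // and reuse a match instead of inserting a duplicate edge. This is
    // O(degree) per insert, so it is only cheap for low-degree nodes.
    Rel reuseOrCreateRelationshipTo(Node other, String type) {
        for (Rel r : outgoing) {
            if (r.end == other && r.type.equals(type)) {
                return r; // duplicate: reuse the existing edge
            }
        }
        return createRelationshipTo(other, type);
    }
}
```

This keeps the core model permissive (duplicates stay valid, as in the
FOLLOWS example) while letting RDF-style loaders opt in to deduplication.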
Wishes
juergen

_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user