Hi
> 
> First let me just point out that Neo is transactional meaning that the
> system can crash at any point in time and the system will be brought
> back to the correct consistent state (snapshot from last commit before
> crash). That will slow things down a bit.
> 
very nice feature :-).
>> I tried to get some benchmark values from the neo4j homepages but I was
>> not successful.
> 
> You are right about no public benchmarks. We should work on that after
> we've released 1.0 final. Please let us know (all of you) if you have
> any specific benchmark requests.
> 
Thanks. As soon as I have a better understanding of Neo and graph-based
data structures, their advantages and disadvantages, I will come back to
you with benchmark requests ;-).

>> Currently I'll run some benchmark tests with neo4j (storing an RDF graph
>> with SingleValueIndex and node and relationship objects containing one
>> property value: the URI, BNode or Literal value).
>> So far, I figured out that it takes around 1.5 ms to insert a
>> link (using transaction batches of size 1000 and 10000).
>>
> 
> Single value index is currently pretty slow. If you just create 1
> node, 1 relationship, 1 property for each "link" and no index you
> should get about 10-30 inserts/ms depending on hardware (would
> translate to about 300-900 documents/s in your case). If you add
> synchronous indexing to that it will drop by a factor of 10.
Just to clarify this for me:
if we use no index at all (SingleValueIndex, MultiValueIndex, LuceneIndex, ...),
then we should be able to achieve 10-30 "link" inserts/ms. But this also
implies that I store the same node multiple times!? If document A
contains a link from node [A] to node [B] and document B contains a link
from [B] to [A], we will end up with two nodes with the value "A" but
different internal node IDs, and the same for node B?


>> Can anybody provide me with some benchmarks or general comments/design
>> ideas on whether it is possible to handle this amount of inserts/sec!?
>>
> 
> To handle this amount (150 documents/s) during a longer time will be
> hard because of indexing. If you however can live with asynchronous
> indexing (keep latest index in memory and write in background
> thread/transaction to disk) it is possible to handle shorter times of
> that load.
Hmm, sorry again, just to make sure I understand this correctly:
so far I am using batches of 1K, 5K and 10K inserts and then execute
transaction.finish(). To me, this seems to be exactly what you
suggested, or did I get it wrong?
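For concreteness, my insert loop currently looks roughly like this (a sketch against the org.neo4j.api.core API; the Link type, MyRelTypes.LINK and the "uri" property name are placeholders from my code, not part of Neo):

```java
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Transaction;

// Sketch of the batched insert loop. Note that no index lookup is done,
// so every occurrence of a URI gets a fresh node with its own internal ID.
void insertLinks( NeoService neo, Iterable<Link> links )
{
    Transaction tx = neo.beginTx();
    try
    {
        int count = 0;
        for ( Link link : links )
        {
            Node from = neo.createNode();
            from.setProperty( "uri", link.getFromUri() );
            Node to = neo.createNode();
            to.setProperty( "uri", link.getToUri() );
            from.createRelationshipTo( to, MyRelTypes.LINK );
            if ( ++count % 10000 == 0 ) // commit in batches of 10K
            {
                tx.success();
                tx.finish();
                tx = neo.beginTx();
            }
        }
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}
```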
I had a look into the code of the indexing classes. It seems like each
class does a single transaction commit per insert, get and lookup.
Is there a class which applies batched transaction management to the
index interface?
e.g.
  collect a batch of 10K nodes in a HashMap<String,Node> and, when the map
  is full, insert the <"Key","Node"> pairs into the index (BtreeMap,
  Lucene, ...). Combined with an LRU cache, this should speed up the
whole index lookup for my use case!
I would be happy to get a comment or critique on this idea.
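The idea could be sketched as a small wrapper like the following (a hypothetical class, not part of the Neo index API; the backing Map stands in for SingleValueIndex/Lucene, and the flush would run inside a single transaction):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical batching wrapper: writes are buffered in a HashMap and
// flushed to the backing index in one go; lookups go through a small
// LRU cache first, then the buffer, then the backing index.
class BatchingIndex<V>
{
    private final Map<String, V> backingIndex;   // stand-in for the real index
    private final Map<String, V> writeBuffer = new HashMap<String, V>();
    private final Map<String, V> lruCache;
    private final int batchSize;

    BatchingIndex( Map<String, V> backingIndex, int batchSize,
        final int cacheSize )
    {
        this.backingIndex = backingIndex;
        this.batchSize = batchSize;
        // LinkedHashMap in access order with eviction = a simple LRU cache
        this.lruCache = new LinkedHashMap<String, V>( cacheSize, 0.75f, true )
        {
            protected boolean removeEldestEntry( Map.Entry<String, V> eldest )
            {
                return size() > cacheSize;
            }
        };
    }

    void index( String key, V value )
    {
        writeBuffer.put( key, value );
        lruCache.put( key, value );
        if ( writeBuffer.size() >= batchSize )
        {
            flush(); // one commit for the whole batch
        }
    }

    V lookup( String key )
    {
        V value = lruCache.get( key );
        if ( value == null ) value = writeBuffer.get( key );
        if ( value == null )
        {
            value = backingIndex.get( key );
            if ( value != null ) lruCache.put( key, value );
        }
        return value;
    }

    void flush()
    {
        backingIndex.putAll( writeBuffer );
        writeBuffer.clear();
    }
}
```

One caveat with this scheme: until flush() runs, a crash loses the buffered index entries even though the nodes may already be committed, which is essentially the asynchronous-indexing trade-off described above.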
> 
>> Also, it seems like neo is not checking whether a relationship of a
>> certain relationship type between two nodes already exists, so I need to
>> check that I do not insert duplicate edges. Or did I miss something!?
>>
> 
> Yes you are right and this is because it is valid in the model. As an
> example lets say we are to model something like twitter. We have users
> (nodes) then users can follow each other (relationship of type
> FOLLOWS). It would then be possible to have 3 combinations:
> 
> User A--FOLLOWS->User B
> 
> User B--FOLLOWS->User A
> 
> User A--FOLLOWS->User B--FOLLOWS->User A
Hmm ok, but I still do not understand why it is necessary to insert the
same relationship multiple times. I assume that use cases exist
where the domain model allows inserting the same relationship type with
different meanings (e.g. relationship type LINK with different properties
between the same nodes):
A -- LINK (rel:friend)    --> B
A -- LINK (rel:colleague) --> B
A -- LINK (rel:housemate) --> B

On the contrary, I guess there exist a lot of use cases, especially
those related to storing RDF, where you do not want to duplicate the same
information in the database (e.g. inserting a batch of RDF files where
each file contains the triple foaf:Person rdf:type owl:Class).
I hope I have explained my thoughts in an understandable way.

My question now is whether it is possible to efficiently integrate a
duplication check into the Neo core data structure, e.g.

  public Relationship reuseOrCreateRelationshipTo( Node otherNode,
          RelationshipType type )

or whether end users should take care of that problem themselves,
depending on their domain model and use case.
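In the meantime, such a check can live in user code. A minimal sketch (against the org.neo4j.api.core interfaces; the method name just mirrors the hypothetical reuseOrCreateRelationshipTo above):

```java
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.RelationshipType;

// Sketch: reuse an existing relationship of the given type between the
// two nodes if there is one, otherwise create it. Must run inside a
// transaction, like all Neo operations.
class RelationshipUtil
{
    static Relationship reuseOrCreateRelationshipTo( Node node,
        Node otherNode, RelationshipType type )
    {
        for ( Relationship rel :
            node.getRelationships( type, Direction.OUTGOING ) )
        {
            if ( rel.getEndNode().equals( otherNode ) )
            {
                return rel; // duplicate found, reuse it
            }
        }
        return node.createRelationshipTo( otherNode, type );
    }
}
```

Note that this scan is linear in the node's degree, which may be exactly why the kernel does not perform it on every createRelationshipTo: the check would penalize models where parallel relationships are intended.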


Wishes
  juergen
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
