I would like to make a suggestion that would both address my feature request 
and increase performance of the database.

Right now the NodeRecord (org.neo4j.kernel.impl.nioneo.store.NodeRecord) 
contains the ID of the first Relationship, while the RelationshipRecord contain 
the ID's of the previous and next relationship for both sides of the 
relationship.

My suggestion is as follows:

Create a new store:

noderelationshiptypestore.db

The layout of this store is given by the NodeRelationshipTypeRecord:

id
previousrelationshiptype
nextrelationshiptype
firstrelationship

The NodeRecord would now need to point to the first outgoing 
NodeRelationshipType and to the first incoming NodeRelationshipType instead of 
to the first Relationship.

On insert of a Relationship, one side of the relationship will update the store 
from the outgoing side, the other side will update the store for the incoming 
side.

I will list the steps to take here for the outgoing side (the incoming side is 
almost identical).

>From the NodeRecord getFirstNodeRelationType (outgoing).

Keep following NextRelationshipType until the desired record is found. If no 
record exists, create one, make the current FirstNodeRelationshipType in the 
NodeRecord (if it exists) the NextRelationshipType of the created 
NodeRelationshipType (and make the created one the previous of the current one) 
and make the created NodeRelationshipType the FirstNodeRelationshipType in the 
NodeRecord.

In other words: find the NodeRelationshipTypeRecord in the linked list. If none 
exists, create a NodeRelationshipTypeRecord, prepend it to the existing list 
and change the entry point in the NodeRecord.

We now have found the requested NodeRelationshipTypeRecord. 

>From NodeRelationshipTypeRecord getFirstRelationship.

Create a new RelationshipRecord and make it the FirstRelationship in the 
NodeRelationshipTypeRecord. 

Make the old first RelationshipRecord (if it exists) the nextRelationship of 
the new first RelationshipRecord and make the new first RelationshipRecord the 
previous of the old first RelationshipRecord.

In other words: prepend a new RelationshipRecord to the existing list of 
Relationships and change the entry point in the NodeRelationshipTypeRecord.

Do the same for the incoming side (except for the creation of the 
RelationshipRecord, we only need one of those).

Instead of a linked list of Relationships per Node we now have two linked lists 
of RelationshipTypes per Node (one incoming, one outgoing), with a linked list 
of Relationships per NodeRelationshipType.

With this approach only those Relationships need to be read that match the 
RelationshipType and Direction given. 

Worst case this approach leads to an extra read operation per RelationshipType: 

Worst case example 1: Retrieve all Relationships, regardless of Relationship or 
Direction. Here we have extra reads for all NodeRelationshipType records. If 
the number of Relationships per NodeRelationshipType is equal to 1, we have 
twice as many reads.

Worst case example 2: If we retrieve Relationships by RelationshipType by 
Direction, we have at most one extra read to fetch the Relationship, and only 
in the case where the number of Relationships per NodeRelationshipType is equal 
to 1 and the requested RelationshipType happens to be the last in the list.

On average partitioning Relationships the way suggested here improves 
performance, while the worst case situations only take a constant time hit of 
no more than a factor 2. On top of that the absolute worst case situation, 
where the performance is reduced by a factor 2 is likely to be uncommon in 
practical Neo4j applications. 

Certainly there will be situations where users want to iterate over all 
relationships of one particular node, which will still be very fast. It is much 
less likely that users would frequently want to iterate over all relationships 
of a huge number of nodes (where all those relationships happen to be 
functional as well). Most traversals will be much more discriminate and be 
based on certain particular RelationshipType where the performance penalty is 
at most one read operation per RelationshipType, but on average the number of 
reads will be lower, certainly if there is more than one Relationship 
associated to the NodeRelationshipType.The potential performance increase can 
be enormous, especially if we want to fetch a relationship of a 
RelationshipType that has few associated Relationships, while some other 
NodeRelationshipType has a huge number of Reltionships.Best case scenario: A 
node has one Relationship of type A and 1 million Relationships of type B and 
we want to fetch t
 hat one Relationship of type A. In the current situation we have to do 
1,000,001 reads to fetch that one Relationship. With this proposal this would 
be reduced to at most 3 reads for that particular request.
Altogether this proposal has advantages in most practical situations. In some 
corner cases the performance may decrease by at most a factor 2, while the 
performance may increase by orders of magnitude in some quite common use cases. 
On top of that, we can also present the meta information I requested, because 
we can simply iterate over the NodeRelationshipType list and return the entries 
to the user.

Finally, this proposal makes it possible to guarantee functional, surjective 
and one-to-one Relationships. Due to the partitioning we will know if there 
already is a relationship of a certain type. If a relationship is stated to be 
functional, surjective, or one-to-one, we can raise an exception when a second 
relationship is about to be created for that particular NodeRelationshipType.

Niels



> From: pd_aficion...@hotmail.com
> To: user@lists.neo4j.org
> Date: Tue, 2 Aug 2011 23:03:41 +0200
> Subject: Re: [Neo4j] Node#getRelationshipTypes
> 
> 
> Building an API on top of Neo4j of course pushes the standard API to its 
> limits. So for that matter it is already a good exercise.
> Any chance this feature request will find its way into 1.5?
> Niels
> 
> > Date: Tue, 2 Aug 2011 22:33:03 +0200
> > From: matt...@neotechnology.com
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo4j] Node#getRelationshipTypes
> > 
> > Those methods will of course be more efficient if implemented in the kernel
> > compared to iterating through all relationships if the whole relationship
> > chain have already been loaded for that node, otherwise it will require a
> > full iteration (or at least making sure the whole chain have been loaded).
> > I've never found a use case for it myself and this is the first I've heard.
> > 
> > 2011/8/1 Niels Hoogeveen <pd_aficion...@hotmail.com>
> > 
> > >
> > > I have two specific use cases for these methods:
> > > I'd like to present a node with the property types (names) it has content
> > > for and with the relationship types it has relationships for, while 
> > > loading
> > > those properties/relationships on demand (ie. click here to see details).
> > > This can be done for properties: there is a getPropertyKeys() method, but
> > > there is no getRelationshipTypes() method.
> > > The other use case has to do with the Enhanced API. There I want to have
> > > pluggable relationships and properties. With respect to relationships 
> > > there
> > > are already three implementations: the regular Relationship, 
> > > SortedRelations
> > > (which use an in-graph Btree for storage) and HyperRelationships which 
> > > allow
> > > n-ary relationships.
> > > Every Element in Enhanced API has a getRelationships() method, much like
> > > the getRelationships() method in Node, which should return every
> > > relationship attached to an Element, irrespective of its implementation.
> > > Right now the Element implementation has to perform the logic to 
> > > distinguish
> > > which relationship is used for what implementation (under the hood it all
> > > works using normal Relationships). It would be much more elegant to 
> > > iterate
> > > over the RelationshipTypes and dispatch the getRelationships() method to 
> > > the
> > > appropriate RelationshipType implementations. That way the logic for
> > > SortedRelationships, HyperRelationships remains in their associated 
> > > classes
> > > and is not spread around the implementation.
> > >
> > > Niels
> > > > From: michael.hun...@neotechnology.com
> > > > Date: Sun, 31 Jul 2011 23:20:50 +0200
> > > > To: user@lists.neo4j.org
> > > > Subject: Re: [Neo4j] Node#getRelationshipTypes
> > > >
> > > > Imho it would have to iterate as well.
> > > >
> > > > As the type is stored with the relationship record and so can only be
> > > accessed after having read it.
> > > >
> > > > It might be to have some minimal performance improvements that
> > > relationships would not have to be fully loaded, nor put into the cache 
> > > for
> > > that. But this is always a question of the use-case. What will be done 
> > > next
> > > with those rel-types.
> > > >
> > > > What was the use-case for this operation again?
> > > >
> > > > Cheers
> > > >
> > > > Michael
> > > >
> > > > Am 31.07.2011 um 18:59 schrieb Niels Hoogeveen:
> > > >
> > > > >
> > > > > Good point.
> > > > > It could for all practical purposes even be Iterable<RelationshipType>
> > > so they can be lazily fetched, as long as the underlying implementation
> > > makes certain that any iteration of the RelationshipTypes forms a set (no
> > > duplicates).
> > > > > There is no need to have RelationshipTypes in any particular order, 
> > > > > and
> > > if that is needed in the application, they can usually be sorted locally
> > > since Nodes will generally have associated Relationships of only a handful
> > > of RelationshipTypes.
> > > > >
> > > > > That said, the more important question is, if the Neo4j store can
> > > produce this meta-information. For sparsely connected nodes, it is 
> > > possible
> > > to iterate over the relationships and return the set of RelationshipTypes,
> > > but this is not a proper solution when nodes are densely connected. So 
> > > there
> > > is no general solution for this question yet.
> > > > > Niels
> > > > >
> > > > >> From: j...@neotechnology.com
> > > > >> Date: Sun, 31 Jul 2011 17:29:29 +0100
> > > > >> To: user@lists.neo4j.org
> > > > >> Subject: Re: [Neo4j] Node#getRelationshipTypes
> > > > >>
> > > > >> Hi Niels,
> > > > >>
> > > > >> Ignoring the operational use for getting relationship types, I do
> > > think these should be generalised from:
> > > > >>
> > > > >>> RelationshipType[] getRelationshipTypes();
> > > > >>> RelationshipType[] getRelationshipTypes(Direction);
> > > > >>
> > > > >> to:
> > > > >>
> > > > >> Set<RelationshipType> getRelationshipTypes();
> > > > >> Set<RelationshipType> getgetRelationshipTypes(Direction);
> > > > >>
> > > > >> Unless you need the ordering and you think the overhead of creating a
> > > some kind of Set is too onerous from a performance point of view.
> > > > >>
> > > > >> Jim
> > > > >> _______________________________________________
> > > > >> Neo4j mailing list
> > > > >> User@lists.neo4j.org
> > > > >> https://lists.neo4j.org/mailman/listinfo/user
> > > > >
> > > > > _______________________________________________
> > > > > Neo4j mailing list
> > > > > User@lists.neo4j.org
> > > > > https://lists.neo4j.org/mailman/listinfo/user
> > > >
> > > > _______________________________________________
> > > > Neo4j mailing list
> > > > User@lists.neo4j.org
> > > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> > > _______________________________________________
> > > Neo4j mailing list
> > > User@lists.neo4j.org
> > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> > 
> > 
> > 
> > -- 
> > Mattias Persson, [matt...@neotechnology.com]
> > Hacker, Neo Technology
> > www.neotechnology.com
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
>                                         
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
                                          
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to