Re: [Neo4j] Issues with IndexedRelationship

Peter Neubauer Thu, 08 Sep 2011 06:40:03 -0700

Niels, Bryce,
great! Gave you access to the repo, please merge :)

/peter


On Thu, Sep 8, 2011 at 2:32 PM, Niels Hoogeveen
<pd_aficion...@hotmail.com> wrote:
>
> Excellent... I did a code review and think this is a huge improvement over 
> what we had.
> Peter, can you pull these changes, I no longer have the privs to do so.
> Niels
>
>> Date: Thu, 8 Sep 2011 17:24:44 +1200
>> From: bryc...@gmail.com
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] Issues with IndexedRelationship
>>
>> I have made the changes in regards to SortedTree in regards to relationships
>> vs nodes, and have got all the tests passing.  The changes are pushed up to
>> my github account (and pull request has been raised).
>>
>> The changes can be seen here:
>> https://github.com/brycenz/graph-collections
>>
>> On Thu, Sep 8, 2011 at 3:41 PM, Bryce <bryc...@gmail.com> wrote:
>>
>> > Another thought if there is going to be a larger refactor of the code is
>> > whether the indexing mechanism should be broken out as a strategy for the
>> > IndexedRelationship.  At present it is tied to SortedTree, but if an
>> > interface was extracted out that had addNode, removeNode, iterator, and
>> > isUniqueIndex then other indexing implementations could be used in certain
>> > cases.
>> >
>> > The particular other implementation I am currently thinking of that could
>> > be of use to me would be a paged linked list.  So that would have a linked
>> > list of pages, each with min < x < max KEY_VALUE (or equivalent)
>> > relationships.  I think that could work quite well for the situation where
>> > the index is descending date ordered, and generally just appended at the
>> > most recent end, and results are retrieved in a paged manner generally from
>> > near the most recent.
>> >
>> > But more to the point there could be any number of implementations that
>> > would be good for given different situations.
>> >
>> > That does bring up a question though, there was some discussion a while ago
>> > about some functionality along the lines of IndexedRelationship being 
>> > pulled
>> > into the core, so is that overkill for now if there is going to be another
>> > core offering later?
>> >
>> >
>> > On Thu, Sep 8, 2011 at 2:38 PM, Niels Hoogeveen <pd_aficion...@hotmail.com
>> > > wrote:
>> >
>> >>
>> >> I think we don't have to worry about backwards compatibility much yet.
>> >> There has not been a formal release of the component, so if there are 
>> >> people
>> >> using the software, they will accept that they are bleeding edgers.
>> >> Indeed addNode should return the KEY_VALUE relationship and I think we
>> >> should change the signature of SortedTree to turn it into
>> >> Iterable<Relationship>. No need to maintain a Node iterator, the node is
>> >> always one getEndNode away.
>> >> Niels
>> >>
>> >> > Date: Thu, 8 Sep 2011 14:17:59 +1200
>> >> > From: bryc...@gmail.com
>> >> > To: user@lists.neo4j.org
>> >> > Subject: Re: [Neo4j] Issues with IndexedRelationship
>> >> >
>> >> > Will have to experiment with changing my id's to be stored as longs, it
>> >> does
>> >> > make perfect sense really that it would be better.  Thanks for the hint.
>> >> >
>> >> > In regards to SortedTree returning the KEY_VALUE relationship instead of
>> >> the
>> >> > end Node, I had thought of that too, and it would definitely help.
>> >>  Could
>> >> > end up being a significant change to SortedTree though, e.g.:
>> >> >   sortedTree.addNode( node );
>> >> > Would need to return the KEY_VALUE relationship instead of a boolean.
>> >>  Which
>> >> > not knowing where else SortedTree is used could be a large change?
>> >> >
>> >> > Maybe SortedTree would have two iterator's available a key_value
>> >> > relationship iterator, and a node iterator.  Having a quick look at it
>> >> now
>> >> > it seems that it could work ok that way without introducing much code
>> >> > duplication.
>> >> >
>> >> > On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen
>> >> > <pd_aficion...@hotmail.com>wrote:
>> >> >
>> >> > >
>> >> > > Two longs is certainly cheaper than a string. Two longs take 128 bit
>> >> and
>> >> > > are stored in the main record of the PropertyContainer, while a String
>> >> would
>> >> > > require a 64 bit "pointer" in the main record of the
>> >> PropertyContainer, and
>> >> > > an additional read in the String store where the string representation
>> >> will
>> >> > > take up 256 bits. So both memory-wise, as perfomance wise, it is
>> >> better to
>> >> > > store a UUID as two long values.
>> >> > >
>> >> > >
>> >> > > The main issue is something that needs a deeper fix than adding ID's.
>> >> > > SortedTree now returns Nodes when traversing the tree. We should
>> >> however
>> >> > > return the KEY_VALUE Relationship to the indexed Node. Then
>> >> > > IndexedRelationship.DirectRelationship can be created with that
>> >> relationship
>> >> > > as an argument. We get the Direction and the RelationshipType for
>> >> free.
>> >> > > Niels
>> >> > >
>> >> > > > Date: Thu, 8 Sep 2011 11:36:11 +1200
>> >> > > > From: bryc...@gmail.com
>> >> > > > To: user@lists.neo4j.org
>> >> > > > Subject: Re: [Neo4j] Issues with IndexedRelationship
>> >> > > >
>> >> > > > Hi Niels,
>> >> > > >
>> >> > > > Sorry I didn't quite write the bit about (1) clearly enough.  The
>> >> problem
>> >> > > is
>> >> > > > that it presently throws an Exception where it shouldn't.
>> >> > > >
>> >> > > > This stems from IndexedRelationship.DirectRelationship:
>> >> > > > this.endRelationship = endNode.getSingleRelationship(
>> >> > > > SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING );
>> >> > > >
>> >> > > > So if the end node has more than one incoming KEY_VALUE relationship
>> >> a
>> >> > > more
>> >> > > > than one relationship exception is thrown.
>> >> > > >
>> >> > > > Instead of the getSingleRelationship I was planning on iterating
>> >> over the
>> >> > > > relationships and matching the UUID stored at the root end of the IR
>> >> with
>> >> > > > one of the KEY_VALUE relationships (which is why using a unique id
>> >> is
>> >> > > > necessary rather than the relationship type).  Note: there will
>> >> actually
>> >> > > > still be an issue if the same IR has multiple relationships to the
>> >> same
>> >> > > leaf
>> >> > > > node - still thinking about that might need .
>> >> > > >
>> >> > > > Is storing the UUID as two longs much quicker than storing it as a
>> >> > > string?
>> >> > > >  Curious about this since in my current model I have all the domain
>> >> > > objects
>> >> > > > with UUID's, and these are all stored as strings.  If it was going
>> >> to
>> >> > > help
>> >> > > > with either memory or performance then I would be keen to migrate
>> >> this to
>> >> > > > two longs.
>> >> > > >
>> >> > > > Cheers
>> >> > > > Bryce
>> >> > > >
>> >> > > > On Thu, Sep 8, 2011 at 11:07 AM, Niels Hoogeveen
>> >> > > > <pd_aficion...@hotmail.com>wrote:
>> >> > > >
>> >> > > > >
>> >> > > > > Great work Bryce,
>> >> > > > > I do have a question though.
>> >> > > > > What is the rationale for the restriction mentioned under "1)". Do
>> >> you
>> >> > > need
>> >> > > > > this for the general case (to make IndexedRelationshipExpander
>> >> work
>> >> > > > > correctly), or do you need it for your own application to throw
>> >> that
>> >> > > > > exception? If the latter is the case, I think it would be
>> >> important to
>> >> > > tease
>> >> > > > > out the general case and offer this new behaviour as an option.
>> >> > > > > A unique key for the index is a good idea anyway and can be added
>> >> to
>> >> > > > > SortedTree. Generate a UUID and store it in two long properties.
>> >> That
>> >> > > way
>> >> > > > > the two values will always be read in the first fetch of the
>> >> underlying
>> >> > > > > PropertyContainer. A getId method on the TreeNodes can then return
>> >> a
>> >> > > String
>> >> > > > > representation of of the two long values.
>> >> > > > > IndexRelationships are a relatively new development, so I think
>> >> you are
>> >> > > one
>> >> > > > > of the first to actually try it out. Personally I have chosen to
>> >> > > directly
>> >> > > > > work with SortedTree, because I am working within the framework of
>> >> a
>> >> > > wrapper
>> >> > > > > API, so I can integrate the functionality behind the regular
>> >> > > > > createRelationshipTo and getRelationships methods.
>> >> > > > > I don't think API changes will be an issue at the moment.
>> >> > > > > Niels
>> >> > > > > > Date: Thu, 8 Sep 2011 10:22:11 +1200
>> >> > > > > > From: bryc...@gmail.com
>> >> > > > > > To: user@lists.neo4j.org
>> >> > > > > > Subject: [Neo4j] Issues with IndexedRelationship
>> >> > > > > >
>> >> > > > > > Hi,
>> >> > > > > >
>> >> > > > > > As I mentioned a while ago I am looking at using
>> >> > > IndexedRelationship's
>> >> > > > > > within my application.  The major thing that was missing for me
>> >> to be
>> >> > > > > able
>> >> > > > > > to do this was IndexedRelationshipExpander being able to provide
>> >> all
>> >> > > the
>> >> > > > > > relationships from the leaf end of indexed relationships through
>> >> the
>> >> > > the
>> >> > > > > > root end.  So I have been working on getting that support in
>> >> there.
>> >> > > > > >
>> >> > > > > > However in writing this I have discovered a number of other
>> >> issues
>> >> > > that I
>> >> > > > > > have also fixed, and at least one I am still working on.  Since
>> >> I was
>> >> > > > > right
>> >> > > > > > into the extra support for expanding the relationships it is
>> >> hard to
>> >> > > > > break
>> >> > > > > > out these fixes as a separate commit (which I think would be
>> >> ideal),
>> >> > > so
>> >> > > > > it
>> >> > > > > > will most likely all come in together hopefully later today (NZ
>> >> > > time).
>> >> > > > > >
>> >> > > > > > Just letting everyone know in case someone else is doing
>> >> development
>> >> > > > > against
>> >> > > > > > indexed relationships.
>> >> > > > > >
>> >> > > > > > Quick run down of the issues, note: N -- IR(X) --> {A,B} below
>> >> means
>> >> > > > > there
>> >> > > > > > is a indexed relationship from N to A & B, of type X.
>> >> > > > > >
>> >> > > > > > 1) Exception thrown when more than one IR terminates at a given
>> >> node,
>> >> > > > > e.g.:
>> >> > > > > > N1 -- IR(X) --> {A,B,C,D}
>> >> > > > > > N2 -- IR(X) --> {A,X,Y,Z}
>> >> > > > > > Will throw an exception when using the
>> >> IndexedRelationshipExpander on
>> >> > > > > either
>> >> > > > > > N1, or N2.
>> >> > > > > >
>> >> > > > > > 2) Start / End nodes are transposed when the IR has an direction
>> >> of
>> >> > > > > > incoming, i.e. the IR is created against N but across a set of
>> >> > > incoming
>> >> > > > > > relationships:
>> >> > > > > > N <-- IR(Y) -- {A,B,C}
>> >> > > > > > Will return 3 relationships N --> A, N --> B, N --> C.
>> >> > > > > >
>> >> > > > > > I have written tests for each of these, as well as a couple of
>> >> other
>> >> > > > > tests.
>> >> > > > > >
>> >> > > > > > Still completing (1) and have a little question about this.  In
>> >> order
>> >> > > to
>> >> > > > > fix
>> >> > > > > > this I may need to introduce a unique ID stored against the IR
>> >> both
>> >> > > at
>> >> > > > > the
>> >> > > > > > root and at the leaves.  Currently the relationship type is used
>> >> to
>> >> > > name
>> >> > > > > the
>> >> > > > > > IR at both root and leaves, but in the case above that means you
>> >> > > can't
>> >> > > > > tell
>> >> > > > > > from node A which KEY_VALUE relationship belongs to which IR
>> >> tree
>> >> > > without
>> >> > > > > > traversing the tree.
>> >> > > > > >
>> >> > > > > > So the question is adding this ID would mean that anyone who is
>> >> > > already
>> >> > > > > > using this wont have the ID, and therefore without care will be
>> >> data
>> >> > > > > > incompatible with the updated code.  This could be managed via a
>> >> > > check
>> >> > > > > for
>> >> > > > > > the ID when accessing the tree and if it isn't there doing a
>> >> walk
>> >> > > over
>> >> > > > > the
>> >> > > > > > tree to populate all the places where it is required.
>> >> > > > > >
>> >> > > > > > In general in developing against this code where do we sit on
>> >> data
>> >> > > > > > compatibility and API compatibility?
>> >> > > > > >
>> >> > > > > > Cheers
>> >> > > > > > Bryce
>> >> > > > > > _______________________________________________
>> >> > > > > > Neo4j mailing list
>> >> > > > > > User@lists.neo4j.org
>> >> > > > > > https://lists.neo4j.org/mailman/listinfo/user
>> >> > > > >
>> >> > > > > _______________________________________________
>> >> > > > > Neo4j mailing list
>> >> > > > > User@lists.neo4j.org
>> >> > > > > https://lists.neo4j.org/mailman/listinfo/user
>> >> > > > >
>> >> > > > _______________________________________________
>> >> > > > Neo4j mailing list
>> >> > > > User@lists.neo4j.org
>> >> > > > https://lists.neo4j.org/mailman/listinfo/user
>> >> > >
>> >> > > _______________________________________________
>> >> > > Neo4j mailing list
>> >> > > User@lists.neo4j.org
>> >> > > https://lists.neo4j.org/mailman/listinfo/user
>> >> > >
>> >> > _______________________________________________
>> >> > Neo4j mailing list
>> >> > User@lists.neo4j.org
>> >> > https://lists.neo4j.org/mailman/listinfo/user
>> >>
>> >> _______________________________________________
>> >> Neo4j mailing list
>> >> User@lists.neo4j.org
>> >> https://lists.neo4j.org/mailman/listinfo/user
>> >>
>> >
>> >
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] Issues with IndexedRelationship

Reply via email to