Can't you rebalance the tree by just marking nodes as deleted? (setting properties or collecting their id's somewhere?)
Michael Am 09.07.2011 um 00:29 schrieb Craig Taverner: > Not sure if this will be the same for you, but we were not able to use the > RTree in batch inserter mode, since the batch inserter does not support node > deletion and that is required during tree rebalancing. So our OSM importer > in batch mode actually imports everything first, must like you describe, and > the indexes in the normal non-batch mode afterwards. > > As Peter says, the normal API has been catching up with the batch API for > many cases, and we have also moved our OSM importer to the normal API. I'm > guessing the same might happen everywhere. > > On Fri, Jul 8, 2011 at 11:56 AM, Niels Hoogeveen > <pd_aficion...@hotmail.com>wrote: > >> >> I think the best solution to use the batch inserter in conjunction with >> indexed relationships is to first run the batch inserter and temporarily >> store the relationships to be indexed on specifically designated nodes. Make >> sure that each of those nodes does not have more than a set maximum number >> of relationships. After batch insert, the temporary relationships can be >> traversed and inserted in the appropriate index trees. >> The nicest solution would be to have a class BatchInsertIndexedRelationship >> which has more or less the same interface as the regular IndexedRelationship >> class, which temporarily stores relationships without indexing them and when >> batch insert is finished, a call to optimize() will do the actual index >> inserts and remove the temporarily stored relationships. >> >> Niels >>> From: michael.hun...@neotechnology.com >>> Date: Fri, 8 Jul 2011 09:56:30 +0200 >>> To: user@lists.neo4j.org >>> Subject: Re: [Neo4j] Indexed relationships >>> >>> But I'm not sure when in 1.5 this is going to be addressed. >>> >>> So if Niels could look at the effort needed and perhaps pair with Andrew >> on getting this done (or at least give him some pointers to implement it) >> this would be great! >>> >>> Cheers >>> >>> Michael >>> >>> Am 08.07.2011 um 07:07 schrieb Peter Neubauer: >>> >>>> Andrew, >>>> I think in the long run we are moving away from the BatchInserter, and >>>> instead going to find better ways to avoid disk IO, e.g. ordered >>>> writes per transaction, so that normal mode can be sped up to match >>>> the BatchInserter, that hopefully will make it into 1.5, so I would >>>> not worry about BatchInserter right now. >>>> >>>> Cheers, >>>> >>>> /peter neubauer >>>> >>>> GTalk: neubauer.peter >>>> Skype peter.neubauer >>>> Phone +46 704 106975 >>>> LinkedIn http://www.linkedin.com/in/neubauer >>>> Twitter http://twitter.com/peterneubauer >>>> >>>> http://www.neo4j.org - Your high performance graph >> database. >>>> http://startupbootcamp.org/ - Ă–resund - Innovation happens HERE. >>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing >> party. >>>> >>>> >>>> >>>> On Fri, Jul 8, 2011 at 4:27 AM, Andrew White <li...@andrewewhite.net> >> wrote: >>>>> I don't know if this is the right place to ask this; but does it >> support >>>>> a batch insert mode? When I am bulk loading data I don't have Node >>>>> objects to pass around, only node ids. >>>>> >>>>> Thanks, >>>>> Andrew >>>>> >>>>> On 07/07/2011 06:19 PM, Niels Hoogeveen wrote: >>>>>> I created a wiki page for indexed relationships in the Git repo, see: >> >> https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships >>>>>> >>>>>>> From: michael.hun...@neotechnology.com >>>>>>> Date: Thu, 7 Jul 2011 22:53:05 +0200 >>>>>>> To: user@lists.neo4j.org >>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>> >>>>>>> Could you put these code examples into the Readme for the project or >> on a wiki page? >>>>>>> >>>>>>> Am 07.07.2011 um 22:11 schrieb Niels Hoogeveen: >>>>>>> >>>>>>>> IndexedRelationship and IndexedRelationshipExpander are now in Git. >> See: >> https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship >>>>>>>> An example: >>>>>>>> class IdComparator implements java.util.Comparator<Node>{ >>>>>>>> public int compare(Node n1, Node n2){ >>>>>>>> long l1 = Long.reverse(n1.getId()); >>>>>>>> long l2 = Long.reverse(n2.getId()); >>>>>>>> if(l1 == l2) return 0; >>>>>>>> else if(l1< l2) return -1; >>>>>>>> else return 1; >>>>>>>> } >>>>>>>> }static enum RelTypes implements RelationshipType{ >>>>>>>> DIRECT_RELATIONSHIP, >>>>>>>> INDEXED_RELATIONSHIP, >>>>>>>> }; >>>>>>>> Node indexedNode = graphDb().createNode(); >>>>>>>> IndexedRelationship ir = new >> IndexedRelationship(RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, new >> IdComparator(), true, indexedNode, graphDb()); >>>>>>>> >>>>>>>> Node n1 = graphDb().createNode(); >>>>>>>> n1.setProperty("name", "n1"); >>>>>>>> Node n2 = graphDb().createNode(); >>>>>>>> n2.setProperty("name", "n2"); >>>>>>>> Node n3 = graphDb().createNode(); >>>>>>>> n3.setProperty("name", "n3"); >>>>>>>> Node n4 = graphDb().createNode(); >>>>>>>> n4.setProperty("name", "n4"); >>>>>>>> >>>>>>>> indexedNode.createRelationshipTo(n1, RelTypes.DIRECT_RELATIONSHIP); >>>>>>>> indexedNode.createRelationshipTo(n3, RelTypes.DIRECT_RELATIONSHIP); >>>>>>>> ir.createRelationshipTo(n2); >>>>>>>> ir.createRelationshipTo(n4); >>>>>>>> >>>>>>>> IndexedRelationshipExpander re1 = new >> IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, >> RelTypes.DIRECT_RELATIONSHIP); >>>>>>>> IndexedRelationshipExpander re2 = new >> IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, >> RelTypes.INDEXED_RELATIONSHIP); >>>>>>>> >>>>>>>> for(Relationship rel: re1.expand(indexedNode)){ >>>>>>>> System.out.println(rel.getEndNode().getProperty("name")); >>>>>>>> } >>>>>>>> for(Relationship rel: re2.expand(indexedNode)){ >>>>>>>> System.out.println(re2.getEndNode().getProperty("name")); >>>>>>>> } >>>>>>>>> From: pd_aficion...@hotmail.com >>>>>>>>> To: user@lists.neo4j.org >>>>>>>>> Date: Thu, 7 Jul 2011 16:55:36 +0200 >>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Michael,I realize that the implementation of >> IndexedRelationship can in fact support returning relationships, and I have >> a preliminary version running locally now.The returned relationships can >> support all methods of the Relationship interface, returning the node >> pointing to the treeRoot as the startNode, and returning the node set as the >> key_value as the endNode.All relationship properties will be stored on the >> KEY_VALUE relationship pointing to the endNode.There is one caveat to this >> solution, the returned relationships cannot support the getId() method,and >> will throw an UnsupportedOperationException when being >> called.IndexedRelationship will implement Iterable<Relationship>.With these >> changes, it is possible to create an Expander and I am working right now to >> implement that.Niels >>>>>>>>> >>>>>>>>>> From: pd_aficion...@hotmail.com >>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>> Date: Thu, 7 Jul 2011 14:46:35 +0200 >>>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Michael, >>>>>>>>>> >>>>>>>>>> I haven't yet worked on an example. >>>>>>>>>> >>>>>>>>>> There are tests for the SortedTree implementation, >>>>>>>>>> but didn't add those to the IndexedRelationship class, >>>>>>>>>> which is simply a wrapper around SortedTree. >>>>>>>>>> Having a test would have caught the error >>>>>>>>>> that no relationship to the treeNode was created >>>>>>>>>> (fixed that bug and pushed it to Git) >>>>>>>>>> (note to self: always create a unit test, >>>>>>>>>> especially when code seems trivial). >>>>>>>>>> >>>>>>>>>> There is no relationship expander that uses this. >>>>>>>>>> The RelationshipExpander has a method Iterable<Relationship> >> expand(Node node) >>>>>>>>>> which cannot be supported, since there is no direct relationship >> from startnode to endnode. >>>>>>>>>> Instead there is a path through the index tree. >>>>>>>>>> >>>>>>>>>> It's not possible to support the original relationship-traversal >> API >>>>>>>>>> since the IndexedRelationship class is not a wrapper around a >> node, >>>>>>>>>> but a wrapper around the relationships of a certain >> RelationshipType in the OUTGOING direction. >>>>>>>>>> >>>>>>>>>> As to the name of the class. >>>>>>>>>> It is essentially an indexed relationship, >>>>>>>>>> and not just a solution to the densely-connected-node problem. >>>>>>>>>> An indexed relationship can also be used to maintain >>>>>>>>>> a sorted set of relationships of any size, >>>>>>>>>> and can be used to guarantee unicity constraints. Niels >>>>>>>>>>> From: michael.hun...@neotechnology.com >>>>>>>>>>> Date: Thu, 7 Jul 2011 13:27:00 +0200 >>>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>>>> >>>>>>>>>>> Good work, >>>>>>>>>>> >>>>>>>>>>> do you have an example ready (and/or some tests that show how it >> works/is used) ? >>>>>>>>>>> >>>>>>>>>>> In creation, manual traversal and automatic traversal (i.e. is >> there a RelationshipExpander that uses it). >>>>>>>>>>> >>>>>>>>>>> And in the constructor if there is no relationship to the >> treeNode, you create a new one, but that new treeNode is not connected to >> the actual node? >>>>>>>>>>> >>>>>>>>>>> I'm not sure if it should support the original >> relationship-traversal API / methods (getRelationships(Dir,type), etc). >>>>>>>>>>> >>>>>>>>>>> Perhaps that IndexedRelationship should rather be just a wrapper >> around a SuperNode ? So probably rename it to "SuperNode(Wrapper) or >> HeavilyConnectedNode(Wrapper) ?) >>>>>>>>>>> >>>>>>>>>>> Cheers >>>>>>>>>>> >>>>>>>>>>> Michael >>>>>>>>>>> >>>>>>>>>>> Am 07.07.2011 um 12:51 schrieb Niels Hoogeveen: >>>>>>>>>>> >>>>>>>>>>>> Finished the implementation of indexed relationships. The graph >> collections component now contains the package >> https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship, >> containing the IndexedRelationship class. >>>>>>>>>>>> This class can be used instead of regular relationships >> when:relationships need to be stored in a particular sort ordera unicity >> constraint needs to be guaranteed nodes become densely populated with >> relationships. >>>>>>>>>>>> The implementation is traverser friendly. Given a start nodes >> all end nodes can be found by following four relationships types in outgoing >> direction. Given an end node the start node can be found by following these >> four relationship types in incoming direction. Of course this functionality >> is also covered in the API. >>>>>>>>>>>> Niels >>>>>>>>>>>> >>>>>>>>>>>>> From: pd_aficion...@hotmail.com >>>>>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>>>>> Date: Thu, 7 Jul 2011 02:36:29 +0200 >>>>>>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Pushed SortedTree to Git after adding a unit test and doing >> some debugging. >>>>>>>>>>>>> TODO:Add API for indexed relationships using SortedTree as the >> implementation.Make SortedTree thread safe. >>>>>>>>>>>>> With regard to the latter issue. I am considering the >> following solution. Acquire a lock (delete a non existent property) on the >> node that points to the root of the tree at the start of AddNode, RemoveNode >> and Delete. No other node in the SortedTree is really stable, even the >> rootnode may be moved down, turning another node into the new rootnode, >> while after a couple of remove actions the original rootnode may even be >> deleted. >>>>>>>>>>>>> Locking the node pointing to the rootnode, prevents all other >> threads/transactions from updating the tree. This may seem restrictive, but >> a single new entry or a single removal may in fact have impact on much of >> the tree, due to balancing. More selective locking would require a >> prebalancing tree walk, determining the affected subtrees, lock them and >> once every affected subtree is locked, perform the actual balancing. >>>>>>>>>>>>> Please let me hear if there are any objections to locking the >> node pointing to the tree as the a solution to make SortedTree thread safe. >>>>>>>>>>>>> Niels >>>>>>>>>>>>> >>>>>>>>>>>>>> Date: Tue, 5 Jul 2011 08:27:57 +0200 >>>>>>>>>>>>>> From: neubauer.pe...@gmail.com >>>>>>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>>>>>>> >>>>>>>>>>>>>> Great work Nils! >>>>>>>>>>>>>> >>>>>>>>>>>>>> /peter >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sent from my phone. >>>>>>>>>>>>>> On Jul 4, 2011 11:39 PM, "Niels Hoogeveen"< >> pd_aficion...@hotmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> Made some more changes to the SortedTree implementation. >> Previously >>>>>>>>>>>>>> SortedTree would throw an exception if a duplicate entry was >> being added. >>>>>>>>>>>>>>> I changed SortedTree to allow a key to point to more than >> one node, unless >>>>>>>>>>>>>> the SortedTree is created as a unique index, in which case an >> exception is >>>>>>>>>>>>>> raised when an attempt is made to add a node to an existing >> key entry. >>>>>>>>>>>>>>> A SortedTree once defined as unique can not be changed to a >> non-unique >>>>>>>>>>>>>> index or vice-versa. >>>>>>>>>>>>>>> SortedTrees now have a name, which is stored in the a >> property of the >>>>>>>>>>>>>> TREE_ROOT relationship and in the KEY_VALUE relationship (a >> new relationship >>>>>>>>>>>>>> that points from the SortedTree to the Node inserted in the >> SortedTree). The >>>>>>>>>>>>>> name of a SortedTree can not be changed. >>>>>>>>>>>>>>> SortedTrees now store the class of the Comparator, so a >> SortedTree, once >>>>>>>>>>>>>> created, can not be used with a different Comparator. >>>>>>>>>>>>>>> SortedTree is now an Iterable, making it possible to use it >> in a >>>>>>>>>>>>>> foreach-loop. >>>>>>>>>>>>>>> Since there are as of yet, no unit tests for SortedTree, I >> will create >>>>>>>>>>>>>> those first before pushing my changes to Git. Preliminary >> results so far are >>>>>>>>>>>>>> good. I integrated the changes in my own application and it >> seems to work >>>>>>>>>>>>>> fine. >>>>>>>>>>>>>>> Todo: >>>>>>>>>>>>>>> Decide on an API for indexed relationships. (Community input >> still >>>>>>>>>>>>>> welcome).Write unit tests.Make SortedTree thread safe >> (Community help still >>>>>>>>>>>>>> welcome). >>>>>>>>>>>>>>> Niels >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From: pd_aficion...@hotmail.com >>>>>>>>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>>>>>>>> Date: Mon, 4 Jul 2011 15:49:45 +0200 >>>>>>>>>>>>>>>> Subject: Re: [Neo4j] Indexed relationships >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I forgot to add another recurrent issue that can be solved >> with indexed >>>>>>>>>>>>>> relationships: guaranteed unicity constraints. >>>>>>>>>>>>>>>>> From: pd_aficion...@hotmail.com >>>>>>>>>>>>>>>>> To: user@lists.neo4j.org >>>>>>>>>>>>>>>>> Date: Mon, 4 Jul 2011 01:55:08 +0200 >>>>>>>>>>>>>>>>> Subject: [Neo4j] Indexed relationships >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In the thread [Neo4j] traversing densely populated nodes >> we discussed >>>>>>>>>>>>>> the problems arising when large numbers of relationships are >> added to the >>>>>>>>>>>>>> same node. >>>>>>>>>>>>>>>>> Over the weekend, I have worked on a solution for the >>>>>>>>>>>>>> dense-relationship-nodes using SortedTree in the >> neo-graph-collections >>>>>>>>>>>>>> component. After some minor tweaks to the implementation of >> SortedTree, I >>>>>>>>>>>>>> have managed to get a workable solution, where two nodes are >> not directly >>>>>>>>>>>>>> linked by a relationship, but by means of a BTree (entirely >> stored in the >>>>>>>>>>>>>> graph). >>>>>>>>>>>>>>>>> Before continuing this work, I'd like to have a discussion >> about >>>>>>>>>>>>>> features, since what we have now is not just a solution for >> the dense >>>>>>>>>>>>>> populated node issue, but is actually a full fledges indexed >> relationship, >>>>>>>>>>>>>> which makes it suitable for other purposes too. >>>>>>>>>>>>>>>>> An indexed relationship can for example be used to >> maintain a sorted >>>>>>>>>>>>>> set of relationships in the graph, that is not necessarily >> huge, but large >>>>>>>>>>>>>> enough to make sorting on internal memory too expensive an >> operation, or >>>>>>>>>>>>>> situations where only one out of a large number of >> relationships is actually >>>>>>>>>>>>>> traversed in most cases. >>>>>>>>>>>>>>>>> There are probably more use cases for in-graph indexed >> relationships, >>>>>>>>>>>>>> so I'd like to know what features are desirable and what API >> would Neo4J >>>>>>>>>>>>>> users appreciate. >>>>>>>>>>>>>>>>> P.S. I still think it would be good to consider, if >> technically >>>>>>>>>>>>>> possible, partitioning the relationship store per >> relationship type and per >>>>>>>>>>>>>> direction. The indexed relationship solution works, but is of >> course slower >>>>>>>>>>>>>> than a direct relationship, both with respect to insert time >> and traversal >>>>>>>>>>>>>> time. If dense relationships are never traversed going out of >> the dense >>>>>>>>>>>>>> node, the extra structure maintained by the BTree is only >> extra burden. >>>>>>>>>>>>>>>>> P.P.S. If there are people with experience to make an >> implementation >>>>>>>>>>>>>> thread safe, please volunteer to help make the implementation >> production >>>>>>>>>>>>>> proof. >>>>>>>>>>>>>>>>> Niels >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Neo4j mailing list >>>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Neo4j mailing list >>>>>>>>>>> User@lists.neo4j.org >>>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Neo4j mailing list >>>>>>>>>> User@lists.neo4j.org >>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Neo4j mailing list >>>>>>>>> User@lists.neo4j.org >>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Neo4j mailing list >>>>>>>> User@lists.neo4j.org >>>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>>> _______________________________________________ >>>>>>> Neo4j mailing list >>>>>>> User@lists.neo4j.org >>>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>> >>>>>> _______________________________________________ >>>>>> Neo4j mailing list >>>>>> User@lists.neo4j.org >>>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Neo4j mailing list >>>>> User@lists.neo4j.org >>>>> https://lists.neo4j.org/mailman/listinfo/user >>>>> >>>> _______________________________________________ >>>> Neo4j mailing list >>>> User@lists.neo4j.org >>>> https://lists.neo4j.org/mailman/listinfo/user >>> >>> _______________________________________________ >>> Neo4j mailing list >>> User@lists.neo4j.org >>> https://lists.neo4j.org/mailman/listinfo/user >> >> _______________________________________________ >> Neo4j mailing list >> User@lists.neo4j.org >> https://lists.neo4j.org/mailman/listinfo/user >> > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user