Perhaps you can catch up with Niels Hoogeven who's currently implementing Collection data structures, B- and R-Trees to be transparently mapped into graph structures at:
https://github.com/peterneubauer/graph-collections Cheers Michael Perhaps you should describe the intent / idea behind your project. Dropping too early into implementation details might remove some possibilities that are able with graph databases by loosing the big picture. And I wouldn't care about performance right now (btw. System.currentTimeMillis() is mostly good enough for measurement). But rather on modeling your domain to the graph and specify which concepts and operations/use-cases you'd like to support. You can optimize afterwards anyway. Am 23.07.2011 um 14:29 schrieb John cyuczieekc: > Hail :) > Beware, lot of noise follows (ie. webnoise kinda noise), while I thank > anyone in advance for reading, I must also apologize for failing to be > concise :-" > On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger < > michael.hun...@neotechnology.com> wrote: > >> John, >> >> no problem :) >> >> You pointed out both problems: >> - cold caches >> - lots of rels on the one side >> >> There are some performance implications of loading millions of rels from >> disk. We'll be working on improving those in 1.5. >> >> The easiest way to solve that is to switch start and end-node which you >> already did. It is much easier for you _with domain knowledge_ about the >> graph than for the algorithm. >> > Yeah I would probably apply that manually, that is: call a particular method > when I know which one of the nodes has least rels. (as a ' `temporary` or > `acceptable so far` workaround ' [u see the graph here?! xD]* I will for now > use findSinglePath as it is*, simply because I don't want the overhead of > using a particular method for some use case scenario and the generic one for > others - besides it's fast enough, for now) > My problem here is that I might not know (in the future) which node (first > or second) will have the least nodes. So I was hoping to create a generic > method for such find. > For now, I could say, I can know. > For example: > AllLists->A > AllLists->B > AllLists->C > where A,B,C are considered nodes representing lists, in this case, I could > maybe say that while AllLists can have lots of outgoing rels (aka lots of > lists), list A may not have that many nodes pointing to it (aka list is not > used in as many places as there are lists), so I could consider A as being > node with least nodes. > Though I would want a generic method that would always know which node has > least nodes and parse that one, especially in my 1 hop case. > I am, as seen here, basically tagging the nodes A,B,C with AllLists, by > having AllLists point to them, and this way, I know that node A is of > AllLists type (so to speak). This was my old way of thinking while using bdb > je, I might need to revise it since then I was completely ignoring > relationships and not considering them as a single and accessible object. (I > am currently trying to port that old way of thinking to neo4j) > In this old way, I was using the node AllLists pointing/outgoing to A as > being (in neo4j) a relationship of the type AllLists incoming to A. > >> >> Especially if you have > 1 hop paths. >> > In my particular use case, I will probably not (yet!) have cases with >1 > hop. (I can't really visualize that far in the future xD ) but for sure I > will have lots of ==1 hops cases where I just want to see if thisnode is > tagged with thatnode by checking thatnode-->thisnode > >> >> There might be ways in improving the algorithm, e.g. by iterating both >> sides at the same time, which would lead to the end with the fewer >> relationships being exhausted and resolved first. >> > for now, I totally avoided getting into the findSinglePath 's source and > considering doing changes there or even seeing how it works:) > But that algo sounds like something I'd do, in like 2 (reusable) threads > even xD (I remember having considered this in the past, when using bdb je > but I opted to parse the node with least rels instead) > >> >> How large is your graph at all? And how is it structured, e.g. how many >> rels do the other 9 nodes have that your second node points to? >> > I was merely trying this as an example(but this should be a practical use > case in my project), the thing is, the other 9 nodes can have any number or > rels, in fact *any node could have any number of rels incoming and outgoing > from it*,... (but in this case, the 9 nodes each had a new unused node > point(outgoing) to them; I'd post the source but i'm already using wrappers > and the whole example would be more than 1 java file, but I can link to it: > https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java > that's the main file to run, tho you don't have to and you prolly don't have > the time; just putting it here for consistency/historical-reasons I guess xD > ) > In my project, I am trying something generic... something like building > sets and lists of nodes based on graphs. Imagine having no metadata > key-value properties, and instead *trying to use only nodes to represent > *those > properties (well actually I do have one key aka "name" so i could reference > the same node id across application restarts). > > >> >> What are both times in a second or third run (cached) ? >> > Path from one to two: (one)-->(two) timedelta=2,151,018,862 ns > Path from one to two: (one)-->(two) timedelta=131,458,292 ns > Path from one to two: (one)-->(two) timedelta=119,015,019 ns > another run: > Path from one to two: (one)-->(two) timedelta=1,824,099,688 ns > Path from one to two: (one)-->(two) timedelta=275,772,087 ns > Path from one to two: (one)-->(two) timedelta=352,562,121 ns > ---- and when swapped: > Path from one to two: (two)<--(one) timedelta=20,601,181 ns > Path from one to two: (two)<--(one) timedelta=284,212 ns > Path from one to two: (two)<--(one) timedelta=255,431 ns > and another run: > Path from one to two: (two)<--(one) timedelta=20,513,338 ns > Path from one to two: (two)<--(one) timedelta=280,314 ns > Path from one to two: (two)<--(one) timedelta=255,731 ns > (I almost got scared there on the second and third xD) > and here's if I count them first (twice): > total relations count=100,011 timedelta=3,077,046,964 ns > total relations count=100,011 timedelta=151,727,293 ns > Path from one to two: (one)-->(two) timedelta=173,697,069 ns > Path from one to two: (one)-->(two) timedelta=70,160,217 ns > Path from one to two: (one)-->(two) timedelta=91,760,936 ns > ---- swapped: > total relations count=100,011 timedelta=3,219,749,623 ns > total relations count=100,011 timedelta=151,675,727 ns > Path from one to two: (two)<--(one) timedelta=10,717,914 ns > Path from one to two: (two)<--(one) timedelta=204,465 ns > Path from one to two: (two)<--(one) timedelta=143,005 ns > >> >> Relationships are stored as a chained/linked list in the disk format so it >> is not possible to know how many there are per node. >> > Will this(storage way) possibly change in 1.5 or in the future? I don't > really know how ie. berkeleydb je stores key-value pairs (BTree?) but it > might potential help get a perspective on that(actually I believe y'all know > already); > In bdb I am able to count the number of values for the same key (using a > cursor via *cursor.count()* ) > By the way, I was storing relationships in bdb je, like this: > key->value > where each *key *would be the *start node*, and each *value *would be *end > node*, thus a key->value would form a relationship, though I would never > consider a relationship as a single object(I would always need both start > and end nodes to identify/access a relationship). And a key could have more > than one value associated with it. (one to many) > ie. > sameKey->value1 > sameKey->value2 > sameKey->value3 > and using a cursor in bdb, I would be able to parse all values (and count() > them tho not sure how it would do that internally, hopefully not by parsing > them all). > Of course i would have a second database storing them backwards(endnode as > key, startnode as value) like: > value1->sameKey > value2->sameKey > value3->sameKey > this way I would be able to search by endnode too, since I could only find > stuff (BTree/hash fast) only by looking up by key. > (each key/value would be a nodeID btw; I would also have another database > for nodeID->String and String->nodeID, so I can search for both, where the > equivalent of that in neo4j would be the node having a "name" property with > that String value - this so I could access the same nodeID access java > application restarts, that is by using that fixed String) > > My problem with bdb je, was that I couldn't find a way (for my use case) to > use transactions without eventually deadlocking...(but if I was as good as > you guys I probably would've found it)... so last time I totally disabled > the use of transactions so that it would seem simpler while I would focus on > some other project aspect. > (ie. parsing from value1 to value 3 in one thread, and from value3 to value1 > in another; but not just this case; not to mention *it doesn't support > nested transactions for java edition*) > >> >> Cheers >> > Cheerios =)) > >> >> Michael >> > John > > > PS: btw, this is a lifelong project(so to speak), which I failed to complete > or even begin (in 6 years so far) due to postponing it each time I ran into > problems (I know this isn't the way to do it like that, instead keep at it > and maybe allow compromises - so fail on my part). And finding your neo4j > database is giving me a new opportunity to redo it using neo4j instead of > bdbje :) I know(!) graph databases(and stuff based on them, that is: THAT > kind of connectivity/accessibility) are the future, and I aim for immediate > accessibility and customizability ;) > >> >> Am 23.07.2011 um 04:42 schrieb John cyuczieekc: >> >>> Hey guys, me bugging you again :) >>> >>> (This whole thing is kind of based on the lack of being able to get the >>> number of relationships a node has) >>> >>> If I have two nodes, and the first one has 1 million outgoing >> relationships >>> of the type X to 1 million unique/different nodes, >>> and the second node has 10 incoming relationships of type X (same type) >> of >>> which one is from the first node, >>> then using GraphAlgoFactory.shortestPath (or suggest a better way?) >>> How can I tell neo4j to iterate the search on the second node's incoming >>> rels simply because it has 10 relationships instead of 1 million, in >> order >>> to check if each relationship is in the form of firstNode-->secondNode ? >>> >>> For the case when first node has 100,000 relationships and second node >> has >>> 10, >>> it takes *1.7 seconds* for shortestPath to find the only one link between >>> them using: >>> >>> final PathFinder<Path> finder = GraphAlgoFactory.shortestPath( >>> Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); >>> final Path foundPath = finder.findSinglePath( *one, two* ); >>> >>> I can put Direction.*BOTH *and get the same amount of time >>> *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns* >>> >>> *BUT*, get this: if I swap the nodes: >>> finder.findSinglePath(* two, one*); >>> and i use either Direction.INCOMING or Direction.*BOTH *(which makes >> sense >>> for the second node ,right) then I get *20ms* the time until it >> finishes... >>> *Path from one to two: (two)<--(one) timedelta=20,830,111 ns* >>> >>> (both cases are without data being priorly cached) >>> >>> I was expecting it to act like this: (but only when using Direction.BOTH) >>> see which node has the least number of relationships and iterate on >> those, >>> but this would work if findSinglePath would be made for depth 1 (aka >>> particular case), but as I read "Tries to find a single path between >> startand >>> end nodes." then it makes sense to me why it works like it does... that >> is, >>> iterate on relationships from start node, rather than from end node... >> but >>> I'm not sure if it would *not *make sense to iterate on the end node >> instead >>> of start node, when knowing that end node has less relationships, for >> make >>> the search faster (well at least if depth is one) - I didn't look into >> how >>> neo4j actually does stuff yet :D >>> >>> anyway, it's fairly clear to me that I could make a simple wrapper method >> to >>> make this kind of search faster, *IF* I had the ability to know how many >>> relationships each node has, so I can call findSinglePath with the first >>> param being the node with the least relationship count :) But as I >>> understood it, it's not possible to find how many rels a node has... >> gimme >>> feat! :)) [by not possible I mean, without having to iterate thru all and >>> count them, which would make the use case here obsolete] >>> >>> PS: clearly all the text I wrote here would benefit from being >> represented >>> by a graph, just think about all those grouping with autohiding the ie. >> "[]" >>> and all kinds of stuff... heh >>> _______________________________________________ >>> Neo4j mailing list >>> User@lists.neo4j.org >>> https://lists.neo4j.org/mailman/listinfo/user >> >> _______________________________________________ >> Neo4j mailing list >> User@lists.neo4j.org >> https://lists.neo4j.org/mailman/listinfo/user >> > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user