Hail :) Beware, lot of noise follows (ie. webnoise kinda noise), while I thank anyone in advance for reading, I must also apologize for failing to be concise :-" On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger < michael.hun...@neotechnology.com> wrote:
> John, > > no problem :) > > You pointed out both problems: > - cold caches > - lots of rels on the one side > > There are some performance implications of loading millions of rels from > disk. We'll be working on improving those in 1.5. > > The easiest way to solve that is to switch start and end-node which you > already did. It is much easier for you _with domain knowledge_ about the > graph than for the algorithm. > Yeah I would probably apply that manually, that is: call a particular method when I know which one of the nodes has least rels. (as a ' `temporary` or `acceptable so far` workaround ' [u see the graph here?! xD]* I will for now use findSinglePath as it is*, simply because I don't want the overhead of using a particular method for some use case scenario and the generic one for others - besides it's fast enough, for now) My problem here is that I might not know (in the future) which node (first or second) will have the least nodes. So I was hoping to create a generic method for such find. For now, I could say, I can know. For example: AllLists->A AllLists->B AllLists->C where A,B,C are considered nodes representing lists, in this case, I could maybe say that while AllLists can have lots of outgoing rels (aka lots of lists), list A may not have that many nodes pointing to it (aka list is not used in as many places as there are lists), so I could consider A as being node with least nodes. Though I would want a generic method that would always know which node has least nodes and parse that one, especially in my 1 hop case. I am, as seen here, basically tagging the nodes A,B,C with AllLists, by having AllLists point to them, and this way, I know that node A is of AllLists type (so to speak). This was my old way of thinking while using bdb je, I might need to revise it since then I was completely ignoring relationships and not considering them as a single and accessible object. (I am currently trying to port that old way of thinking to neo4j) In this old way, I was using the node AllLists pointing/outgoing to A as being (in neo4j) a relationship of the type AllLists incoming to A. > > Especially if you have > 1 hop paths. > In my particular use case, I will probably not (yet!) have cases with >1 hop. (I can't really visualize that far in the future xD ) but for sure I will have lots of ==1 hops cases where I just want to see if thisnode is tagged with thatnode by checking thatnode-->thisnode > > There might be ways in improving the algorithm, e.g. by iterating both > sides at the same time, which would lead to the end with the fewer > relationships being exhausted and resolved first. > for now, I totally avoided getting into the findSinglePath 's source and considering doing changes there or even seeing how it works:) But that algo sounds like something I'd do, in like 2 (reusable) threads even xD (I remember having considered this in the past, when using bdb je but I opted to parse the node with least rels instead) > > How large is your graph at all? And how is it structured, e.g. how many > rels do the other 9 nodes have that your second node points to? > I was merely trying this as an example(but this should be a practical use case in my project), the thing is, the other 9 nodes can have any number or rels, in fact *any node could have any number of rels incoming and outgoing from it*,... (but in this case, the 9 nodes each had a new unused node point(outgoing) to them; I'd post the source but i'm already using wrappers and the whole example would be more than 1 java file, but I can link to it: https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java that's the main file to run, tho you don't have to and you prolly don't have the time; just putting it here for consistency/historical-reasons I guess xD ) In my project, I am trying something generic... something like building sets and lists of nodes based on graphs. Imagine having no metadata key-value properties, and instead *trying to use only nodes to represent *those properties (well actually I do have one key aka "name" so i could reference the same node id across application restarts). > > What are both times in a second or third run (cached) ? > Path from one to two: (one)-->(two) timedelta=2,151,018,862 ns Path from one to two: (one)-->(two) timedelta=131,458,292 ns Path from one to two: (one)-->(two) timedelta=119,015,019 ns another run: Path from one to two: (one)-->(two) timedelta=1,824,099,688 ns Path from one to two: (one)-->(two) timedelta=275,772,087 ns Path from one to two: (one)-->(two) timedelta=352,562,121 ns ---- and when swapped: Path from one to two: (two)<--(one) timedelta=20,601,181 ns Path from one to two: (two)<--(one) timedelta=284,212 ns Path from one to two: (two)<--(one) timedelta=255,431 ns and another run: Path from one to two: (two)<--(one) timedelta=20,513,338 ns Path from one to two: (two)<--(one) timedelta=280,314 ns Path from one to two: (two)<--(one) timedelta=255,731 ns (I almost got scared there on the second and third xD) and here's if I count them first (twice): total relations count=100,011 timedelta=3,077,046,964 ns total relations count=100,011 timedelta=151,727,293 ns Path from one to two: (one)-->(two) timedelta=173,697,069 ns Path from one to two: (one)-->(two) timedelta=70,160,217 ns Path from one to two: (one)-->(two) timedelta=91,760,936 ns ---- swapped: total relations count=100,011 timedelta=3,219,749,623 ns total relations count=100,011 timedelta=151,675,727 ns Path from one to two: (two)<--(one) timedelta=10,717,914 ns Path from one to two: (two)<--(one) timedelta=204,465 ns Path from one to two: (two)<--(one) timedelta=143,005 ns > > Relationships are stored as a chained/linked list in the disk format so it > is not possible to know how many there are per node. > Will this(storage way) possibly change in 1.5 or in the future? I don't really know how ie. berkeleydb je stores key-value pairs (BTree?) but it might potential help get a perspective on that(actually I believe y'all know already); In bdb I am able to count the number of values for the same key (using a cursor via *cursor.count()* ) By the way, I was storing relationships in bdb je, like this: key->value where each *key *would be the *start node*, and each *value *would be *end node*, thus a key->value would form a relationship, though I would never consider a relationship as a single object(I would always need both start and end nodes to identify/access a relationship). And a key could have more than one value associated with it. (one to many) ie. sameKey->value1 sameKey->value2 sameKey->value3 and using a cursor in bdb, I would be able to parse all values (and count() them tho not sure how it would do that internally, hopefully not by parsing them all). Of course i would have a second database storing them backwards(endnode as key, startnode as value) like: value1->sameKey value2->sameKey value3->sameKey this way I would be able to search by endnode too, since I could only find stuff (BTree/hash fast) only by looking up by key. (each key/value would be a nodeID btw; I would also have another database for nodeID->String and String->nodeID, so I can search for both, where the equivalent of that in neo4j would be the node having a "name" property with that String value - this so I could access the same nodeID access java application restarts, that is by using that fixed String) My problem with bdb je, was that I couldn't find a way (for my use case) to use transactions without eventually deadlocking...(but if I was as good as you guys I probably would've found it)... so last time I totally disabled the use of transactions so that it would seem simpler while I would focus on some other project aspect. (ie. parsing from value1 to value 3 in one thread, and from value3 to value1 in another; but not just this case; not to mention *it doesn't support nested transactions for java edition*) > > Cheers > Cheerios =)) > > Michael > John PS: btw, this is a lifelong project(so to speak), which I failed to complete or even begin (in 6 years so far) due to postponing it each time I ran into problems (I know this isn't the way to do it like that, instead keep at it and maybe allow compromises - so fail on my part). And finding your neo4j database is giving me a new opportunity to redo it using neo4j instead of bdbje :) I know(!) graph databases(and stuff based on them, that is: THAT kind of connectivity/accessibility) are the future, and I aim for immediate accessibility and customizability ;) > > Am 23.07.2011 um 04:42 schrieb John cyuczieekc: > > > Hey guys, me bugging you again :) > > > > (This whole thing is kind of based on the lack of being able to get the > > number of relationships a node has) > > > > If I have two nodes, and the first one has 1 million outgoing > relationships > > of the type X to 1 million unique/different nodes, > > and the second node has 10 incoming relationships of type X (same type) > of > > which one is from the first node, > > then using GraphAlgoFactory.shortestPath (or suggest a better way?) > > How can I tell neo4j to iterate the search on the second node's incoming > > rels simply because it has 10 relationships instead of 1 million, in > order > > to check if each relationship is in the form of firstNode-->secondNode ? > > > > For the case when first node has 100,000 relationships and second node > has > > 10, > > it takes *1.7 seconds* for shortestPath to find the only one link between > > them using: > > > > final PathFinder<Path> finder = GraphAlgoFactory.shortestPath( > > Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); > > final Path foundPath = finder.findSinglePath( *one, two* ); > > > > I can put Direction.*BOTH *and get the same amount of time > > *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns* > > > > *BUT*, get this: if I swap the nodes: > > finder.findSinglePath(* two, one*); > > and i use either Direction.INCOMING or Direction.*BOTH *(which makes > sense > > for the second node ,right) then I get *20ms* the time until it > finishes... > > *Path from one to two: (two)<--(one) timedelta=20,830,111 ns* > > > > (both cases are without data being priorly cached) > > > > I was expecting it to act like this: (but only when using Direction.BOTH) > > see which node has the least number of relationships and iterate on > those, > > but this would work if findSinglePath would be made for depth 1 (aka > > particular case), but as I read "Tries to find a single path between > startand > > end nodes." then it makes sense to me why it works like it does... that > is, > > iterate on relationships from start node, rather than from end node... > but > > I'm not sure if it would *not *make sense to iterate on the end node > instead > > of start node, when knowing that end node has less relationships, for > make > > the search faster (well at least if depth is one) - I didn't look into > how > > neo4j actually does stuff yet :D > > > > anyway, it's fairly clear to me that I could make a simple wrapper method > to > > make this kind of search faster, *IF* I had the ability to know how many > > relationships each node has, so I can call findSinglePath with the first > > param being the node with the least relationship count :) But as I > > understood it, it's not possible to find how many rels a node has... > gimme > > feat! :)) [by not possible I mean, without having to iterate thru all and > > count them, which would make the use case here obsolete] > > > > PS: clearly all the text I wrote here would benefit from being > represented > > by a graph, just think about all those grouping with autohiding the ie. > "[]" > > and all kinds of stuff... heh > > _______________________________________________ > > Neo4j mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user