ok fully reverting the entire project worked (that is without that fix): with Direction.BOTH: (one)-->(three) 8,438 ms (one)-->(two) 873 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 734 ms (one)-->(two) 743 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 784 ms (one)-->(two) 621 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 682 ms (one)-->(two) 733 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 620 ms (one)-->(two) 683 ms (three)<--(one) 0 ms (two)<--(one) 0 ms with Direction.INCOMING: (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms with Direction.OUTGOING: (one)-->(three) 762 ms (one)-->(two) 621 ms (one)-->(three) 712 ms (one)-->(two) 631 ms (one)-->(three) 672 ms (one)-->(two) 773 ms (one)-->(three) 620 ms (one)-->(two) 743 ms (one)-->(three) 620 ms (one)-->(two) 683 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,721 ms
and with that *fix*: with Direction.BOTH: (one)-->(three) 8,710 ms (one)-->(two) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 1,143 ms (one)-->(two) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 763 ms (one)-->(two) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 814 ms (one)-->(two) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (one)-->(three) 897 ms (one)-->(two) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms with Direction.INCOMING: (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms (three)<--(one) 0 ms (two)<--(one) 0 ms with Direction.OUTGOING: (one)-->(three) 630 ms (one)-->(two) 0 ms (one)-->(three) 763 ms (one)-->(two) 0 ms (one)-->(three) 842 ms (one)-->(two) 0 ms (one)-->(three) 621 ms (one)-->(two) 0 ms (one)-->(three) 742 ms Environment is about to shut down the neo4j database (one)-->(two) 0 ms Environment shutting down complete 2,711 ms So yeah, one-->two got fixed, what say you about one-->three though ? :D *whistle* :-" On Sun, Jul 24, 2011 at 5:28 PM, John cyuczieekc <cyuczie...@gmail.com>wrote: > updated to latest from github, "Stops the algo as soon as possible in > findSinglePath" > > graphdb contains: > one-->three > one-->{ a million other random nodes } > one-->two > { 10 random nodes } --> two > > (added in that order, except that `one-->two` is somewhere between those 10 > random nodes) > > output: > > with Direction.BOTH: > (one)-->(three) 7,660 ms > (one)-->(two) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (one)-->(three) 2,195 ms > (one)-->(two) 0 ms > *(three)<--(one) 0 ms* > (two)<--(one) 0 ms > *(one)-->(three) 875 ms* > (one)-->(two) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (one)-->(three) 630 ms > (one)-->(two) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (one)-->(three) 723 ms > (one)-->(two) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > with Direction.INCOMING: > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > (three)<--(one) 0 ms > (two)<--(one) 0 ms > with Direction.OUTGOING: > (one)-->(three) 802 ms > (one)-->(two) 0 ms > with Direction.OUTGOING: > (one)-->(three) 631 ms > (one)-->(two) 0 ms > with Direction.OUTGOING: > (one)-->(three) 732 ms > (one)-->(two) 0 ms > with Direction.OUTGOING: > (one)-->(three) 873 ms > (one)-->(two) 0 ms > with Direction.OUTGOING: > (one)-->(three) 630 ms > > Environment is about to shut down the neo4j database > (one)-->(two) 0 ms > Environment shutting down complete 3,171 ms > > For some reason, one-->three takes more than 0ms, even though three has > only 1 incoming rel, and it's the first rel of `one` that outgoes to `three` > > (for whoever cares)the test program used is in this commit here: > > https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee > > I tried to revert that commit, and for some reason I'm getting the same > results [one-->three is the only one over 0ms] (maybe I failed in reverting > it, though the 2 sources seem reverted) > > Peace :) > > > On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson < > matt...@neotechnology.com> wrote: > >> It doesn't matter since the algorithm is bi-directional... so: >> >> one --> two >> >> will start from "one" OUTGOING and "two" INCOMING, whereas >> >> two <-- one >> >> will start from "two" INCOMING and "one" OUTGOING.... see, no difference. >> >> It alternates side for each relationship. It will, however depend on where >> the INCOMING/OUTGOING relationships reside in the relationship chain, but >> sticking to this discussion: those two calls will yield the exact same >> speed. Though I just discovered a little thingie where findSinglePath >> didn't >> stop right away when after finding the first one, but now it does! >> >> 2011/7/24 Niels Hoogeveen <pd_aficion...@hotmail.com> >> >> > >> > Are you sure this is true, Mattias?The response time of a >> getRelationship >> > call depends on the total number of relationships on the node. So it >> makes a >> > difference which side of the relationship makes the call. It is always >> > faster to ask it from the side that has the lowest total number of >> > relationships attached. This is even true, if for both sides of the >> > relationship there is only one relationship of that particular >> relationship >> > type.Niels >> > > Date: Sun, 24 Jul 2011 16:10:42 +0200 >> > > From: matt...@neotechnology.com >> > > To: user@lists.neo4j.org >> > > Subject: Re: [Neo4j] shortestPath slower than it could be >> > > >> > > What I'm saying is that: >> > > >> > > one --> two (one,two) for OUTGOING >> > > >> > > and >> > > >> > > two <-- one (two,one) for INCOMING >> > > >> > > should yield the same timing. >> > > >> > > 2011/7/24 John cyuczieekc <cyuczie...@gmail.com> >> > > >> > > > Thanks Mattias. The way I understand what you said is, that swapping >> > `one` >> > > > and `two` in *finder.findSinglePath( one, two );* should yield the >> > same >> > > > timing when Direction.BOTH is used; this would be great. But I don't >> > see >> > > > how >> > > > this is possible (due to not knowing how it's stored too) unless you >> > know >> > > > how many rels each node has, OR even better storage is using BTree-s >> > (?!) >> > > > Btw, is it possible/practical that neo4j could store IDs in BTrees ? >> > > > (someone said that for rels it's a double linked list instead- I'll >> > need to >> > > > recheck) >> > > > >> > > > irrelevant stuff follows (ie. don't read) >> > > > ---- >> > > > Until then , I lame-tested something (with both neo4j and >> berkeleydb): >> > > > a node `one` having 1mil rels to 1million unique nodes, and one more >> to >> > a >> > > > node `two`, and node `two` having 10 incoming rels from other unique >> > nodes, >> > > > +1 the one rel which was already from `one`, trying shortestPath >> > between >> > > > `one` and `two` (not `two` and `one`) with Direction.BOTH >> > > > output: >> > > > just created 1,000,000 rels, took=58,381 ms (1) >> > > > Path from one to two: (one)-->(two) 121,310 ms (2) >> > > > Path from one to two: (one)-->(two) timedelta=1,066 ms >> > > > Path from one to two: (one)-->(two) timedelta=795 ms >> > > > Path from one to two: (one)-->(two) timedelta=772 ms >> > > > >> > > > (1) and (2) happened in the same transaction (ie. before >> tx.finish()), >> > the >> > > > others after transaction finished (an in no new transaction - since >> > they >> > > > were only reads) - this also means some caching must've happened >> from >> > > > before. >> > > > Because it happened in same transaction, adding 1 mil relationships >> is >> > > > slower than adding them in bursts of x, I am aware of this (and I'll >> > try to >> > > > make a bench with that). For ie. 100k nodes neo4j is actually >> faster. >> > > > I even got this once (but in another modified bench): >> > > > first time creating the relationships... >> > > > just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, >> instead >> > of >> > > > the usual 77% limit) >> > > > Exception in thread "main" java.lang.OutOfMemoryError: Java heap >> space >> > > > Environment is about to shut down the neo4j database >> > > > Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC >> overhead >> > > > limit >> > > > exceeded >> > > > at java.util.logging.LogManager.reset(LogManager.java:835) >> > > > at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) >> > > > Environment shutting down complete 23,962 ms >> > > > >> > > > Exception: java.lang.OutOfMemoryError thrown from the >> > > > UncaughtExceptionHandler in thread "main" >> > > > >> > > > >> > > > this is the output when running cold, (ie. the data already >> existed): >> > > > search started >> > > > Path from one to two: (one)-->(two) timedelta=7,934 ms >> > > > search started >> > > > Path from one to two: (one)-->(two) timedelta=2,027 ms >> > > > search started >> > > > Path from one to two: (one)-->(two) timedelta=875 ms >> > > > Environment is about to shut down the neo4j database >> > > > Environment shutting down complete 2,971 ms >> > > > >> > > > and the exact same thing when using berkeleydb (not using >> transactions >> > > > though): >> > > > just created 1,000,000 rels, took=29,418 ms >> > > > Path from one to two: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms >> > here) >> > > > >> > > > and when cold: >> > > > Path from one to two: true 10 ms >> > > > Path from one to two: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > class org.bdb.BerkEnv$1 shutting down complete 0 ms >> > > > (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms >> on >> > > > all) >> > > > >> > > > in bdb, this is actually how bdb does the search (I don't need to >> make >> > sure >> > > > I iterate or something similar, on the node `two` which has least >> > incoming >> > > > rels), in fact I execute a search on both nodes, one after the other >> > and >> > > > make sure one->two and two<-one (since I am using two databases), >> > anyway, >> > > > by >> > > > using db.getSearchBoth(...) the search is done *that *fast. I hope >> to >> > see >> > > > this kind of fastness with neo4j too xD (unless I'm doing it wrong) >> > > > >> > > > if I put neo4j to find shortest path on incoming on `two` then: >> > > > search started >> > > > Path from one to two: (two)<--(one) timedelta=20 ms >> > > > search started >> > > > Path from one to two: (two)<--(one) timedelta=0 ms >> > > > search started >> > > > Path from one to two: (two)<--(one) timedelta=0 ms >> > > > Environment is about to shut down the neo4j database >> > > > Environment shutting down complete 2,590 ms >> > > > (most was 20ms, least was 20ms on first) >> > > > >> > > > TestLinkage.java (both progs) were used from here: >> > > > >> > > > >> > >> https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 >> > > > >> > > > -------------- >> > > > >> > > > Here's a lame benchmark for neo4j when I added another node `three` >> > which >> > > > is >> > > > the first rel `one`->`three`, then added 1mil `one`->(random nodes) >> , >> > and >> > > > then(as before) `one`->`two` >> > > > `three` has no other relationships >> > > > Also, creating nodes 10k per transaction until 1mil are reached. >> > > > >> > > > first time creating the relationships... >> > > > just created 1,000,000 rels, took=28,889 ms (btw my cpu is limited >> to >> > 77% >> > > > of its normal speed; usually!) >> > > > closed transaction >> > > > search started >> > > > (one)-->(three) 3,234 ms >> > > > search started >> > > > (one)-->(two) 661 ms >> > > > search started >> > > > (three)<--(one) 10 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > search started >> > > > (one)-->(three) 631 ms >> > > > search started >> > > > (one)-->(two) 913 ms >> > > > search started >> > > > (three)<--(one) 0 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > search started >> > > > (one)-->(three) 610 ms >> > > > search started >> > > > (one)-->(two) 773 ms >> > > > search started >> > > > (three)<--(one) 0 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > Environment is about to shut down the neo4j database >> > > > Environment shutting down complete 3,801 ms >> > > > >> > > > and on cold run: >> > > > search started >> > > > (one)-->(three) 8,454 ms >> > > > search started >> > > > (one)-->(two) 885 ms >> > > > search started >> > > > (three)<--(one) 0 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > search started >> > > > (one)-->(three) 752 ms >> > > > search started >> > > > (one)-->(two) 755 ms >> > > > search started >> > > > (three)<--(one) 0 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > search started >> > > > (one)-->(three) 804 ms >> > > > search started >> > > > (one)-->(two) 621 ms >> > > > search started >> > > > (three)<--(one) 0 ms >> > > > search started >> > > > (two)<--(one) 0 ms >> > > > Environment is about to shut down the neo4j database >> > > > Environment shutting down complete 3,141 ms >> > > > ===== >> > > > and same thing in berkeleydb (no transactions tho): >> > > > first time creating the relationships... >> > > > just created 1,000,000 rels, took=29,658 ms >> > > > Path from one to three: true 10 ms >> > > > Path from one to two: true 0 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > Path from one to three: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > Path from one to three: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > class org.bdb.BerkEnv$1 shutting down complete 152 ms >> > > > >> > > > and on cold: >> > > > Path from one to three: true 10 ms >> > > > Path from one to two: true 10 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > Path from one to three: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > Path from one to three: true 0 ms >> > > > Path from one to two: true 0 ms >> > > > Path from three to one: false 0 ms >> > > > Path from two to one: false 0 ms >> > > > class org.bdb.BerkEnv$1 shutting down complete 0 ms >> > > > >> > > > >> > > > these progs are on: >> > > > >> > > > >> > >> https://github.com/13th-floor/neo4john/commit/d2fd5b27dcde5560e6ff980fe9320aedc4421ab7 >> > > > >> > > > Cheerios, >> > > > John. >> > > > >> > > > Song of the day: Bic Runga - She Left On A Monday >> > > > PS: asserts were enable ie. vm arg "-ea" (no quotes) >> > > > >> > > > On Sat, Jul 23, 2011 at 4:31 PM, Mattias Persson >> > > > <matt...@neotechnology.com>wrote: >> > > > >> > > > > Hi John, >> > > > > >> > > > > the algorithm is written to dodge these kinds of pitfalls. Maybe >> > there's >> > > > > some issue with the implementation, but in principal it should >> make >> > no >> > > > > difference. I'll look at it when I get the time (I wrote that >> > > > > implementation). >> > > > > >> > > > > 2011/7/23 John cyuczieekc <cyuczie...@gmail.com> >> > > > > >> > > > > > Hey guys, me bugging you again :) >> > > > > > >> > > > > > (This whole thing is kind of based on the lack of being able to >> get >> > the >> > > > > > number of relationships a node has) >> > > > > > >> > > > > > If I have two nodes, and the first one has 1 million outgoing >> > > > > > relationships >> > > > > > of the type X to 1 million unique/different nodes, >> > > > > > and the second node has 10 incoming relationships of type X >> (same >> > type) >> > > > > of >> > > > > > which one is from the first node, >> > > > > > then using GraphAlgoFactory.shortestPath (or suggest a better >> > way?) >> > > > > > How can I tell neo4j to iterate the search on the second node's >> > > > incoming >> > > > > > rels simply because it has 10 relationships instead of 1 >> million, >> > in >> > > > > order >> > > > > > to check if each relationship is in the form of >> > firstNode-->secondNode >> > > > ? >> > > > > > >> > > > > > For the case when first node has 100,000 relationships and >> second >> > node >> > > > > has >> > > > > > 10, >> > > > > > it takes *1.7 seconds* for shortestPath to find the only one >> link >> > > > between >> > > > > > them using: >> > > > > > >> > > > > > final PathFinder<Path> finder = GraphAlgoFactory.shortestPath( >> > > > > > Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); >> > > > > > final Path foundPath = finder.findSinglePath( *one, two* ); >> > > > > > >> > > > > > I can put Direction.*BOTH *and get the same amount of time >> > > > > > *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns* >> > > > > > >> > > > > > *BUT*, get this: if I swap the nodes: >> > > > > > finder.findSinglePath(* two, one*); >> > > > > > and i use either Direction.INCOMING or Direction.*BOTH *(which >> > makes >> > > > > sense >> > > > > > for the second node ,right) then I get *20ms* the time until it >> > > > > finishes... >> > > > > > *Path from one to two: (two)<--(one) timedelta=20,830,111 ns* >> > > > > > >> > > > > > (both cases are without data being priorly cached) >> > > > > > >> > > > > > I was expecting it to act like this: (but only when using >> > > > Direction.BOTH) >> > > > > > see which node has the least number of relationships and >> iterate >> > on >> > > > > those, >> > > > > > but this would work if findSinglePath would be made for depth 1 >> > (aka >> > > > > > particular case), but as I read "Tries to find a single path >> > between >> > > > > > startand >> > > > > > end nodes." then it makes sense to me why it works like it >> does... >> > that >> > > > > is, >> > > > > > iterate on relationships from start node, rather than from end >> > node... >> > > > > but >> > > > > > I'm not sure if it would *not *make sense to iterate on the end >> > node >> > > > > > instead >> > > > > > of start node, when knowing that end node has less >> relationships, >> > for >> > > > > make >> > > > > > the search faster (well at least if depth is one) - I didn't >> look >> > into >> > > > > how >> > > > > > neo4j actually does stuff yet :D >> > > > > > >> > > > > > anyway, it's fairly clear to me that I could make a simple >> wrapper >> > > > method >> > > > > > to >> > > > > > make this kind of search faster, *IF* I had the ability to know >> how >> > > > many >> > > > > > relationships each node has, so I can call findSinglePath with >> the >> > > > first >> > > > > > param being the node with the least relationship count :) But as >> I >> > > > > > understood it, it's not possible to find how many rels a node >> > has... >> > > > > gimme >> > > > > > feat! :)) [by not possible I mean, without having to iterate >> thru >> > all >> > > > and >> > > > > > count them, which would make the use case here obsolete] >> > > > > > >> > > > > > PS: clearly all the text I wrote here would benefit from being >> > > > > represented >> > > > > > by a graph, just think about all those grouping with autohiding >> the >> > ie. >> > > > > > "[]" >> > > > > > and all kinds of stuff... heh >> > > > > > _______________________________________________ >> > > > > > Neo4j mailing list >> > > > > > User@lists.neo4j.org >> > > > > > https://lists.neo4j.org/mailman/listinfo/user >> > > > > > >> > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Mattias Persson, [matt...@neotechnology.com] >> > > > > Hacker, Neo Technology >> > > > > www.neotechnology.com >> > > > > _______________________________________________ >> > > > > Neo4j mailing list >> > > > > User@lists.neo4j.org >> > > > > https://lists.neo4j.org/mailman/listinfo/user >> > > > > >> > > > _______________________________________________ >> > > > Neo4j mailing list >> > > > User@lists.neo4j.org >> > > > https://lists.neo4j.org/mailman/listinfo/user >> > > > >> > > >> > > >> > > >> > > -- >> > > Mattias Persson, [matt...@neotechnology.com] >> > > Hacker, Neo Technology >> > > www.neotechnology.com >> > > _______________________________________________ >> > > Neo4j mailing list >> > > User@lists.neo4j.org >> > > https://lists.neo4j.org/mailman/listinfo/user >> > >> > _______________________________________________ >> > Neo4j mailing list >> > User@lists.neo4j.org >> > https://lists.neo4j.org/mailman/listinfo/user >> > >> >> >> >> -- >> Mattias Persson, [matt...@neotechnology.com] >> Hacker, Neo Technology >> www.neotechnology.com >> _______________________________________________ >> Neo4j mailing list >> User@lists.neo4j.org >> https://lists.neo4j.org/mailman/listinfo/user >> > > _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user