Re: [Neo4j] BFS help using neo4j on a graph with 200 million edges, 10 million nodes, (branching factor ca. 50)

Mattias Persson Thu, 28 Oct 2010 13:09:38 -0700

2010/10/28 david lightstone <david.lightst...@gmail.com>

> Mattias,
>
> So Direction.BOTH means to traverse from both sides, not that the
> relationships are directed, correct? I completely misinterpreted that and
> used Direction.OUTGOING to indicate that my relationships were not
> bidirectional. Thank you!
>


Direction.BOTH means traverse relationships regardless of their direction.
It _doesn't_ mean traverse from both ends; that is an implementation detail
of the ShortestPath algorithm. Try to keep those two meanings separate! So
your initial interpretation of Direction.BOTH was correct.
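
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use the Neo4j API: the Direction enum, the EDGES array and the neighbours() helper are stand-ins I made up so the example runs on its own. It only shows how the choice of direction changes which relationships are followed from a node.

```java
import java.util.Set;
import java.util.TreeSet;

public class DirectionDemo {
    // Hypothetical stand-in for a direction filter, so the sketch runs
    // without the Neo4j library on the classpath.
    enum Direction { OUTGOING, INCOMING, BOTH }

    // Directed relationships stored as {from, to}.
    static final int[][] EDGES = { {1, 2}, {3, 1}, {2, 4} };

    // Nodes reachable from `node` in one hop when filtering by `dir`.
    static Set<Integer> neighbours(int node, Direction dir) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES) {
            // Follow the relationship forwards unless only INCOMING was asked for.
            if (dir != Direction.INCOMING && e[0] == node) result.add(e[1]);
            // Follow it backwards unless only OUTGOING was asked for.
            if (dir != Direction.OUTGOING && e[1] == node) result.add(e[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(neighbours(1, Direction.OUTGOING)); // [2]
        System.out.println(neighbours(1, Direction.INCOMING)); // [3]
        System.out.println(neighbours(1, Direction.BOTH));     // [2, 3]
    }
}
```

With BOTH, the relationship 3 -> 1 is followed even though it points at node 1, which is why it is the wrong choice if the relationships are meant to be one-way.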


>
> And I will look into using the REST API to try and keep the JVM up as long
> as possible. I was thinking of it exactly as you had mentioned it should
> not be (an SQLConnection equivalent). In fact, I just moved my database from
> MySQL tables into Neo4j. In the end I am trying to interface the Neo4j
> database with a PHP front end (which I had already developed for use with
> the old MySQL database).
>
> I was looking at the PHP-REST link on the Wiki but was taken to a broken
> link at:
> https://svn.neo4j.org/laboratory/components/rest/
>
> Thanks again.
>
> David.
>
> On Thu, Oct 28, 2010 at 3:39 PM, Mattias Persson
> <matt...@neotechnology.com>wrote:
>
> > 2010/10/28 david lightstone <david.lightst...@gmail.com>
> >
> > > Thank you both for the quick replies.
> > >
> > > My dbs are of the following size:
> > >
> > > FILESIZES
> > > ----------------
> > > neostore.nodestore.db                70MB
> > > neostore.propertystore.db            583MB
> > > neostore.relationshipstore.db        6.6GB
> > > neostore.propertystore.db.strings    1GB
> > >
> > > I played around with the props configuration, trying many different
> > > values. Here is what I currently have in it. It seems like caching does
> > > not work well at all when I use the configuration.
> > >
> > > CONFIGURATION
> > > --------------------------
> > > neostore.nodestore.db.mapped_memory=80M
> > > neostore.relationshipstore.db.mapped_memory=2500M
> > > neostore.propertystore.db.mapped_memory=500M
> > > neostore.propertystore.db.strings.mapped_memory=100M
> > > neostore.propertystore.db.arrays.mapped_memory=0M
> > >
> > > Peter, I have seen that it does work faster in Windows than in Linux,
> > > but again Windows has 2x128GB SSD in RAID 0 (~300MB/s read). I have also
> > > tried running using the parameters "-Xms2000M -Xmx4000M -server".
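
A rough budget check for the numbers above, as a sketch only. It assumes the mapped_memory regions are allocated outside the Java heap (the thread does not confirm this for either box) and takes the RAM sizes from the original post (6GB on Ubuntu, 8GB on Windows). The class and method names are made up for illustration.

```java
public class MemoryBudget {
    // Sum of the mapped_memory settings quoted above, in megabytes
    // (nodes + relationships + properties + strings + arrays).
    static long mappedTotalMb() {
        return 80 + 2500 + 500 + 100 + 0;
    }

    // Mapped memory plus the maximum heap size (-Xmx), in megabytes.
    // Assumption: the two do not overlap.
    static long worstCaseMb(long maxHeapMb) {
        return mappedTotalMb() + maxHeapMb;
    }

    public static void main(String[] args) {
        long ubuntuRamMb = 6 * 1024;    // 6GB box from the original post
        long windowsRamMb = 8 * 1024;   // 8GB box from the original post
        long worst = worstCaseMb(4000); // -Xmx4000M
        System.out.println("mapped total: " + mappedTotalMb() + "M");
        System.out.println("mapped + max heap: " + worst + "M");
        System.out.println("fits in 6GB: " + (worst <= ubuntuRamMb));
        System.out.println("fits in 8GB: " + (worst <= windowsRamMb));
    }
}
```

Under that assumption the mapped regions plus a full 4000M heap come to 7180M, more than the 6GB box has, and the 2500M relationship mapping covers well under half of the 6.6GB relationship store.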
> > >
> > > Mattias, I do want to do a BFS, finding the shortest path between two
> > > given nodes that I pass in as parameters. One thing I do not understand
> > > about caching (perhaps I am going about this entirely wrong) is how does
> > > the database stay in memory? In my Neo4jBFS.java file, I call the
> > > following commands:
> > >
> >
> > That isn't a very efficient way of finding the shortest path between two
> > nodes. Instead, try out an algorithm written for just that in the
> > graph-algo component <http://components.neo4j.org/neo4j-graph-algo/>; on
> > that page you can see a usage example of the ShortestPath algorithm. Key
> > benefits of that implementation are traversing from both ends and using
> > less memory.
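
To illustrate why searching from both ends helps, here is a minimal bidirectional BFS in plain Java. This is a sketch of the general technique, not the graph-algo implementation: the class, the adjacency-map representation and the sample graph are all made up for the example, and the graph is treated as undirected to keep it short. With a branching factor near 50, a one-sided search to depth 4 touches on the order of 50^4 nodes, while two searches of depth 2 touch on the order of 2 * 50^2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BidirectionalBfs {
    // Length in hops of a shortest path between source and target in an
    // undirected graph, or -1 if there is none within maxDepth hops.
    static int shortestPathLength(Map<Integer, List<Integer>> adj,
                                  int source, int target, int maxDepth) {
        if (source == target) return 0;
        Map<Integer, Integer> distA = new HashMap<>(), distB = new HashMap<>();
        List<Integer> frontierA = new ArrayList<>(List.of(source));
        List<Integer> frontierB = new ArrayList<>(List.of(target));
        distA.put(source, 0);
        distB.put(target, 0);
        int depthA = 0, depthB = 0;
        while (!frontierA.isEmpty() && !frontierB.isEmpty()
                && depthA + depthB < maxDepth) {
            // Expand the smaller frontier one full level at a time.
            boolean expandA = frontierA.size() <= frontierB.size();
            List<Integer> frontier = expandA ? frontierA : frontierB;
            Map<Integer, Integer> mine = expandA ? distA : distB;
            Map<Integer, Integer> other = expandA ? distB : distA;
            List<Integer> next = new ArrayList<>();
            int best = -1;
            for (int u : frontier) {
                for (int v : adj.getOrDefault(u, List.of())) {
                    if (other.containsKey(v)) {
                        // The two searches meet on the edge u - v.
                        int len = mine.get(u) + 1 + other.get(v);
                        if (best < 0 || len < best) best = len;
                    }
                    if (mine.putIfAbsent(v, mine.get(u) + 1) == null) next.add(v);
                }
            }
            // Finish the whole level before returning so the minimum is kept.
            if (best >= 0) return best;
            if (expandA) { frontierA = next; depthA++; }
            else { frontierB = next; depthB++; }
        }
        return -1;
    }

    static void addEdge(Map<Integer, List<Integer>> adj, int a, int b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Chain 1-2-3-4-5 plus a shortcut 1-6-5.
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        addEdge(adj, 1, 2); addEdge(adj, 2, 3); addEdge(adj, 3, 4);
        addEdge(adj, 4, 5); addEdge(adj, 1, 6); addEdge(adj, 6, 5);
        return adj;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = sampleGraph();
        System.out.println(shortestPathLength(g, 1, 5, 4)); // 2, via node 6
        System.out.println(shortestPathLength(g, 1, 5, 1)); // -1, depth limit too small
    }
}
```

Only the distance maps of the two half-depth searches are kept in memory, which is where the memory saving comes from.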
> >
> >
> > >
> > > graphDb = new EmbeddedGraphDatabase (DB_PATH);
> > > /* do stuff */
> > > indexService.shutdown();
> > > graphDb.shutdown();
> >
> >
> > > The program is called with the following command line:
> > >
> > > java Neo4jBFS sourceNode targetNode maxDepth (and the Xmx, Xms, server
> > > command lines as described above)
> > >
> > > Wouldn't that make it always the first instance of GraphDatabaseService?
> > > How can I get around that and have it always running in the background?
> > > If there is any code that you would like me to post, please let me know.
> > >
> > > Thank you again for the prompt replies!
> > >
> >
> > You generally want to keep your database JVM alive as long as possible to
> > benefit from the cached values. Don't treat an EmbeddedGraphDatabase as
> > the equivalent of an SQLConnection or something; it shouldn't be closed
> > as soon as possible.
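
One way to picture "open once, keep alive" is the sketch below. It is plain Java with a made-up FakeGraphDb class standing in for the real database handle, so it runs without Neo4j; the holder class and the path are hypothetical too. The point is only the lifecycle: open on first use, reuse for every query, shut down when the JVM exits.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GraphDbHolder {
    // Hypothetical stand-in for an embedded database handle. In a real
    // program this role would be played by the EmbeddedGraphDatabase.
    static final class FakeGraphDb {
        static final AtomicInteger OPENED = new AtomicInteger();
        private volatile boolean running = true;
        FakeGraphDb(String path) { OPENED.incrementAndGet(); }
        boolean isRunning() { return running; }
        void shutdown() { running = false; }
    }

    private static FakeGraphDb instance;

    // Opens the database on first use and returns the same instance on
    // every later call (the path of later calls is ignored).
    static synchronized FakeGraphDb get(String path) {
        if (instance == null) {
            instance = new FakeGraphDb(path);
            final FakeGraphDb db = instance;
            // Shut down once, when the JVM exits, not after every query.
            Runtime.getRuntime().addShutdownHook(new Thread(db::shutdown));
        }
        return instance;
    }

    public static void main(String[] args) {
        FakeGraphDb first = get("var/graphdb");
        FakeGraphDb second = get("var/graphdb");
        System.out.println(first == second);          // true
        System.out.println(FakeGraphDb.OPENED.get()); // 1
    }
}
```

A command-line program that starts, runs one query and exits never benefits from this, because every run starts with cold caches; the holder only pays off inside a long-running process.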
> >
> > If you plan on having a neo4j graph database sitting and answering these
> > shortest path requests, then you could either bring up a standalone REST
> > server <http://components.neo4j.org/neo4j-rest/>, which exposes a REST API
> > and also includes the shortest path algorithm, or deploy it in a
> > container, e.g. a web container or Spring.
> >
> >
> >
> > >
> > > David.
> > >
> > > On Thu, Oct 28, 2010 at 2:54 PM, Mattias Persson
> > > <matt...@neotechnology.com>wrote:
> > >
> > > > > Do you actually want to do a BFS (assuming that means breadth-first
> > > > > search) and get all those paths back, or are you just testing
> > > > > performance? Also, if it's the first run for that GraphDatabaseService
> > > > > instance, you're basically testing the I/O performance of your hard
> > > > > drive, since everything will have to be read up into memory.
> > > > > Consecutive runs should be much faster.
> > > >
> > > > 2010/10/28 david lightstone <david.lightst...@gmail.com>
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > > I'm running Neo4j on both Ubuntu and Windows 7 boxes. I have a
> > > > > > dataset with 200 million edges and 10 million nodes with a median
> > > > > > branching factor of about 50 outgoing, directed edges/node. I'm
> > > > > > trying to run the BFS on the data but have been unable to do so in
> > > > > > a timely fashion. I have tried to follow the advice on
> > > > > > http://wiki.neo4j.org/content/Neo4j_Performance_Guide but I still
> > > > > > have queries that can take up to 300 seconds or so to run. My Ubuntu
> > > > > > box has 6GB of RAM and is running on a 7200RPM hard drive, while my
> > > > > > Windows box has 8GB of RAM and is running off of SSDs (HDtune
> > > > > > reports ~300 MB/s reads).
> > > > >
> > > > > I had also added an index for the nodes.
> > > > >
> > > > > > Can anyone offer advice on why this process may be taking so long?
> > > > > > The CPU usage on both is very low (2-5%) and I'm pretty sure the
> > > > > > whole thing is HDD I/O limited, but I was wondering if there were
> > > > > > any techniques to actually get the query to go any faster?
> > > > >
> > > > > > Judging by what I had read about Neo4j in descriptions, I assumed
> > > > > > that my data size was not so large as to justify a long BFS (paths
> > > > > > just 4 nodes away can take up to 300 seconds).
> > > > >
> > > > > Thank you in advance.
> > > > > _______________________________________________
> > > > > Neo4j mailing list
> > > > > User@lists.neo4j.org
> > > > > https://lists.neo4j.org/mailman/listinfo/user
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Mattias Persson, [matt...@neotechnology.com]
> > > > Hacker, Neo Technology
> > > > www.neotechnology.com
> > >
> >
> >
> >
> > --
> > Mattias Persson, [matt...@neotechnology.com]
> > Hacker, Neo Technology
> > www.neotechnology.com
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com