Re: [Neo4j] User Digest, Vol 45, Issue 35

Dave Hesketh Wed, 12 Jan 2011 01:57:04 -0800

Craig
First of all, I must apologise for my slow response - I have been on
extended leave (with, deliberately, no internet access - you can live
without it!).
I am in the process of writing up (for internal record) the testing process
I have been using to benchmark RTree versus PostGIS. I will post a copy.
Clearly, from your response, there are a number of possible routes for
improvement but, at this stage, I want to be certain I have the optimum
tuning settings for Neo4j for the current RTree implementation. At present,
all settings are 'out of the box'. I have little experience of Neo4j (but a
lot of database experience - relational, multi-dimensional, hierarchical) so
some basic reworking of the settings will be a great help before I scale up
the test.
Thanks Dave


On 18 December 2010 13:00, <user-requ...@lists.neo4j.org> wrote:

>
> Today's Topics:
>
>   1. Re:  Reference node pains. (Marko Rodriguez)
>   2. Re:  R-Tree indexing performance (Craig Taverner)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 17 Dec 2010 12:00:38 -0700
> From: Marko Rodriguez <okramma...@gmail.com>
> Subject: Re: [Neo4j] Reference node pains.
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID: <c9ed4a35-25a2-4e18-b76f-9da472b69...@gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> Here is the problem.
>
> 1. create graph database. (now you have one reference node)
> 2. export graph database.
> ...
> 1. create graph database
> 2. import previous graph data (now you have two "reference" nodes)
> ...
> so forth and so on.
>
> Moreover, the reference node is considered special by the Neo4j REST
> server. If you graph.getNodeById(0).delete(), then Neo4j REST server throws
> exceptions as it uses that as a "starting point" (which is a weird concept
> of a graph -- as a graph can be cyclic and thus, there is no "starting
> point").
>
> And if you are not conscious about deleting the reference node, then you
> run into "data bug" problems down the road -- "ah damn, that freaking
> reference node is why X, Y, Z is happening... :(".
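
The round trip described above can be illustrated with a toy model. This is
only a sketch and does not use the Neo4j API: a "database" is a plain list of
node records, and the helper names (create_database, export_all, import_all)
are hypothetical. It assumes that a fresh store starts with one reference
node, that export copies every node, and that import appends every node as
ordinary data.

```python
# Toy model (NOT the Neo4j API): a "database" is a list of node dicts.
# All function names here are hypothetical and exist only for illustration.

def create_database():
    # Assumption: a freshly created store holds exactly one reference node.
    return [{"reference": True}]

def export_all(db):
    # A naive export copies every node, the reference node included.
    return [dict(node) for node in db]

def import_all(db, exported):
    # A naive import appends every exported node as ordinary data.
    db.extend(dict(node) for node in exported)
    return db

def count_reference_like(db):
    # Count nodes that were, at some point, a store's reference node.
    return sum(1 for node in db if node.get("reference"))

db = create_database()
counts = []
for _ in range(3):
    dump = export_all(db)                      # export the current store
    db = import_all(create_database(), dump)   # import into a fresh store
    counts.append(count_reference_like(db))

print(counts)  # one extra stale reference node per round trip
```

In this model each export/import cycle adds one more stale reference node,
which is the build-up described in the message.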
>
> Hope that helps,
> Marko.
>
> http://markorodriguez.com
>
> On Dec 17, 2010, at 11:49 AM, Todd Rader wrote:
>
> > (Going back to the original problem statement...)
> >
> > I'm not sure I fully understand the pain here.  Is the problem that the
> > reference node is migrated into other stores, and then, if that data is
> > migrated back to neo4j, the original reference node comes back to a neo4j
> > database that already has a reference node (causing clutter)?
> >
> > If that's true, is the problem here the sheer existence of a reference
> > node, or is it the lack of graceful migrating?  For example, would this be
> > solved if there was an API just like getAllNodes() except it doesn't return
> > the reference node?  I ask this not as a solution but as a way of
> > clarifying the problem for me.
> >
> > Todd Rader,  Sr. Manager
> > vFabric, Cloud Application Platform
> > VMware
> > tra...@vmware.com
> > www.springsource.org | www.springsource.com | www.vmware.com
> >
> >
> > ----- Original Message -----
> > From: "Marko Rodriguez" <okramma...@gmail.com>
> > To: "Neo4j user discussions" <user@lists.neo4j.org>
> > Sent: Friday, December 10, 2010 10:35:49 AM
> > Subject: [Neo4j] Reference node pains.
> >
> > Hello.
> >
> > I have one question and a comment:
> >
> > QUESTION: Is the reference node always id 0 on a newly created graph?
> >
> > COMMENT: By chance, will you guys remove the concept of a reference node
> > in the future? I've noticed this to be a pain in the side for people
> > moving between various graph systems: going from Neo4j to iGraph to
> > TinkerPop, etc. The reference node, if the user is not conscious of it,
> > begins to build up as data is migrated into and from Neo4j graphs. And
> > what ensues is a data bug. Perhaps a GraphDatabaseServer = new
> > GraphDatabaseService(String directory, boolean createReferenceNode). ...?
> >
> > Thanks,
> > Marko.
> >
> > http://markorodriguez.com
> > http://tinkerpop.com
> >
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 18 Dec 2010 00:48:00 +0100
> From: Craig Taverner <cr...@amanzi.com>
> Subject: Re: [Neo4j] R-Tree indexing performance
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID:
>        <aanlktik62ahk0qgcgxwtjpilpwlbqp8vsyyymv80z...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
>
> Yes, there are some plans for improvements to the index. However, I should
> start by saying that we have not done extensive benchmarking of the RTree
> against the PostGIS implementation, so the work done by Dave is very
> interesting and I would like to learn more about his test case. One thing
> that would be interesting to find out is whether the performance differences
> are due to the RTree implementation itself, or due to some other underlying
> geometry test code, JTS versus something else, or Java versus C.
>
> Peter points out that we currently have a search algorithm that does not
> perhaps make the best use of the graph, since it does not use traversals,
> but uses recursion and produces a result-set instead of a result stream. It
> is not completely clear in what ways this might affect performance, but it
> seems likely that in two cases we should see performance issues: large
> result sets and deep traversals. Moving the logic to a real traverser, as
> used in some other indexes we have tried, will resolve those issues. But
> it is possible this has nothing to do with Dave's case.
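
The difference between a result-set and a result stream can be sketched in a
few lines. This is an illustration only, not the Neo4j Spatial RTree code and
not the Neo4j traversal framework: the Node class, the bounding-box tuples and
both search functions are hypothetical. search_set recurses and collects
every hit before returning; search_stream walks the same tree with an
explicit stack and yields hits one at a time, so a caller can stop early.

```python
from itertools import islice

class Node:
    """Hypothetical R-tree node: a bounding box plus children or points."""
    def __init__(self, bbox, children=(), items=()):
        self.bbox = bbox                # (min_x, min_y, max_x, max_y)
        self.children = list(children)  # child nodes (inner node)
        self.items = list(items)        # (x, y) points (leaf node)

def intersects(a, b):
    # True when two bounding boxes overlap or touch.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(bbox, p):
    # True when point p lies inside (or on the edge of) bbox.
    return bbox[0] <= p[0] <= bbox[2] and bbox[1] <= p[1] <= bbox[3]

def search_set(node, query, out=None):
    # Recursive search: builds the complete result list before returning.
    out = [] if out is None else out
    if intersects(node.bbox, query):
        out.extend(p for p in node.items if contains(query, p))
        for child in node.children:
            search_set(child, query, out)
    return out

def search_stream(root, query):
    # Iterative search: an explicit stack replaces recursion, and each hit
    # is yielded as soon as it is found.
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.bbox, query):
            continue
        for p in node.items:
            if contains(query, p):
                yield p
        stack.extend(reversed(node.children))

leaf_a = Node((0, 0, 5, 5), items=[(1, 1), (4, 4)])
leaf_b = Node((6, 6, 10, 10), items=[(7, 7), (9, 9)])
root = Node((0, 0, 10, 10), children=[leaf_a, leaf_b])
query = (0, 0, 8, 8)

all_hits = search_set(root, query)
# Only the first two hits are ever computed here.
first_two = list(islice(search_stream(root, query), 2))
print(all_hits, first_two)
```

With the streaming form, memory use does not grow with the size of the
result, and depth is bounded by the stack list rather than the call stack.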
>
> So in summary, I think there are a few areas that can account for these
> differences:
>
>   - General database performance, and I see others have answered with
>   suggestions on dealing with that. Neo4j is generally very fast, but
>   sometimes needs some tuning.
>   - The RTree implementation itself - I know RTrees are not all equal, so
>   there may be room for general RTree improvements and optimizations. As
>   mentioned, we have not put much time into optimizing the RTree, so
>   hopefully there is room to move here.
>   - The search algorithm's known issue of not leveraging the Neo4j
>   traversal framework, which is a very good, high-performance framework.
>
> Peter mentions a new multi-dimensional index I am working on, which I call
> a 'composite index'. I think this will not out-perform the RTree because it
> is targeting a very different data domain, primarily point data with large
> numbers of attributes to be indexed in the same index and queried with
> complex queries. For purely spatial queries, the RTree should perform much
> better. But for combined spatial and statistical queries, the new index
> should perform better. There are a few tricks we are using to improve
> the performance of the composite index that might be reused for the RTree,
> but they require first porting it to the traversal framework, and then fine
> tuning the traversal performance. So, my preference is to complete the
> composite index, optimize it and then see if some of those optimizations
> can be ported to the RTree at the same time as moving the RTree to the
> traverser framework.
>
> Regards, Craig
>
> On Fri, Dec 17, 2010 at 6:15 PM, Peter Neubauer <
> peter.neuba...@neotechnology.com> wrote:
>
> > Dave,
> > Craig is planning to improve the R-Tree index in several ways:
> >
> > - introduce streaming instead of set based returns from the traversal
> > - work on generic multidimensional indexing.
> >
> > Craig, what do you say?
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:      neubauer.peter
> > Skype       peter.neubauer
> > Phone       +46 704 106975
> > LinkedIn   http://www.linkedin.com/in/neubauer
> > Twitter      http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org       - Your high performance graph database.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> >
> > On Fri, Dec 17, 2010 at 3:43 PM, Dave Hesketh
> > <dave.hesk...@compassengine.com> wrote:
> > > I'm currently comparing the performance of R-Tree indexing in Neo4j with
> > > PostGIS/PostgreSQL. The database and index have been created and
> > > searched in Neo4j using Davide Savazzi's routines: ShapefileImported and
> > > SearchWithin.
> > > The test dataset is 28,000 points (clustered around San Francisco and
> > > Vancouver) and the search is for the points within 1000 randomly
> > > generated 'circles' (i.e. 16-sided polygons). On average, each search in
> > > Neo4j takes 4 times longer than in PostGIS. Now that I know the
> > > processing is working correctly, I want to progressively increase the
> > > number of points to 10,000,000.
> > > Can anybody give me advice/tips on improving the performance in Neo4j
> > > before I start scaling up the test? At this stage, I am only interested
> > > in the search performance.
> > > Neo4j Version: 1.2.M05
> > > Environment: Windows 7, i5 64bit processor, quad core 4GB
> > > Thanks Dave
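
The search workload described above (points within randomly generated
16-sided 'circles') can be sketched without any database. This is only an
illustration under stated assumptions: it uses neither Neo4j Spatial, JTS nor
PostGIS, the function names are hypothetical, the point data is random rather
than the San Francisco/Vancouver dataset, and every point is tested against
every polygon (no index), so it shows the geometry of the query and not its
performance.

```python
import math
import random

def circle_polygon(cx, cy, radius, sides=16):
    # Approximate a circle by a regular polygon with `sides` vertices.
    return [(cx + radius * math.cos(2 * math.pi * i / sides),
             cy + radius * math.sin(2 * math.pi * i / sides))
            for i in range(sides)]

def point_in_polygon(x, y, polygon):
    # Ray casting: count how many edges a ray going in +x from (x, y) crosses.
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def run_searches(points, n_searches, radius, rng):
    # For each random 'circle', count the points that fall inside it.
    hits = []
    for _ in range(n_searches):
        cx, cy = rng.uniform(-1, 1), rng.uniform(-1, 1)
        poly = circle_polygon(cx, cy, radius)
        hits.append(sum(1 for (x, y) in points
                        if point_in_polygon(x, y, poly)))
    return hits

rng = random.Random(42)  # fixed seed so runs are repeatable
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
hits = run_searches(points, n_searches=10, radius=0.25, rng=rng)
print(hits)
```

Timing run_searches with a fixed seed gives a brute-force baseline against
which an indexed search over the same points and polygons can be compared.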
> > > _______________________________________________
> > > Neo4j mailing list
> > > User@lists.neo4j.org
> > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> >
>
>
> ------------------------------
>
> _______________________________________________
> User mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
>
> End of User Digest, Vol 45, Issue 35
> ************************************
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
