Craig,

First of all, I must apologise for my slow response. I have been on extended leave (deliberately with no internet access; you can live without it!).

I am in the process of writing up (for internal record) the testing process I have been using to benchmark the RTree against PostGIS. I will post a copy.

Clearly, from your response, there are a number of possible routes for improvement but, at this stage, I want to be certain I have the optimum tuning settings for Neo4j for the current RTree implementation. At present, all settings are 'out of the box'. I have little experience of Neo4j (but a lot of database experience: relational, multi-dimensional, hierarchical), so some basic reworking of the settings will be a great help before I scale up the test.

Thanks,
Dave

On 18 December 2010 13:00, <user-requ...@lists.neo4j.org> wrote:

> Today's Topics:
>
>   1. Re: Reference node pains. (Marko Rodriguez)
>   2. Re: R-Tree indexing performance (Craig Taverner)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 17 Dec 2010 12:00:38 -0700
> From: Marko Rodriguez <okramma...@gmail.com>
> Subject: Re: [Neo4j] Reference node pains.
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID: <c9ed4a35-25a2-4e18-b76f-9da472b69...@gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> Here is the problem.
>
> 1. create graph database. (now you have one reference node)
> 3. export graph database.
> ...
> 1. create graph database
> 2. import previous graph data (now you have two "reference" nodes)
> ...
> so forth and so on.
>
> Moreover, the reference node is considered special by the Neo4j REST server. If you graph.getNodeById(0).delete(), then Neo4j REST server throws exceptions as it uses that as a "starting point" (which is a weird concept of a graph -- as a graph can be cyclic and thus, there is no "starting point").
>
> And if you are not conscious about deleting the reference node, then you run into "data bug" problems down the road -- "ah damn, that freaking reference node is why X, Y, Z is happening... :(".
>
> Hope that helps,
> Marko.
>
> http://markorodriguez.com
>
> On Dec 17, 2010, at 11:49 AM, Todd Rader wrote:
>
> > (Going back to the original problem statement...)
> >
> > I'm not sure I fully understand the pain here. Is the problem that the reference node is migrated into other stores, and then, if that data is migrated back to neo4j, the original reference node comes back to a neo4j database that already has a reference node (causing clutter)?
> >
> > If that's true, is the problem here the sheer existence of a reference node, or is it the lack of graceful migration? For example, would this be solved if there were an API just like getAllNodes() except that it doesn't return the reference node? I ask this not as a solution but as a way of clarifying the problem for me.
> >
> > Todd Rader, Sr. Manager
> > vFabric, Cloud Application Platform
> > VMware
> > tra...@vmware.com
> > www.springsource.org | www.springsource.com | www.vmware.com
> >
> > ----- Original Message -----
> > From: "Marko Rodriguez" <okramma...@gmail.com>
> > To: "Neo4j user discussions" <user@lists.neo4j.org>
> > Sent: Friday, December 10, 2010 10:35:49 AM
> > Subject: [Neo4j] Reference node pains.
> >
> > Hello.
> >
> > I have one question and a comment:
> >
> > QUESTION: Is the reference node always id 0 on a newly created graph?
> >
> > COMMENT: By chance, will you guys remove the concept of a reference node in the future? I've noticed this to be a pain in the side for people moving between various graph systems: going from Neo4j to iGraph to TinkerPop, etc. The reference node, if the user is not conscious, begins to build as data is migrated into and from Neo4j graphs. And what ensues is a data bug. Perhaps a GraphDatabaseServer = new GraphDatabaseService(String directory, boolean createReferenceNode). ...?
> >
> > Thanks,
> > Marko.
> >
> > http://markorodriguez.com
> > http://tinkerpop.com
> >
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
>
> ------------------------------
>
> Message: 2
> Date: Sat, 18 Dec 2010 00:48:00 +0100
> From: Craig Taverner <cr...@amanzi.com>
> Subject: Re: [Neo4j] R-Tree indexing performance
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID: <aanlktik62ahk0qgcgxwtjpilpwlbqp8vsyyymv80z...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
>
> Yes, there are some plans for improvements to the index. However, I should start by saying that we have not done extensive benchmarking of the RTree against the PostGIS implementation, so the work done by Dave is very interesting and I would like to learn more about his test case. One thing that would be interesting to find out is whether the performance differences are due to the RTree implementation itself, or due to some other underlying geometry test code, JTS versus something else, or Java versus C.
>
> Peter points out that we currently have a search algorithm that perhaps does not make the best use of the graph, since it does not use traversals, but uses recursion and produces a result set instead of a result stream. It is not completely clear in which ways this might affect performance, but it seems likely that we should see performance issues in two cases: large result sets and deep traversals. Moving the logic to a real traverser, as used in some other indexes we have tried, will resolve those issues. But it is possible this has nothing to do with Dave's case.
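
The difference between a result set and a result stream described above can be sketched in a few lines. This is only an illustration on a toy in-memory tree, not the Neo4j Spatial RTree code or the Neo4j traversal framework; the names `Node`, `search_collect` and `search_stream` are hypothetical. The first function recurses and builds the full list before returning anything; the second walks the tree with an explicit stack and yields matches one at a time, so a caller can stop early and deep trees do not deepen the call stack.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterator, List, Tuple

# Bounding box as (min_x, min_y, max_x, max_y). All names are hypothetical.
BBox = Tuple[float, float, float, float]


@dataclass
class Node:
    bbox: BBox
    children: List["Node"] = field(default_factory=list)  # empty for a leaf entry


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_collect(node: Node, query: BBox) -> List[BBox]:
    """Recursive search that materialises the whole result set before returning."""
    if not intersects(node.bbox, query):
        return []
    if not node.children:
        return [node.bbox]
    results: List[BBox] = []
    for child in node.children:
        results.extend(search_collect(child, query))
    return results


def search_stream(root: Node, query: BBox) -> Iterator[BBox]:
    """Iterative search with an explicit stack that yields matches lazily."""
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.bbox, query):
            continue
        if not node.children:
            yield node.bbox
        else:
            # Push in reverse so children are visited in their stored order.
            stack.extend(reversed(node.children))


def point(x: float, y: float) -> Node:
    """A leaf entry whose bounding box is a single point."""
    return Node((x, y, x, y))


# A tiny two-level tree with four point entries.
group_a = Node((1, 1, 2, 2), [point(1, 1), point(2, 2)])
group_b = Node((8, 8, 9, 9), [point(8, 8), point(9, 9)])
root = Node((1, 1, 9, 9), [group_a, group_b])

query = (0, 0, 10, 10)
all_hits = search_collect(root, query)                   # whole set built up front
first_hit = list(islice(search_stream(root, query), 1))  # stops after one match
```

With the streaming version, a caller that only needs the first few matches never visits the rest of the tree, which is where large result sets would be expected to benefit.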
>
> So in summary, I think there are a few areas that can account for these differences:
>
>   - General database performance, and I see others have answered with suggestions on dealing with that. Neo4j is generally very fast, but sometimes needs some tuning.
>   - The RTree implementation itself. I know RTrees are not all equal, so there may be room for general RTree improvements and optimizations. As mentioned, we have not put much time into optimizing the RTree, so hopefully there is room to move here.
>   - The search algorithm's known issue of not leveraging the Neo4j traversal framework, which is a very good, high-performance framework.
>
> Peter mentions a new multi-dimensional index I am working on, which I call a 'composite index'. I think this will not out-perform the RTree because it is targeting a very different data domain, primarily point data with large numbers of attributes to be indexed in the same index and queried with complex queries. For purely spatial queries, the RTree should perform much better. But for combined spatial and statistical queries, the new index should perform better. There are a few tricks we are using to improve the performance of the composite index that might be reused for the RTree, but they require first porting it to the traversal framework, and then fine-tuning the traversal performance. So, my preference is to complete the composite index, optimize it and then see if some of those optimizations can be ported to the RTree at the same time as moving the RTree to the traverser framework.
>
> Regards, Craig
>
> On Fri, Dec 17, 2010 at 6:15 PM, Peter Neubauer <peter.neuba...@neotechnology.com> wrote:
>
> > Dave,
> > Craig is planning to improve the R-Tree index in several ways:
> >
> > - introduce streaming instead of set based returns from the traversal
> > - work on generic multidimensional indexing.
> >
> > Craig, what do you say?
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk: neubauer.peter
> > Skype peter.neubauer
> > Phone +46 704 106975
> > LinkedIn http://www.linkedin.com/in/neubauer
> > Twitter http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org - Your high performance graph database.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> > On Fri, Dec 17, 2010 at 3:43 PM, Dave Hesketh <dave.hesk...@compassengine.com> wrote:
> > > I'm currently comparing the performance of R-Tree indexing in Neo4j with PostGIS/PostgreSQL. The database and index have been created and searched in Neo4j using Davide Savazzi's routines: ShapefileImported and SearchWithin.
> > > The test dataset is 28,000 points (clustered around San Francisco and Vancouver) and the search is for the points within 1000 randomly generated 'circles' (i.e. 16-sided polygons). On average, each search in Neo4j takes 4 times longer than in PostGIS. Now that I know the processing is working correctly, I want to progressively increase the number of points to 10,000,000.
> > > Can anybody give me advice/tips on improving the performance in Neo4j before I start scaling up the test? At this stage, I am only interested in the search performance.
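
For readers who want to reproduce the shape of the test described above without either database, here is a minimal sketch of a 16-sided 'circle' and a brute-force point-in-polygon search. It is not the ShapefileImported/SearchWithin code or the PostGIS query used in the test; the function names, the coordinate range and the fixed random seed are all assumptions made for illustration. A brute-force pass like this can serve as a correctness baseline against which indexed results are checked.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def circle_polygon(cx: float, cy: float, radius: float, sides: int = 16) -> List[Point]:
    """Approximate a circle by a regular polygon with its vertices on the circle."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / sides),
         cy + radius * math.sin(2 * math.pi * i / sides))
        for i in range(sides)
    ]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle on each edge crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the ray's y level only if its endpoints lie on
        # opposite sides of it, so y2 - y1 is never zero inside this branch.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def brute_force_search(points: List[Point], polygon: List[Point]) -> List[Point]:
    """Check every point against the polygon; no index involved."""
    return [p for p in points if point_in_polygon(p, polygon)]


# Hypothetical data: 1000 uniform random points in a 100 x 100 square.
rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1000)]
polygon = circle_polygon(50.0, 50.0, 10.0)
hits = brute_force_search(points, polygon)
```

Because the vertices lie on the circle, the 16-sided polygon covers slightly less than the true circle (about 97% of its area), so results can differ marginally from a true radius search.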
> > > Neo4j Version: 1.2.M05
> > > Environment: Windows 7, i5 64bit processor, quad core 4GB
> > > Thanks Dave
>
> End of User Digest, Vol 45, Issue 35
> ************************************