Re: [Neo4j] Performance issue on nodes with lots of relationships

Andrew White Wed, 06 Jul 2011 20:03:28 -0700

Here are some interesting stats to consider. First, I split my nodes into 
two groups, one node with 1.4M children and the other with 3.4M 
children. While I do see some cache warm-up improvement, the 
traversal doesn't seem to scale linearly; i.e., the larger super-node has 
2.4x more children but takes 17x longer to traverse on the second (warmer) run.


neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 25724 ms
neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 565448 ms
neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 337975 ms
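
For reference, the ratios can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts and timings are copied from the four runs above, and treating the first run as "cold" and the second as "warm" is my assumption, not something the shell reports.

```python
# Figures copied from the shell output above:
# (relationship count, first-run ms, second-run ms)
runs = {
    "node 1": (1468486, 25724, 19763),
    "node 2": (3472174, 565448, 337975),
}

small = runs["node 1"]
large = runs["node 2"]

# How much bigger the large super-node is, and how much slower it is to count.
rel_ratio = large[0] / small[0]
cold_ratio = large[1] / small[1]   # first runs (assumed cold cache)
warm_ratio = large[2] / small[2]   # second runs (assumed warmer cache)

print(f"relationships:       {rel_ratio:.2f}x")
print(f"time, first runs:    {cold_ratio:.2f}x")
print(f"time, second runs:   {warm_ratio:.2f}x")

# Average cost per relationship on the second run, in microseconds.
per_rel_us = {
    name: second_ms * 1000 / rels
    for name, (rels, _first_ms, second_ms) in runs.items()
}
for name, cost in per_rel_us.items():
    print(f"{name}: {cost:.1f} us per relationship (second run)")
```

With linear scaling the per-relationship cost would be about the same for both nodes; here it differs by a factor of roughly seven.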

Any ideas on this?
Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> Andrew,
> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> order to count the relationships of a node, not returning them:
>
> start n=(1) match (n)-[r]-(x) return count(r)
>
> and try that several times to see if cold caches are initially slowing
> down things.
>
> or something along these lines. In the LS and Neoclipse the output and
> visualization will be slow for that amount of data.
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White<li...@andrewewhite.net>  wrote:
>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>> connected to other nodes. For example, I may have a single node with 1M+
>> relationships. A good analogy is a population that has a "lives-in"
>> relationship to a state. Now the problem...
>>
>> Both neoclipse and neo4j-shell are terribly slow when working with these
>> nodes. In the shell I would expect a `cd <node-id>` to be very fast,
>> much like selecting via a rowid in a standard DB. Instead, I usually see
>> a delay of several seconds. Doing an `ls` takes so long that I usually have to
>> just kill the process. In fact, `ls` never outputs anything, which is odd
>> since I would expect it to "stream" the output as it found it. I have
>> very similar performance issues with neoclipse.
>>
>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>> Disclaimer, I am new to Neo4j.
>>
>> Thanks,
>> Andrew
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>

