Re: [Neo4j] traversing densely populated nodes

Michael Hunger Wed, 29 Jun 2011 08:50:21 -0700

I think this is the same problem that Angelos is facing. We are currently 
evaluating options to improve performance on those highly connected 
supernodes.


The traditional workaround is to split their relationships into groups, or 
even to shard them across a second layer of nodes.
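
To make the grouping idea concrete, here is a minimal in-memory sketch in plain Python. It does not use the Neo4j API; the `Node` class, the "HAS_GROUP" relationship type and `GROUP_SIZE` are assumptions made up for illustration, and the flat relationship list only models the "scan everything to filter by type" behaviour that Niels suspects, not the actual store layout.

```python
class Node:
    """Toy node that keeps all relationships in one flat list (an assumption)."""

    def __init__(self, name):
        self.name = name
        self.rels = []  # list of (rel_type, other_node)

    def add(self, rel_type, other):
        self.rels.append((rel_type, other))

    def neighbours(self, rel_type):
        """Return (matching nodes, number of relationships scanned)."""
        matches = [other for t, other in self.rels if t == rel_type]
        return matches, len(self.rels)


GROUP_SIZE = 1000  # hypothetical fan-out per group node


def attach_via_groups(hub, instances, group_size=GROUP_SIZE):
    """Link hub -> group nodes -> instances instead of hub -> instances."""
    groups = []
    for start in range(0, len(instances), group_size):
        group = Node(f"group-{start // group_size}")
        hub.add("HAS_GROUP", group)
        for inst in instances[start:start + group_size]:
            group.add("HAS_INSTANCE", inst)
        groups.append(group)
    return groups


instances = [Node(f"resource-{i}") for i in range(10_000)]

# Flat layout: every instance hangs directly off the hub.
flat_hub = Node("topic class (flat)")
for inst in instances:
    flat_hub.add("HAS_INSTANCE", inst)
flat_hub.add("SUB_CLASS_OF", Node("typable"))

# Grouped layout: the hub only sees a handful of group nodes.
grouped_hub = Node("topic class (grouped)")
groups = attach_via_groups(grouped_hub, instances)
grouped_hub.add("SUB_CLASS_OF", Node("typable"))

_, flat_scanned = flat_hub.neighbours("SUB_CLASS_OF")
_, grouped_scanned = grouped_hub.neighbours("SUB_CLASS_OF")
print(flat_scanned, grouped_scanned)  # 10001 11
```

The price is an extra hop (hub, then group, then instance) whenever you actually want the instances, which is why this remains a workaround rather than a transparent fix.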

We're looking into storage improvements as well as changes to how that many 
relationships are retrieved at once.
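
For comparison, the partitioning Niels asks about in the quoted message (per node, per relationship type, per direction) can be sketched in memory like this. It is only an illustration of the idea, not a description of how the relationship store works or will work; `PartitionedNode` and `connect` are hypothetical names, and the direction chosen for "HAS_INSTANCE" is an assumption.

```python
from collections import defaultdict


class PartitionedNode:
    """Toy node whose relationships are bucketed by (type, direction)."""

    def __init__(self, name):
        self.name = name
        # (rel_type, direction) -> list of neighbouring nodes
        self.rels = defaultdict(list)

    def relationships(self, rel_type, direction):
        # Only the requested bucket is touched; other buckets are never scanned.
        return self.rels.get((rel_type, direction), [])


def connect(start, rel_type, end):
    """Record one relationship in start's OUT bucket and end's IN bucket."""
    start.rels[(rel_type, "OUT")].append(end)
    end.rels[(rel_type, "IN")].append(start)


topic = PartitionedNode("topic class")
typable = PartitionedNode("typable")
connect(topic, "SUB_CLASS_OF", typable)

resources = [PartitionedNode(f"resource-{i}") for i in range(10_000)]
for resource in resources:
    connect(topic, "HAS_INSTANCE", resource)

# Looking up the sparse type does not walk the 10,000 HAS_INSTANCE entries.
parents = topic.relationships("SUB_CLASS_OF", "OUT")
print([n.name for n in parents])  # ['typable']
print(len(topic.relationships("HAS_INSTANCE", "OUT")))  # 10000
```

With this layout the cost of a lookup depends on the size of the requested bucket only, so a sparse type on a dense node stays cheap.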

Cheers

Michael

On 29.06.2011 at 17:13, Niels Hoogeveen wrote:

> 
> I achieve more or less the same result by placing the relationships in the 
> Timeline index, which distributes the relationships over many nodes. 
> There are workarounds for this issue, but I would really like to see a more 
> transparent solution which doesn't require special interventions for special 
> cases. 
> I don't know the inner details of the relationship store and wonder if it is 
> possible to partition relationships per node per relationship type per 
> direction. It makes intuitive sense if there are many relationships of the 
> same type and same direction that traversing those takes a lot of time. It 
> doesn't make intuitive sense that relationships with another type and/or 
> direction take a lot of time too.
> Niels
> 
>> Date: Wed, 29 Jun 2011 16:36:57 +0200
>> From: ntausc...@gmail.com
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] traversing densely populated nodes
>> 
>> I faced the same problem. Nodes with a lot of relationships are very
>> slow to traverse. I solved the problem by grouping the relationships
>> using additional nodes. The dense node then has only a few relationships
>> to different 'group' nodes. Each 'group' node in turn has many
>> relationships to other nodes.
>> 
>> Although this helps, it is a very ugly solution.
>> 
>> Best regards
>> 
>> Norbert Tausch
>> 
>> 
>> On 29.06.2011 16:07, Niels Hoogeveen wrote:
>>> Recently I have worked on loading the content of DbPedia into my database 
>>> and ran into a performance issue.
>>> My application has a meta-layer, inspired by the meta model component but 
>>> rewritten in Scala.
>>> All DbPedia resources are said to be an instance of "topic", 
>>> creating a relationship from that resource node to the node that describes 
>>> the topic class.
>>> This makes the "topic class" node of course densely populated.
>>> The "topic class" node has relationships other than "HAS_INSTANCE", 
>>> for example "SUB_CLASS_OF", which states that the "topic class" node is a 
>>> subclass of "typable". 
>>> When trying to retrieve the "SUB_CLASS_OF" relationships of the "topic 
>>> class" node, performance degrades enormously. 
>>> 
>>> It looks (please correct me if I am wrong in my assumption) as if all 
>>> relationships are being scanned 
>>> to filter out the "SUB_CLASS_OF" relationships (of which there are very 
>>> few, especially compared to the "HAS_INSTANCE" relationships).
>>> I ended up placing all "HAS_INSTANCE" relationships into the Timeline 
>>> index from Neo4j-graph-collections for two reasons: it's nice to know 
>>> when a resource became an instance of a class (bonus), and it makes sure 
>>> that no single node becomes heavily populated.
>>> So far so good, but delving deeper into the Timeline index, I notice that 
>>> the relationship between an entry node and the root of the tree is 
>>> partially established by the use of a property on "entry node" which 
>>> names the timeline index.
>>> The simplest way to establish the relationship between an "entry node" and 
>>> the tree root is by means of a Lucene index lookup.
>>> This is of course not a very fast solution and actually would mean the 
>>> same as adding a property to the "resource node", listing the classes a 
>>> resource is an instance of.
>>> Adding a relationship from "entry node" to "tree root" in the Timeline 
>>> component would create yet another densely populated node in the database 
>>> (in this case the tree root). 
>>> Is there a way out of this situation? 
>>> Would it be possible to partition the relationships in the database per 
>>> relationship type per direction, so densely populated nodes can get 
>>> traversed fast for those relationship types that are sparsely populated?
>>> Niels
>>>                                       
>>> _______________________________________________
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
