Re: [Neo4j] traversing densely populated nodes

Marko Rodriguez Wed, 29 Jun 2011 08:28:07 -0700

Hi,

In situations where there is a high degree vertex [ 
http://en.wikipedia.org/wiki/Degree_(graph_theory) ], I tend to do graph 
sampling. That is, I do not traverse everything outgoing/incoming from the 
vertex. For example, in Gremlin:


        rand = new Random();
        highDegreeVertex.outE{rand.nextBoolean()}.inV

In the above traversal, probabilistically, only 50% of the edges out of 
highDegreeVertex are traversed. If your edges are weighted, then you can do a 
biased coin toss to only sample those paths that are heavily weighted.

        highDegreeVertex.outE{rand.nextDouble() < it.weight}.inV

If the edge weight is in [0,1] and larger weight is a "stronger" connection, 
then this will ensure that you have a higher probability of taking strong 
edges, while still having the potential to take low probability edges --- thus, 
not so "cut and dry" as a clip-based filter. E.g.

        highDegreeVertex.outE{it.weight > 0.5}.inV

Finally, sampling assumes a query/traversal situation that is non-deterministic 
and as such, not a "get me the data if it exists"-situation. For me, most of my 
query/traversal situations are based on local/global-ranks where approximations 
are acceptable and necessary for real-time speed. I see this as a big 
distinguisher amongst graph vs. other database models. In the world of graphs, 
thinking of your data like "a neural network (graph) of action potentials 
(traversals)" leads to novel models of data retrieval, data organization, and 
relaxes deterministic constraints allowing for increased speed.

Thanks,
Marko.

http://markorodriguez.com

On Jun 29, 2011, at 8:36 AM, Norbert Tausch wrote:

> I focused the same problem. Nodes with a lot of relationships are very
> difficult (needs a lot of time) to be traversed. I solved the problem by
> grouping the relationships using additional nodes. The dense node then
> has only a few relationships to different 'group' nodes. Each 'group'
> node then has again many relationships to other nodes.
> 
> Although this helps, it is a very ugly solution.
> 
> Best regards
> 
> Norbert Tausch
> 
> 
> Am 29.06.2011 16:07, schrieb Niels Hoogeveen:
>> Recently I have worked on loading the content of DbPedia into my database 
>> and run into a performance issue.
>> My application has a meta-layer; inspired by the meta model component, but 
>> rewritten in Scala.
>> All DbPedia resources are said to be an instance of "topic", 
>> creating a relationship from that resource node to the node that describes 
>> the topic class.
>> This makes the "topic class" node of course densely populated.
>> The "topic class" node has relationships other than "HAS_INSTANCE", 
>> for example "SUB_CLASS_OF", which states that the "topic class" node is a 
>> subclass of "typable". 
>> When trying to retrieve the "SUB_CLASS_OF" relationships of the "topic 
>> class" node performance degrades enormously. 
>> 
>> It looks (please correct me if I am wrong in my assumption) as if all 
>> relationships are being scanned 
>> to filter out the "SUB_CLASS_OF" relationships (of which there are very few, 
>> especially compared to the "HAS_INSTANCE" relationship)
>> I ended up placing all "HAS_INSTANCE" into the Timeline index from 
>> Neo4j-graph-collections for two reasons,it's nice to know when a resource 
>> became an instance of a class (bonus), and to make sure that not a single 
>> nodebecomes heavily populated.
>> So far so good, but delving deeper into the Timeline index, I notice that 
>> the relationship between an entry nodeand the root of the tree is partially 
>> established by the use of a property on "entry node" which names the 
>> timeline index.
>> The simplest way to establish the relationship between an "entry node" and 
>> the tree root is by means of a Lucene index lookup.
>> This is of course not a very fastest solution and actually would mean the 
>> same as adding a property to the "resource node", listing the classes a 
>> resource is an instance of.
>> Adding a relationship from "entry node" to "tree root" in the Timeline 
>> component would create yet another densely populated nodein the database (in 
>> this case the tree root). 
>> Is there a way out of this situation? 
>> Would it be possible to partition the relationships in the database per 
>> relationship type per direction, so densely populated nodescan get traversed 
>> fast for those relationships types that are sparsely populated?
>> Niels
>>                                        
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] traversing densely populated nodes

Reply via email to