I'm currently experimenting with Neo4j and have loaded a fairly 
straightforward dataset consisting of two node types, :User and :Merchant. 
There's a single relationship type too: [:PURCHASED_FROM].

Whilst trying to find recommended merchants for a particular user (a 
typical "friend of friend" style Cypher query), my query hangs. Upon 
inspecting the graph I can see why. Some users have purchased from 3 or 4 
very popular merchants, each of which may have 1M+ :PURCHASED_FROM 
relationships. For a user connected to 3 or 4 of these, the traversal 
quickly heads towards millions of pointer chases. Given that I'd like to 
use this system in real time, it's not performant enough as-is.
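
To put rough numbers on this, here is a minimal Python sketch. It is not 
Neo4j code: it models the graph as plain in-memory sets, and the function 
name count_expansions and the toy data are hypothetical, made up purely for 
illustration. It counts how many relationships have to be followed to get 
from one user to every other user who shares a merchant with them.

```python
from collections import defaultdict


def count_expansions(purchases, user):
    """Count relationships followed for user -> merchants -> other buyers.

    `purchases` is an iterable of (user, merchant) pairs, each standing in
    for one :PURCHASED_FROM relationship.
    """
    merchants_of = defaultdict(set)
    buyers_of = defaultdict(set)
    for u, m in purchases:
        merchants_of[u].add(m)
        buyers_of[m].add(u)

    # First hop: one relationship per merchant the user bought from.
    first_hop = len(merchants_of[user])
    # Second hop: every relationship hanging off each of those merchants.
    second_hop = sum(len(buyers_of[m]) for m in merchants_of[user])
    return first_hop + second_hop


# Toy data (hypothetical): one popular merchant and one small one.
purchases = [("alice", "bigshop"), ("alice", "cornershop")]
purchases += [(f"user{i}", "bigshop") for i in range(1000)]
purchases += [("bob", "cornershop")]

print(count_expansions(purchases, "alice"))
```

The second hop is dominated by the popular merchant. With 3 or 4 merchants 
at 1M+ relationships each, the same sum lands in the millions before the 
hop from those users on to other merchants has even started.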

I've seen posts describing fan-out techniques for dealing with these 
so-called supernodes. However, these posts tend to pre-date a release of 
Neo4j that added functionality to address these nodes internally, so I'm 
unsure whether the techniques are still useful.

I have a feeling that I need to perform some kind of pre-computation of 
users in order to cluster them and reduce the number of relationships 
attached to the merchant nodes, since that is the part of the graph that I 
think will be traversed most often by the queries that I'm running.

However, I wanted to avoid pre-computation / batch processing and instead 
have a living, breathing graph that evolves on its own. Has anybody else 
come up against challenges like this?
