I'm currently experimenting with Neo4j and have loaded in a fairly straightforward dataset consisting of 2 node types - :User and :Merchant. There's a single relationship type too - [:PURCHASED_FROM].
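To make the shape of the data and the traversal concrete, here is a rough Python sketch. It is not my actual Cypher and it does not use any Neo4j API. It is a toy in-memory stand-in for the (:User)-[:PURCHASED_FROM]->(:Merchant) model, and the user names, merchant names and the `recommend` function are all made up for illustration. It walks user -> merchant -> other users -> their merchants and counts how many relationships get followed along the way.

```python
from collections import defaultdict

# Toy stand-in for (:User)-[:PURCHASED_FROM]->(:Merchant).
# All names and data below are hypothetical.
purchases = [
    ("alice", "bigmart"), ("alice", "cornershop"),
    ("bob", "bigmart"), ("bob", "bookstore"),
    ("carol", "bigmart"), ("carol", "bookstore"), ("carol", "cafe"),
    ("dave", "cornershop"), ("dave", "cafe"),
]

# Adjacency in both directions, like following a relationship either way.
merchants_of = defaultdict(set)
customers_of = defaultdict(set)
for user, merchant in purchases:
    merchants_of[user].add(merchant)
    customers_of[merchant].add(user)


def recommend(user):
    """Score merchants used by people who share a merchant with `user`.

    Returns (ranked, hops): ranked is a list of (merchant, score) pairs,
    hops is the number of relationships followed during the traversal.
    """
    seen = merchants_of[user]
    scores = defaultdict(int)
    hops = 0
    for merchant in seen:
        hops += 1  # user -> merchant
        for other in customers_of[merchant]:
            hops += 1  # merchant -> customer; this loop explodes on popular merchants
            if other == user:
                continue
            for candidate in merchants_of[other]:
                hops += 1  # customer -> merchant
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest score first, ties broken by name so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, hops


ranked, hops = recommend("alice")
print(ranked, hops)
```

The middle loop is the one that worries me: its cost is the number of customers of each merchant the user has purchased from, so a merchant with 1m+ relationships contributes over a million iterations before the innermost loop has even started.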
Whilst trying to find recommended merchants for a particular user (a typical "friend of friend" style Cypher query), my query hangs. Upon inspecting the graph I can see why. Some users have purchased from 3 or 4 merchants that are very popular, and each of these merchants may have 1m+ :PURCHASED_FROM relationships. When a user is connected to 3 or 4 of them, we're quickly heading towards millions of pointer chases. Given that I'd like to use this system in real time, it's not performant enough as-is.

I've seen posts describing fan-out techniques for dealing with these so-called supernodes. However, these posts tend to pre-date a release of Neo4j that added functionality to address these nodes internally, so I'm unsure whether the techniques are still useful.

I have a feeling that I need to perform some kind of pre-computation over the users in order to cluster them and reduce the number of relationships attached to the merchant nodes, since that is the part of the graph that I think will be traversed most often by the queries I'm running. However, I wanted to avoid pre-computations / batch processing and instead have a living, breathing graph that evolves on its own.

Has anybody else come up against challenges like this?