I think leveraging existing relationships is obviously valuable, but I
thought I'd throw in an idea for doing the original suggestion, pure random
search:

Reword the original problem to instead of looking for a set of random
potential matches for every node, rather looking for new random
relationships. What I mean it find both A and B randomly. This can be done
at high performance by simply generating a random number between 0 and the
maximum node ID. Assuming most nodes are people, you will be able to
generate a sample set of random people almost instantly (need to trim the
set to real people nodes of course, removing invalid nodes and non-people
nodes, hence the word 'almost').

The sample set can be some pre-defined size, eg. 100 nodes. Then compute all
node-node relationships between the nodes in this set (up to 10k
relationships) with the following rules:

   - Ignore if a relationship already exists
   - Possibly limit to only 10 relationships per node (your suggestion
   above)
   - If the total number of pre-existing relationships are high (or
   relationships per node are high), invoke a trimming algorithm, for example
   removing relationships of low weight, since you care less about them

This idea can be run continously in the background thread, and if the
trimmer works well, will allow the total graph size to reach some stable
state with time.

Then you can add features like when a node's properties are changed, delete
all those relationships to the node and pass the id into the background
process for immediate inclusion in the sample set (bypass the random
sampling for new or edited nodes, so they get some relationships
immediately).

On Wed, Jul 28, 2010 at 6:36 PM, David Montag <
david.mon...@neotechnology.com> wrote:

> Hi Alberto,
>
> On Wed, Jul 28, 2010 at 5:02 PM, Alberto Perdomo
> <alberto.perd...@gmail.com>wrote:
>
> > Hi David,
> >
> >
> > > But then you need to store the result. You can store these metrics as
> > > relationships in neo4j, and then just update them for each user when
> > > you recompute. You can find the user nodes via indexing. Maybe it's
> > > acceptable that some metrics are out of date, so you can just
> > > background process them continuously.
> >
> > I already have background processes that go through all users and
> > calculate new new pairs. But then in order to do that I do need to
> > exclude the pairs I already have... because it would be silly and as
> > the relationship density grows the probablity of calculating a pair
> > again would be higher and higher...
> > Would I be able to do that kind of query using indexing?
> >
>
> From your description it sounds like the factors that influence the metric
> don't change, so a single calculation per pair is enough. In this case, you
> could just determine the pairs in some way and then do the computation,
> storing the relationship in Neo4j. You can do it all in one go, nothing
> fancy. You would of course have to compute the metric to N peers for each
> new user.
>
> In other scenarios, the factors that influence the metric might change over
> time, e.g. a user's city or favorite movie. Then you actually need to keep
> recomputing the metric between existing users, and yes, then you probably
> want some scheme to make sure that you don't starve some users. You might
> for example want to prioritize the most active users first. Again, I don't
> know if this applies to your case though.
>
> As for the indexing, I'm not sure how you would use it here. Like, what
> kind
> of querying were you picturing?
>
>
> >
> > > Depending on your scenario, if your users know each other, it might be
> > > interesting to start computing in a foaf style order (breadth first).
> > > Remember, the power is in the relationships. Isolated nodes are not
> > > interesting.
> >
> > You mean I look first for possible pairs with users that are friends
> > of friends instead of randomly? We are also interesting in storing
> > friendship relationship so that sounds interesting.
> > That would be a different type of query: Traverse the graph from node
> > A to nodes which are friends of friends of A and have no match
> > relationship with A. I guess that is not difficult to implement using
> > Neo4j?
> >
>
> Exactly, so you might want to start with the most relevant other people,
> i.e. people you can realistically meet IRL via friends. Don't know if
> that's
> relevant to your application though.
>
> Neo4j would be a perfect fit for storing friendship relationships between
> users. It opens up all kinds of interesting data mining possibilities.
>
> The FOAF query would be easy to write using the Neo4j APIs, or some other
> tool such as Gremlin on top of Neo4j.
>
> So you could combine the friendship relationships with your processing step
> and prioritize active users, and start by checking people close to them in
> their social network. Again, if it's relevant. And, as Mattias suggested,
> if
> you can leverage friendship relationships between users, you might be able
> to calculate your metric on the fly, given that you limit the search to the
> user's extended social network. Of course, if you go deep enough, you might
> reach all users this way too.
>
>
> >
> > Thanks for your input David!
> >
>
> Glad to be of service. Ask as much as you like! We're all learning here :)
>
>
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to