Re: [Neo4j] Collaborative filtering in Cypher

Marko Rodriguez Mon, 01 Aug 2011 13:09:08 -0700

Hi,


> Hi, I'm new to graph databases and have been trying to understand the power
> of Cypher and/or Gremlin as a way to develop suggestion queries. I've
> watched a few webinars and read through some of the documentation but I've
> had a hard time figuring out complex suggestion type queries other than
> stuff like "find me the top 3 movies my friends have recommended," or "given
> that I rated this movie 5 stars, find me other people who liked this movie."

This is a common query that looks something like this in Gremlin.

        m = [:] // create final rankings map
        g.v(1).outE('rated'){it.stars == 5}.inV.inE('rated'){it.stars == 5}.outV.outE('rated'){it.stars == 5}.inV.groupCount(m)

The last line says:
        v.outE('rated'){it.stars == 5}.inV // what movies do I like (5 stars), assuming you are g.v(1)
        .inE('rated'){it.stars == 5}.outV // who else really likes those movies (5 stars)
        .outE('rated'){it.stars == 5}.inV.groupCount(m) // what else do they really like
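
For readers without a Gremlin console handy, here is a rough plain-Python sketch of what that three-hop traversal computes. It is not Gremlin: the RATINGS triples, the user and movie names, and the recommend() helper are all made up for illustration, and each triple stands in for one 'rated' edge.

```python
from collections import Counter

# Hypothetical toy data: (user, movie, stars) triples standing in for 'rated' edges.
RATINGS = [
    ("me", "Alien", 5),
    ("me", "Heat", 5),
    ("ann", "Alien", 5),
    ("ann", "Brazil", 5),
    ("ann", "Heat", 3),
    ("bob", "Heat", 5),
    ("bob", "Brazil", 5),
    ("bob", "Casino", 5),
    ("cat", "Casino", 5),
]


def recommend(start, ratings=RATINGS):
    """Count how many 5-star paths lead from `start` to each movie."""
    m = Counter()  # plays the role of the rankings map m
    for user1, movie1, stars1 in ratings:
        # hop 1: what movies do I like (5 stars)
        if user1 != start or stars1 != 5:
            continue
        for user2, movie2, stars2 in ratings:
            # hop 2: who else really likes those movies (5 stars)
            if movie2 != movie1 or stars2 != 5:
                continue
            for user3, movie3, stars3 in ratings:
                # hop 3: what else do they really like
                if user3 == user2 and stars3 == 5:
                    m[movie3] += 1
    return m


print(recommend("me").most_common())
```

Like the traversal, the sketch does not skip the starting user on the second hop, so movies you already rated show up in the counts too.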

You can do some stuff with aggregate() and except() if you want to filter out 
those movies in m that you have already seen.
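
The filtering idea can be sketched in plain Python as well (again, not Gremlin). The assumption here is that "already seen" means any movie the starting user has rated at all; the data and the recommend_unseen() helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (user, movie, stars) triples standing in for 'rated' edges.
RATINGS = [
    ("me", "Alien", 5),
    ("me", "Heat", 5),
    ("ann", "Alien", 5),
    ("ann", "Brazil", 5),
    ("ann", "Heat", 3),
    ("bob", "Heat", 5),
    ("bob", "Brazil", 5),
    ("bob", "Casino", 5),
]


def recommend_unseen(start, ratings=RATINGS):
    """Rank 5-star neighbours' 5-star movies, skipping what `start` already rated."""
    likes = defaultdict(set)  # user  -> movies that user rated 5 stars
    fans = defaultdict(set)   # movie -> users who rated it 5 stars
    seen = set()              # every movie the start user rated (the collected set)
    for user, movie, stars in ratings:
        if user == start:
            seen.add(movie)
        if stars == 5:
            likes[user].add(movie)
            fans[movie].add(user)

    m = Counter()
    for movie in likes[start]:
        for other in fans[movie]:
            for candidate in likes[other]:
                if candidate not in seen:  # the filter-out step
                    m[candidate] += 1
    return m


print(recommend_unseen("me").most_common())
```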

You can do some wicked stuff like this with Gremlin 1.2 (just released):

        m = [:] // create final rankings map
        g.v(1).outE('rated'){it.stars == 5}.inV.inE('rated'){it.stars == 5}.outV.outE('rated'){it.stars > 3}.sideEffect{x=it.stars}.inV.groupCount(m){it.name}{it + x}

In the above, you are asking which movies they like (> 3, so not necessarily 
love, but just like), and then using that value (the stars) as the amount you 
add to the ranking map. See the key/value closures off of groupCount(): in the 
value closure, "it" is the previous value of that particular key. With respect 
to the weighting stuff you want to do below in your email, stuff like that 
comes in handy.
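
A plain-Python sketch of the same weighting idea (not Gremlin; the RATINGS data and the weighted_rank() helper are hypothetical): the last hop keeps ratings above 3 stars and adds the star value, rather than 1, to the running total for that movie.

```python
from collections import defaultdict

# Hypothetical toy data: (user, movie, stars) triples standing in for 'rated' edges.
RATINGS = [
    ("me", "Alien", 5),
    ("ann", "Alien", 5),
    ("ann", "Brazil", 4),
    ("ann", "Heat", 3),
    ("bob", "Alien", 5),
    ("bob", "Brazil", 5),
    ("bob", "Casino", 4),
]


def weighted_rank(start, ratings=RATINGS):
    """Sum the stars (> 3) that 5-star neighbours of `start` gave to each movie."""
    m = defaultdict(int)  # movie name -> accumulated stars
    for user1, movie1, stars1 in ratings:
        # hop 1: movies the start user rated 5 stars
        if user1 != start or stars1 != 5:
            continue
        for user2, movie2, stars2 in ratings:
            # hop 2: users who also gave that movie 5 stars
            if movie2 != movie1 or stars2 != 5:
                continue
            for user3, movie3, x in ratings:
                # hop 3: anything those users rated above 3 stars
                if user3 == user2 and x > 3:
                    m[movie3] += x  # previous value of the key plus x
    return dict(m)


print(weighted_rank("me"))
```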

> To give a simplified example of what I'm trying to achieve is something like
> suggesting users based on attributes they weight from 0-1 and those
> attributes can have relationships between each other and I want to find
> something like the Tanimoto coefficient but weights as opposed to strictly
> binary attributes or slope one for a start user to end users and then list
> the top X. I have the O'Reilly book on Collective Intelligence but I've only
> seen examples for set theory, not for graph theory.
> 
> I was thinking of Something like:
> 
> UserA likes Digital Photography with a weight of 1
> UserA likes Wine with a weight of .8
> UserA likes Rock music with a weight of 1
> UserB likes Film Photography with a weight of 1
> UserB likes Rock music with a weight of .5
> UserB likes Beer with a weight of 1
> Digital Photography is related to Film Photography with a weight of .5
> 
> I want to return how alike these two users are. I understand the graph
> theory behind it in that I want to follow all likes relationships and is
> related relationships till I hit another user and then with those paths that
> I have to that user I want to multiply the weights of the relationships
> along that path to get a score and then add those scores to get a final
> score.
> 
> So in the above example it would look something like:
> UserA->Digital Photography->Film Photography->UserB = 1 * .5 * 1 = .5
> UserA->Rock->UserB = 1 * .5 = .5
> Final tally = 1.


I didn't read the above. Got lazy.


> And then if there were mere users it would follow the paths to find those
> users as well and find the scores and then sort the users by score. Looking
> at Gremlin and Cypher, I'm not sure where to even start to work on query
> that can do this and if it even is possible.

Just bust a paths() step at the end:

        g.v(1).outE('rated'){it.stars == 5}.inV.inE('rated'){it.stars == 5}.outV.outE('rated'){it.stars == 5}.inV.groupCount(m).paths

That is the path from you to the movies that are recommended to you. You can do 
path closures in Gremlin 1.2 so you can yield computations on the path 
elements, but I won't get into that. See the Gremlin documentation and 
examples. http://gremlin.tinkerpop.com
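
As a rough illustration of computing over path elements, here is a plain-Python sketch (not Gremlin, and not the path-closure API) using the weights from the question quoted above. It multiplies the weights along each path between two users and sums the products per user. The LIKES and RELATED structures and the similarity() helper are hypothetical, and the sketch assumes "is related" edges can be walked in either direction and that a path uses at most one of them.

```python
from collections import defaultdict

# Weights taken from the quoted question; the data layout itself is an assumption.
LIKES = {
    "UserA": {"Digital Photography": 1.0, "Wine": 0.8, "Rock music": 1.0},
    "UserB": {"Film Photography": 1.0, "Rock music": 0.5, "Beer": 1.0},
}
RELATED = {("Digital Photography", "Film Photography"): 0.5}


def similarity(start, likes=LIKES, related=RELATED):
    """Sum, per other user, the product of edge weights along each connecting path."""
    scores = defaultdict(float)
    for topic, w_like in likes[start].items():
        # Topics reachable from this one: itself (factor 1.0) or one related hop away.
        reachable = [(topic, 1.0)]
        for (a, b), w_rel in related.items():
            if a == topic:
                reachable.append((b, w_rel))
            elif b == topic:
                reachable.append((a, w_rel))
        for end_topic, w_rel in reachable:
            for other, other_likes in likes.items():
                if other != start and end_topic in other_likes:
                    # One complete path: multiply the weights along it.
                    scores[other] += w_like * w_rel * other_likes[end_topic]
    return dict(scores)


print(similarity("UserA"))
```

With the quoted numbers, the two paths from UserA to UserB contribute 1 * .5 * 1 and 1 * .5, which matches the final tally of 1 in the question.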


> I know what I described isn't slope one or the Tanimoto coefficient because
> it doesn't take into account the full set of attributes for the second user,
> but I'm just getting used to this and right now my potential solution is
> just have all unrelated attributes have edges of weight 0, but yeah I'm
> probably getting ahead of myself. I'm just looking for a point in the right
> direction for places to research and perhaps if they're available see some
> actual Cypher queries that have done weighted suggestions based on
> attributes.

I didn't read everything you wrote, so if what I provided isn't sufficient, 
please ask a particular question, and preferably not at great length.

WARNING: All Gremlin examples were typed straight into the email. You may have 
to fiddle with them, as I might have missed a ( { ' or ", so they might be buggy.

Thanks,
Marko.

http://markorodriguez.com
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
