Here are three ideas:

Recommend users to users.
Your users and items are both e-mail senders. The strength of the
association could be the number of e-mails from A to B (or perhaps the
logarithm). This would find people that people like you e-mail a lot.
Sounds interesting, if not immediately useful, because people e-mail
others for very different reasons.
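
As a rough sketch of that setup (my assumptions, not anything taken
from the data set: the class EmailPrefs and its methods are made up
for illustration, addresses are mapped to numeric IDs in the order
they are first seen, and log(1 + count) stands in for "the
logarithm"), the e-mail counts could be turned into the
userID,itemID,value lines that Mahout's file-based data model reads:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts e-mails per (sender, recipient) pair and
// emits "userID,itemID,value" lines with a log-damped strength.
class EmailPrefs {
    // address -> numeric ID, assigned in first-seen order (assumption)
    private final Map<String, Long> ids = new LinkedHashMap<>();
    // "senderId,recipientId" -> number of e-mails from sender to recipient
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    long idFor(String address) {
        String key = address.toLowerCase(Locale.ROOT);
        Long id = ids.get(key);
        if (id == null) {
            id = (long) ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    // Record one e-mail from `from` to `to`.
    void addEmail(String from, String to) {
        String pair = idFor(from) + "," + idFor(to);
        counts.merge(pair, 1, Integer::sum);
    }

    // One CSV line per pair; strength is log(1 + count), so a handful of
    // very chatty pairs does not dominate the similarity computation.
    List<String> toCsvLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double strength = Math.log1p(e.getValue());
            lines.add(String.format(Locale.ROOT, "%s,%.4f", e.getKey(), strength));
        }
        return lines;
    }
}
```

Here both the "user" and the "item" column hold sender IDs, which is
what makes this a users-to-users recommender. Whether the raw count or
the damped value works better is something you'd have to try.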

Recommend threads to users.
Users are people, items are threads. This might suggest discussions
you should be a party to, or that may interest you because they
involve people you often share a thread with. I think it has slightly
more potential to be useful, but it's probably a non-starter in
practice, as it's not generally true that you're welcome to see a
thread you weren't copied on.

Recommend users to threads.
Kind of the "have you forgotten to include X" function from Gmail.
Users are threads and items are people.
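
The two thread-based ideas use the same data with the roles swapped.
A minimal sketch, assuming numeric thread and person IDs and using
the per-thread message count as the value (the class ThreadPrefs and
its methods are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tallies messages per (thread, person) and emits
// "userID,itemID,count" lines in either orientation.
class ThreadPrefs {
    // threadId -> (personId -> messages that person sent in the thread)
    private final Map<Long, Map<Long, Integer>> byThread = new LinkedHashMap<>();

    // Record one message (initial post or reply) by personId in threadId.
    void addMessage(long threadId, long personId) {
        byThread.computeIfAbsent(threadId, t -> new LinkedHashMap<>())
                .merge(personId, 1, Integer::sum);
    }

    // threadsAsUsers = false: people are users, threads are items
    //                         (recommend threads to users).
    // threadsAsUsers = true:  threads are users, people are items
    //                         (recommend users to threads).
    List<String> rows(boolean threadsAsUsers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Integer>> t : byThread.entrySet()) {
            for (Map.Entry<Long, Integer> p : t.getValue().entrySet()) {
                long user = threadsAsUsers ? t.getKey() : p.getKey();
                long item = threadsAsUsers ? p.getKey() : t.getKey();
                out.add(user + "," + item + "," + p.getValue());
            }
        }
        return out;
    }
}
```

Only the column order changes between the two; the recommender code
on top can stay the same.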


All of these are sort of novelties -- I don't think CF applies so well
here -- but they're surely worth trying to see what you get out.

In the book I tried recommending Wikipedia articles to Wikipedia
articles -- discovering missing hyperlinks, so to speak -- and while it
was a bit of a novelty, the results were intriguing and entertaining.


On Mon, Aug 22, 2011 at 3:48 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> I'm working on an example (well, examples) of using Mahout with the ASF 
> Public Data Set up on Amazon 
> (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how to 
> use the 3 "C's" (collab filtering, clustering, classification) with the data 
> set.  Clustering and classification are pretty straight forward, but I'm 
> wondering about the setup around collaborative filtering.
>
> The motivation for recommendations is pretty straightforward:  provide people 
> recs on emails that they might find useful based on what other people have 
> interacted with.  The tricky part is I am not totally sure on a valid setup 
> of the problem.  My current thinking is that I build up the rec. matrix based 
> on whether someone has interacted with (initiated/replied) a thread or not.  
> Thus, the columns are the thread ids and the rows are the users.  Each cell 
> contains the count of the number of times user X has interacted with thread 
> Y.  This feels to me like it is a stand in for that user's preference in that 
> if they are replying multiple times, they have an interest in that topic.  I 
> have no idea if this will be effective or not, but it seems like it could be 
> interesting.  Does it sound reasonable?  I worry that even in a really large 
> data set as above it will simply be too sparse.
>
> Is there a better way to think about this from a strict collaborative 
> filtering context?  In other words, I know I could do content-based 
> recommendations but that is not what I am after here.
>
> -Grant
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
