Re: Significant - serendipity in recommending

2012-03-25 Thread Steven Bourke
You could just penalise popular items and promote rarer items from a user's top-100 recommendation list. Nothing in Mahout is geared towards serendipity, novelty or diversity, which appear to be this year's interest at RecSys (recsys.acm.org). But the gist is that it's very difficult to do right

RE: Mahout beginner questions...

2012-03-25 Thread Razon, Oren
Thanks for the detailed answer, Sean. I want to understand the limitations of the non-distributed code more clearly. I saw that you advise that for more than 100,000,000 ratings the non-distributed engine won't do the job. The question is why? Is it a memory issue (and then if I have a bigger

Re: Significant - serendipity in recommending

2012-03-25 Thread Sean Owen
Au contraire, you can do exactly this with an IDRescorer. Divide by (the log of) an item's occurrences, for example, to penalize popular items. I don't recommend this. Stuff like the log-likelihood metric is already in a sense accounting for things that are just generally popular and normalizing
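
The popularity penalty mentioned in this message can be sketched as below. This is only an illustration in plain Python, not Mahout code: the function name `penalize_popular`, the `occurrences` dictionary and the `log(2 + count)` smoothing are assumptions made for the example. In Mahout the same arithmetic would sit inside an IDRescorer. Note that the poster explicitly does not recommend this approach.

```python
import math


def penalize_popular(scored_items, occurrences):
    """Divide each score by the log of the item's occurrence count.

    scored_items: list of (item_id, score) pairs from some recommender.
    occurrences:  dict mapping item_id -> number of times the item occurs.
    Both structures are hypothetical; they are not part of any Mahout API.
    """
    rescored = []
    for item_id, score in scored_items:
        count = occurrences.get(item_id, 0)
        # log(2 + count) is an assumed smoothing so the divisor is always > 0,
        # even for items that have never been seen.
        penalty = math.log(2 + count)
        rescored.append((item_id, score / penalty))
    # Highest rescored value first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


top = [("a", 0.9), ("b", 0.8), ("c", 0.5)]
counts = {"a": 10000, "b": 50, "c": 3}
reranked = penalize_popular(top, counts)
```

With these made-up numbers the very popular item "a" drops from first to last place, which also shows why the penalty is risky: it can bury items that are popular because they are genuinely good.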

RE: Mahout beginner questions...

2012-03-25 Thread Razon, Oren
Correct me if I'm wrong, but a good way to boost speed could be to use a caching recommender, meaning computing the recommendations in advance (refreshing them every X minutes/hours) and always serving the most recently computed recommendations, right? -Original Message- From: Sean Owen

Re: Mahout beginner questions...

2012-03-25 Thread Ted Dunning
It sounds like the original poster isn't clear about the division between off-line and on-line work. Almost all production recommendation systems have a large off-line component which analyzes logs of behavior and produces a recommendation model. This model typically consists of item-item

Re: Mahout beginner questions...

2012-03-25 Thread Ted Dunning
Not really. See my previous posting. The best way to get fast recommendations is to use an item-based recommender. Pre-computing recommendations for all users is not usually a win because you wind up doing a lot of wasted work and you still don't have anything for new users who appear between

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
Caching recommendations is a good use of memory, sure. It doesn't decrease memory requirements and doesn't speed up the initial recommendation, though. Yes, pre-computing recommendations is also possible. This is more or less what the Hadoop-based implementation is for. That scales just fine but is
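
The caching idea discussed in this exchange (reuse a user's recommendations until they are X minutes old) might look roughly like the sketch below. The class name, the `recommend_fn` callback and the `ttl_seconds` parameter are hypothetical names chosen for the example; this is not Mahout's own caching implementation.

```python
import time


class CachingRecommender:
    """Wrap a recommend function and reuse its results for ttl_seconds."""

    def __init__(self, recommend_fn, ttl_seconds, clock=time.monotonic):
        self._recommend_fn = recommend_fn  # the slow, underlying recommender
        self._ttl = ttl_seconds            # how long a cached result stays fresh
        self._clock = clock                # injectable so the cache can be tested
        self._cache = {}                   # user_id -> (computed_at, recommendations)

    def recommend(self, user_id):
        now = self._clock()
        entry = self._cache.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh cached result: no recomputation needed.
            return entry[1]
        # Cache miss or stale entry: the full cost is paid here.
        recommendations = self._recommend_fn(user_id)
        self._cache[user_id] = (now, recommendations)
        return recommendations
```

As the message points out, the first request for each user still pays the full computation cost and the cache adds memory use on top of the model; only repeated requests inside the refresh window get faster.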

Re: wordcounts are not integer

2012-03-25 Thread Necati Demir
You are right. I asked my question in the wrong way. What I meant to ask is why some values are something like 25.5. How can a word count have a fractional part such as 0.5? You can see a part of this file below: Key: 108 1 1: Value: 241.7667508731829 Key: 108 4: Value: 8.554995151411276 Key: 108 4 during: Value:

RE: Mahout beginner questions...

2012-03-25 Thread Razon, Oren
OK, so that was a good clarification, which led me to new questions :) The system I need should of course give the recommendation itself in no time. And as Sean said, it needs to have some real-time components to enable a different recommendation after the user interacts with the application. But

Re: Mahout beginner questions...

2012-03-25 Thread Ted Dunning
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote: ... The system I need should of course give the recommendation itself in no time. ... But because I'm talking about very large scales, I guess that I want to push much of my model computation to offline mode (which

RE: Mahout beginner questions...

2012-03-25 Thread Razon, Oren
Thanks Ted. So let's continue with your example... I will compute an item-to-item (I2I) similarity matrix on Hadoop and then do online recommendation based on it and the user's ranked items. So where will the online part sit? Is it a good design to implement it on the same machine that Hadoop runs on

Re: Mahout beginner questions...

2012-03-25 Thread Ted Dunning
On Sun, Mar 25, 2012 at 4:02 PM, Razon, Oren oren.ra...@intel.com wrote: So let's continue with your example... I will compute an item-to-item (I2I) similarity matrix on Hadoop and then do online recommendation based on it and the user's ranked items. Yes. So where will the online part sit? Is it
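
The split confirmed here (item-to-item similarities computed offline, scoring done online from the user's own items) can be sketched as follows. The function name, the nested-dictionary layout of `item_similarities` and the similarity-weighted average are assumptions for illustration; the thread does not say which scoring formula is intended.

```python
def recommend_online(user_ratings, item_similarities, how_many=3):
    """Score unseen items from a precomputed item-item similarity table.

    user_ratings:      dict item_id -> rating given by this user.
    item_similarities: dict item_id -> {other_item_id: similarity}, assumed
                       to have been produced by an offline (e.g. Hadoop) job.
    """
    weighted_sums = {}
    weight_totals = {}
    for rated_item, rating in user_ratings.items():
        for candidate, sim in item_similarities.get(rated_item, {}).items():
            if candidate in user_ratings:
                continue  # do not recommend what the user already rated
            weighted_sums[candidate] = weighted_sums.get(candidate, 0.0) + sim * rating
            weight_totals[candidate] = weight_totals.get(candidate, 0.0) + abs(sim)
    # Similarity-weighted average of the user's ratings, highest first.
    ranked = sorted(
        (
            (item, weighted_sums[item] / weight_totals[item])
            for item in weighted_sums
            if weight_totals[item] > 0
        ),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:how_many]


similarities = {
    "A": {"B": 0.9, "C": 0.2},
    "D": {"C": 0.8, "B": 0.1},
}
ratings = {"A": 5.0, "D": 1.0}
result = recommend_online(ratings, similarities)
```

The online step only touches the similarity rows for items the user has interacted with, so its cost depends on the size of that user's history, not on the total number of users. That is why it can react immediately to a new rating or a brand-new user.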

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
On Sun, Mar 25, 2012 at 11:36 PM, Razon, Oren oren.ra...@intel.com wrote: In order to be able to do so, I will probably need a machine that has enough memory capacity to hold all the calculations in memory. I can even go further and prepare a cached recommender that will be