You could just penalise popular items and promote rarer items from a user's
top-100 recommendation list. Nothing in Mahout is geared towards serendipity,
novelty or diversity, which appear to be this year's interest at RecSys
(recsys.acm.org). But the gist is that it's very difficult to do right
Thanks for the detailed answer Sean.
I want to understand more clearly the non-distributed code limitations.
I saw that you advise that for more than 100,000,000 ratings the
non-distributed engine won't do the job.
The question is why? Is it a memory issue (and if so, if I have a bigger
Au contraire, you can do exactly this with an IDRescorer. Divide by (the
log of) an item's occurrence count, for example, to penalize popular items.
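The rescoring idea above (which a Mahout IDRescorer's rescore() method would implement in Java) can be sketched in a few lines; the function name, the `+ e` floor on the divisor, and the sample data here are illustrative, not from Mahout:

```python
import math

def rescore(original_score, item_count):
    """Penalize popular items by dividing the raw score by the log of the
    item's occurrence count. Adding e keeps the divisor >= 1 so rare items
    are never boosted above their raw score."""
    return original_score / math.log(item_count + math.e)

# (item, raw score, number of occurrences in the data) -- made-up values
recs = [("A", 4.0, 10000), ("B", 3.5, 50)]
adjusted = sorted(recs, key=lambda r: rescore(r[1], r[2]), reverse=True)
# the rarer item "B" now outranks the very popular "A"
```

In Mahout you would put this arithmetic inside `rescore(long id, double originalScore)` and pass the rescorer to the recommender; the lookup of an item's occurrence count is up to you.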
I don't recommend this. Stuff like the log-likelihood metric is already, in
a sense, accounting for things that are just generally popular and
normalizing
Correct me if I'm wrong, but a good way to boost speed could be to use a
caching recommender, meaning computing the recommendations in advance
(refreshing them every X minutes/hours) and always recommending from the most
recently computed set, right?
-Original Message-
From: Sean Owen
It sounds like the original poster isn't clear about the division between
off-line and on-line work.
Almost all production recommendation systems have a large off-line
component which analyzes logs of behavior and produces a recommendation
model. This model typically consists of item-item
Not really. See my previous posting.
The best way to get fast recommendations is to use an item-based
recommender. Pre-computing recommendations for all users is not usually a
win because you wind up doing a lot of wasted work and you still don't have
anything for new users who appear between
Caching recommendations is a good use of memory, sure. It doesn't decrease
memory requirements and doesn't speed up the initial recommendation though.
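The caching scheme being discussed (precompute once, serve from memory, refresh on an interval) can be sketched as follows. Mahout ships a CachingRecommender wrapper for this in Java; the standalone class below, its names, and the TTL mechanism are illustrative only:

```python
import time

class CachedRecommender:
    """Serve precomputed recommendation lists from an in-memory cache,
    recomputing a user's list only once its entry is older than the
    refresh interval."""

    def __init__(self, recommend_fn, ttl_seconds):
        self.recommend_fn = recommend_fn  # the expensive underlying recommender
        self.ttl = ttl_seconds
        self.cache = {}  # user_id -> (timestamp, recommendation list)

    def recommend(self, user_id, how_many):
        now = time.time()
        entry = self.cache.get(user_id)
        if entry is None or now - entry[0] > self.ttl:
            # cache miss or stale entry: do the real (slow) computation
            entry = (now, self.recommend_fn(user_id, how_many))
            self.cache[user_id] = entry
        return entry[1][:how_many]
```

As Sean notes, this trades memory for latency: repeat requests are fast, but the first request per user still pays the full cost, and new users always miss the cache.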
Yes pre-computing recommendations is also possible. This is more or less
what the Hadoop-based implementation is for. That scales just fine but is
You are right. I asked my question the wrong way.
What I meant to ask: some values are something like 25.5. How can a word
count have a fractional value? You can see a part of this file below:
Key: 108 1 1: Value: 241.7667508731829
Key: 108 4: Value: 8.554995151411276
Key: 108 4 during: Value:
Ok, so that was a good clarification, which led me to new questions :)
The system I need should of course give the recommendation itself in no time.
And as Sean said, it needs to have some real-time component to enable a
different recommendation after the user interacts with the application.
But
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote:
...
The system I need should of course give the recommendation itself in no
time.
...
But because I'm talking about very large scales, I guess that I want to
push much of my model computation to offline mode (which
Thanks Ted,
So let's continue with your example... I will build an item-to-item similarity
matrix on Hadoop and then do online recommendation based on it and the user's
rated items.
So where will the online part sit? Is it a good design to implement it
on the same machine that Hadoop runs on
On Sun, Mar 25, 2012 at 4:02 PM, Razon, Oren oren.ra...@intel.com wrote:
So let's continue with your example... I will build an item-to-item
similarity matrix on Hadoop and then do online recommendation based on it and
the user's rated items.
Yes.
So where will the online part sit? Is it
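The online half of this split (scoring candidates against a precomputed item-item similarity matrix and the user's own ratings, as in an item-based recommender) can be sketched like this; the data layout and weighting are an assumption, not Mahout's exact implementation:

```python
def recommend_from_sim(user_ratings, sim, top_n=3):
    """Score each unrated item as a similarity-weighted average of the
    user's ratings. `sim` maps candidate -> {rated item -> similarity},
    precomputed offline (e.g. by the Hadoop job discussed above)."""
    scores = {}
    for candidate, neighbors in sim.items():
        if candidate in user_ratings:
            continue  # never re-recommend something already rated
        num = den = 0.0
        for item, rating in user_ratings.items():
            s = neighbors.get(item, 0.0)
            num += s * rating
            den += abs(s)
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The point of the design is visible here: the online step touches only the user's handful of ratings and one matrix row per candidate, so it is cheap enough to serve live, while all the heavy pairwise similarity computation stays offline.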
On Sun, Mar 25, 2012 at 11:36 PM, Razon, Oren oren.ra...@intel.com wrote:
In order to be able to do so, I will probably need a machine that has
high memory capacity, to hold all the calculations in memory.
I can even go further and prepare a cached recommender that will be