Oh I see, right.

Well, one general strategy is to use Hadoop to compute recommendations
regularly, though nowhere near real-time. Then use the latest data to
approximately update those recommendations in real time. So you always
have slightly stale recommendations, plus item-item similarities to
fall back on, and you reload both periodically. On top of that, you
update any recently changed item or user in real time with item-based
recommendation, which can be fast.
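
To make that concrete, here is a minimal sketch of the real-time half
using Mahout's Taste classes. It assumes the batch side (for example
Mahout's Hadoop ItemSimilarityJob) has already written item-item
similarities out as "itemID1,itemID2,similarity" lines; the file names
and user ID below are made up:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class HybridRecommenderSketch {

  public static void main(String[] args) throws Exception {
    // Item-item similarities precomputed offline by the Hadoop batch job
    // and exported as "itemID1,itemID2,similarity" lines (file name is
    // hypothetical).
    ItemSimilarity similarity =
        new FileItemSimilarity(new File("item-similarities.csv"));

    // Recent preference data (e.g. today's clicks), refreshed far more
    // often than the similarities (file name is hypothetical).
    DataModel model = new FileDataModel(new File("recent-clicks.csv"));

    // Item-based recommendation over fresh data with stale-but-usable
    // similarities; this lookup is fast enough to serve online.
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    List<RecommendedItem> recs = recommender.recommend(12345L, 10);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}

You would periodically regenerate the similarity file from the batch job
and call refresh() on the recommender to pick up new clicks; whether that
keeps up at your scale is something you'd have to measure.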

It's a really big topic in its own right, and there's no complete
answer for you here, but you can piece this together from Mahout
rather than build it from scratch.

(This is more or less exactly what I have been working on separately,
a hybrid Hadoop-based / real-time recommender that can handle this
scale but also respond reasonably to new data.)

On Tue, Oct 25, 2011 at 4:44 PM, Vishal Santoshi
<[email protected]> wrote:
> They are all active in a day. I am talking about 8.3 million active users
> a day. A significant fraction of them will be new users (say about 2-3
> million of them). Further, the churn on items is likely to make historical
> recommendations obsolete. Thus, recommendations that were good for user A
> yesterday are likely to carry far less weight today.
>
> On Tue, Oct 25, 2011 at 11:32 AM, Sean Owen <[email protected]> wrote:
>
>> On Tue, Oct 25, 2011 at 4:08 PM, Vishal Santoshi
>> <[email protected]> wrote:
>> > In our case the preference is a user clicking on an article (which
>> > doubles as an item). These articles are introduced at a frequent rate,
>> > so the set of new items in the dataset churns constantly and those
>> > items don't necessarily have any history. Of course we need to
>> > recommend the latest items.
>>
>> OK, but I'm still not seeing why all users need an update every time.
>> Surely most of the 8.3M users aren't even active in a given day.
>>
>
