This is much the same problem as in an item-based recommender. The
simplest thing would be to read the whole item-item similarity matrix
into memory, but it would never fit. Instead the algorithm has to be
reshaped entirely so that only perhaps one column is in memory at a
time.
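
For illustration only, here is a rough sketch of that column-at-a-time shape for the item-based case, in plain Python rather than Hadoop code. The function name and the data layout are made up for the example and are not Mahout APIs. It also keeps the ratings in a dict for brevity; in a real job the ratings for an item would arrive joined on the item key alongside that item's column.

```python
from collections import defaultdict


def predict_item_based(columns, ratings):
    """Hypothetical helper, not a Mahout API.

    columns: iterable yielding (item, {other_item: similarity}), one
             column of the item-item matrix at a time.
    ratings: {item: {user: rating}}.
    Returns {(user, other_item): predicted rating}.
    """
    numer = defaultdict(float)  # running sum of similarity * rating
    denom = defaultdict(float)  # running sum of |similarity|
    for item, column in columns:
        # Only this one column of the matrix is held at this point.
        for user, rating in ratings.get(item, {}).items():
            for other, sim in column.items():
                numer[(user, other)] += sim * rating
                denom[(user, other)] += abs(sim)
    # Weighted average of the user's ratings, weights = similarities.
    return {key: numer[key] / denom[key] for key in numer if denom[key] > 0}


def example_columns():
    # A generator, so columns are produced and consumed one at a time.
    yield "a", {"c": 1.0}
    yield "b", {"c": 0.5}


example_ratings = {"a": {"u": 4.0}, "b": {"u": 2.0}}
example_predictions = predict_item_based(example_columns(), example_ratings)
```

The running sums play the role of what a reducer would total per (user, item) key; the columns themselves are never all resident at once.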

It really should look a lot like the item-based recommender. It's
just that the step that predicts the rating looks different. Diffs
aren't weights to be multiplied onto real ratings, but deltas to be
added to real ratings to reach a prediction.
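
To make that concrete, a minimal sketch of the prediction step, again in plain Python with hypothetical names. It assumes each diff is stored as the average of (target rating - item rating) together with the number of users it was averaged over, and it weights each estimate by that count, which is one common variant and not necessarily what any particular implementation does.

```python
def predict_slope_one(user_ratings, diffs_to_target):
    """Hypothetical helper, not a Mahout API.

    user_ratings:    {item: rating} for one user.
    diffs_to_target: {item: (avg_diff, count)}, where avg_diff is the
                     average of (target rating - item rating) and count
                     is how many users that average is based on.
    Returns the predicted rating for the target item, or None if the
    user has rated no item with a known diff.
    """
    numer = 0.0
    weight = 0
    for item, rating in user_ratings.items():
        if item in diffs_to_target:
            avg_diff, count = diffs_to_target[item]
            # The diff is a delta added to the real rating,
            # not a weight multiplied onto it.
            numer += (rating + avg_diff) * count
            weight += count
    return numer / weight if weight else None


example_prediction = predict_slope_one(
    {"a": 4.0, "b": 2.0},
    {"a": (0.5, 2), "b": (1.0, 1)},
)
# ((4.0 + 0.5) * 2 + (2.0 + 1.0) * 1) / 3 = 4.0
```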

On Mon, Apr 11, 2011 at 8:50 AM, ke xie <[email protected]> wrote:
> Well, I know it's not in Mahout, and I'll surely make it public if I can
> implement one.
> The problem here is, how can I predict ratings for users using map-reduce...
> I mean, what would the mapper and reducer classes look like? I cannot read
> the diff-matrix and user profiles at the same time. Even if I can, I'm not
> sure how to write the mapper and reducer.
> I'm re-reading chapter 6 of your book; perhaps I can find some clues.
> Any suggestions?
>
> On Mon, Apr 11, 2011 at 3:41 PM, Sean Owen <[email protected]> wrote:
>>
>> You can certainly use map-reduce. But Mahout does not yet have an
>> implementation for you, for slope-one. If you implement slope-one,
>> which isn't terribly hard and would resemble the item-based
>> recommender implementation you see in the project already, consider
>> contributing it to the Mahout project.
>>
>> 120GB is a huge amount of RAM... just use the MemoryDiffStorage
>> constructor that lets you limit the number of diffs it actually stores
>> in RAM, and you can fit about half the diffs into memory. That should
>> work 99% as well as using all of them.
>>
>> On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
>> > Actually, I want to do SlopeOne on the KDD-MUSIC dataset...
>> > As you know, it's really big, and the diff-matrix is 160GB in size.
>> > Though I have a 120GB RAM machine, that's not enough. Now I'm going to
>> > predict the ratings for the users in the test set, so I think I need to
>> > import the user profiles.
>> >
>> > I wonder, can I use a map-reduce program to calculate the predictions?
>> > If I can, would you please give me some hints?
>> > Thank you.
>> >
>> > On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
>> >>
>> >> There is no distributed slope-one implementation at this time. You
>> >> need to copy the resulting diffs output off HDFS to a local disk. Then
>> >> you simply use it as input to a MemoryDiffStorage for
>> >> SlopeOneRecommender.
>> >>
>> >> However, if you have computed diffs over a large number of items, it
>> >> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
>> >> database, but you may find it's just too slow. Or you can set
>> >> MemoryDiffStorage to cap the number of diffs it stores.
>> >>
>> >> None of these algorithms involve a user profile.
>> >>
>> >> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
>> >> > Hi there:
>> >> >
>> >> > I've successfully used a Hadoop program to calculate the diff-matrix,
>> >> > and stored the data in my HDFS...
>> >> >
>> >> > But now I'm confused: how can I read the users' profiles as well as
>> >> > the diff-matrix at the same time (they are at different locations in
>> >> > my HDFS) to predict a specific user's ratings?
>> >> >
>> >> > I've already checked the Mahout implementation of SlopeOne with
>> >> > Hadoop, but that one only calculates the diff-matrix, and no
>> >> > prediction part is included...
>> >> >
>> >> > Can anyone help me? How do I read two kinds of data in a Hadoop
>> >> > program at the same time?
>> >> >
>> >> >
>> >> > --
>> >> > Name: Ke Xie   Eddy
>> >> > Research Group of Information Retrieval
>> >> > State Key Laboratory of Intelligent Technology and Systems
>> >> > Tsinghua University
>> >> >