Re: How to implement SlopeOne with Hadoop? Anyone from Mahout community can help me?

ke xie Mon, 11 Apr 2011 01:03:45 -0700

Thank you, Sean, for your response. I will read the chapter again and try to
implement one. I hope I can contribute it to the community.


I really appreciate your warm and quick answers!

Best wishes


On Mon, Apr 11, 2011 at 3:58 PM, Sean Owen <[email protected]> wrote:

> This is much the same problem as in an item-based recommender. The
> simplest thing would be to read the whole item-item similarity matrix
> into memory, but it would never fit. Instead the algorithm has to be
> reshaped entirely so that only perhaps one column is in memory at a
> time.
>
> It really should look a lot like the item-based recommender. It's
> just that the step that predicts the rating looks different. Diffs
> aren't weights to be multiplied onto real ratings, but deltas to be
> added to real ratings, to reach a prediction.
>
> On Mon, Apr 11, 2011 at 8:50 AM, ke xie <[email protected]> wrote:
> > Well, I know it's not in Mahout, and I'll surely make it public if I can
> > implement one.
> > The problem here is how to predict ratings for users using map-reduce.
> > I mean, what would the mapper and reducer classes look like? I cannot
> > read the diff-matrix and the user profiles at the same time. Even if I
> > could, I'm not sure how to write the mapper and reducer.
> > I'm re-reading chapter 6 of your book; perhaps I can find some clues.
> > Any suggestions?
> >
> > On Mon, Apr 11, 2011 at 3:41 PM, Sean Owen <[email protected]> wrote:
> >>
> >> You can certainly use map-reduce. But Mahout does not yet have an
> >> implementation for you, for slope-one. If you implement slope-one,
> >> which isn't terribly hard and would resemble the item-based
> >> recommender implementation you see in the project already, consider
> >> contributing it to the Mahout project.
> >>
> >> 120GB is a huge amount of RAM... just use the MemoryDiffStorage
> >> constructor that lets you limit the number of diffs it actually stores
> >> in RAM, and you can fit about half the diffs into memory. That should
> >> work 99% as well as using all of them.
> >>
> >> On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
> >> > Actually, I want to run SlopeOne on the KDD-MUSIC dataset...
> >> > As you know, it's really big, and the diff-matrix is 160GB in size.
> >> > Though I have a machine with 120GB of RAM, that's not enough. Now I'm
> >> > going to predict the ratings for the users in the test set, so I
> >> > think I need to import the user-profile.
> >> >
> >> > Can't I use a map-reduce program to calculate the predictions? If I
> >> > can, would you please give me some hints?
> >> > Thank you.
> >> >
> >> > On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
> >> >>
> >> >> There is no distributed slope-one implementation at this time. You
> >> >> need to copy the resulting diffs output off HDFS to a local disk.
> >> >> Then you simply use it as input to a MemoryDiffStorage for
> >> >> SlopeOneRecommender.
> >> >>
> >> >> However, if you have computed diffs over a large number of items, it
> >> >> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> >> >> database, but you may find it's just too slow. Or you can set
> >> >> MemoryDiffStorage to cap the number of diffs it stores.
> >> >>
> >> >> None of these algorithms involve a user profile.
> >> >>
> >> >> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
> >> >> > Hi there:
> >> >> >
> >> >> > I've successfully used a Hadoop program to calculate the
> >> >> > diff-matrix and stored the data in my HDFS...
> >> >> >
> >> >> > But now I'm confused: how can I read the users' profiles as well as
> >> >> > the diff-matrix at the same time (they are at different locations
> >> >> > in my HDFS) to predict a specific user's ratings?
> >> >> >
> >> >> > I've already checked the Mahout implementation of SlopeOne with
> >> >> > Hadoop, but that one only calculates the diff-matrix; no
> >> >> > prediction part is included...
> >> >> >
> >> >> > Can anyone help me? How do I read two kinds of data in a Hadoop
> >> >> > program at the same time?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Name: Ke Xie   Eddy
> >> >> > Research Group of Information Retrieval
> >> >> > State Key Laboratory of Intelligent Technology and Systems
> >> >> > Tsinghua University
> >> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Name: Ke Xie   Eddy
> >> > Research Group of Information Retrieval
> >> > State Key Laboratory of Intelligent Technology and Systems
> >> > Tsinghua University
> >> >
> >> >
> >
> >
> >
> > --
> > Name: Ke Xie   Eddy
> > Research Group of Information Retrieval
> > State Key Laboratory of Intelligent Technology and Systems
> > Tsinghua University
> >
> >
>

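For reference, here is a minimal sketch of the prediction step Sean describes in the quoted reply above: each diff is a delta added to one of the user's real ratings, and the shifted ratings are then averaged. It assumes the weighted slope-one variant, where each delta is weighted by the number of users behind it. The class, method, and field names are made up for illustration and are not Mahout's API.

```java
import java.util.HashMap;
import java.util.Map;

public class SlopeOneSketch {

    /** Hypothetical holder: average of (target rating - other rating) and how many users co-rated both. */
    static final class Diff {
        final double average;
        final int count;

        Diff(double average, int count) {
            this.average = average;
            this.count = count;
        }
    }

    /**
     * Weighted slope-one estimate for a single target item.
     *
     * @param userRatings   item ID -> rating the user has already given
     * @param diffsToTarget item ID j -> Diff of (target item - item j)
     * @return the estimate, or NaN if the user rated no item that has a diff to the target
     */
    static double predict(Map<Long, Double> userRatings, Map<Long, Diff> diffsToTarget) {
        double weightedSum = 0.0;
        long totalCount = 0;
        for (Map.Entry<Long, Double> rating : userRatings.entrySet()) {
            Diff diff = diffsToTarget.get(rating.getKey());
            if (diff == null) {
                continue; // no co-rating data between this item and the target
            }
            // The diff is a delta ADDED to the real rating, not a weight multiplied onto it.
            weightedSum += (rating.getValue() + diff.average) * diff.count;
            totalCount += diff.count;
        }
        return totalCount == 0 ? Double.NaN : weightedSum / totalCount;
    }

    public static void main(String[] args) {
        Map<Long, Double> userRatings = new HashMap<>();
        userRatings.put(1L, 4.0);
        userRatings.put(2L, 2.0);

        Map<Long, Diff> diffsToItem3 = new HashMap<>();
        diffsToItem3.put(1L, new Diff(0.5, 2)); // item 3 is rated 0.5 higher than item 1 on average
        diffsToItem3.put(2L, new Diff(1.0, 1)); // item 3 is rated 1.0 higher than item 2 on average

        // ((4.0 + 0.5) * 2 + (2.0 + 1.0) * 1) / 3 = 4.0
        System.out.println(predict(userRatings, diffsToItem3)); // prints 4.0
    }
}
```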

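The question of reading the user ratings and the diff-matrix "at the same time" can be approached as a join on the item the two records share, so that only one column of diffs is needed per key. What follows is a rough sketch of that shape in plain Java, with in-memory collections standing in for the HDFS inputs and the shuffle; it is not Mahout code, and every name in it is hypothetical. In an actual Hadoop job the two inputs could be attached to one job with something like MultipleInputs, and the two phases would be separate map-reduce passes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlopeOneJoinSketch {

    /** Hypothetical record from the user-ratings input. */
    static final class Rating {
        final long user;
        final long item;
        final double value;

        Rating(long user, long item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    /** Hypothetical record from the diff-matrix input: average of (target - other) over count users. */
    static final class Diff {
        final long target;
        final long other;
        final double average;
        final int count;

        Diff(long target, long other, double average, int count) {
            this.target = target;
            this.other = other;
            this.average = average;
            this.count = count;
        }
    }

    /**
     * Phase 1 "reduce" for one item j: sees only the ratings of j and the diff column against j.
     * Adds a partial {weighted sum, weight} for every (user, target) pair into the shared map,
     * which stands in for the shuffle and summation of a second job.
     */
    static void joinColumn(List<Rating> ratingsOfJ, List<Diff> diffsAgainstJ,
                           Map<String, double[]> partials) {
        for (Rating r : ratingsOfJ) {
            for (Diff d : diffsAgainstJ) {
                double[] acc = partials.computeIfAbsent(r.user + ":" + d.target, k -> new double[2]);
                acc[0] += (r.value + d.average) * d.count;
                acc[1] += d.count;
            }
        }
    }

    /** Returns "user:targetItem" -> weighted slope-one estimate. */
    static Map<String, Double> predictAll(List<Rating> ratings, List<Diff> diffs) {
        // "Map" step: key both inputs by the item they have in common.
        Map<Long, List<Rating>> ratingsByItem = new HashMap<>();
        for (Rating r : ratings) {
            ratingsByItem.computeIfAbsent(r.item, k -> new ArrayList<>()).add(r);
        }
        Map<Long, List<Diff>> diffsByOther = new HashMap<>();
        for (Diff d : diffs) {
            diffsByOther.computeIfAbsent(d.other, k -> new ArrayList<>()).add(d);
        }

        // Phase 1: one join per item key, touching one diff column at a time.
        Map<String, double[]> partials = new HashMap<>();
        for (Map.Entry<Long, List<Rating>> e : ratingsByItem.entrySet()) {
            List<Diff> column = diffsByOther.get(e.getKey());
            if (column != null) {
                joinColumn(e.getValue(), column, partials);
            }
        }

        // Phase 2: divide each accumulated weighted sum by its total weight.
        Map<String, Double> predictions = new HashMap<>();
        for (Map.Entry<String, double[]> e : partials.entrySet()) {
            predictions.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return predictions;
    }
}
```

The sketch emits an estimate for every (user, target) pair it can reach, including items the user has already rated. A real job would likely restrict the output to the (user, item) pairs in the test set to keep the intermediate data manageable.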

-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
