Re: How to implement SlopeOne with Hadoop? Anyone from Mahout community can help me?

ke xie Mon, 11 Apr 2011 00:51:03 -0700

Well, I know it's not in Mahout, and I'll surely make it public if I can
implement one.


The problem here is how to predict ratings for users using map-reduce...
I mean, what would the mapper and reducer classes look like? I cannot read
the diff-matrix and the user profiles at the same time, and even if I could,
I'm not sure how to write the mapper and reducer.

I'm re-reading chapter 6 of your book; perhaps I can find some clues there.

Any suggestions?

On Mon, Apr 11, 2011 at 3:41 PM, Sean Owen <[email protected]> wrote:

> You can certainly use map-reduce. But Mahout does not yet have an
> implementation for you, for slope-one. If you implement slope-one,
> which isn't terribly hard and would resemble the item-based
> recommender implementation you see in the project already, consider
> contributing it to the Mahout project.
>
> 120GB is a huge amount of RAM... just use the MemoryDiffStorage
> constructor that lets you limit the number of diffs it actually stores
> in RAM, and you can fit about half the diffs into memory. That should
> work 99% as well as using all of them.
>
> On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
> > Actually, I want to do SlopeOne on KDD-MUSIC dataset...
> > As you know, it's really big, and the diff-matrix is 160GB in size. Though I
> > have a 120GB RAM machine, that's not enough. Now I'm going to predict the
> > ratings for the users in the test set, so I think I need to import the
> > user profiles.
> >
> > I wonder, can't I use a map-reduce program to calculate the predictions?
> > If I can, would you please give me some hints?
> > Thank you.
> >
> > On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
> >>
> >> There is no distributed slope-one implementation at this time. You
> >> need to copy the resulting diffs output off HDFS to a local disk. Then
> >> you simply use it as input to a MemoryDiffStorage for
> >> SlopeOneRecommender.
> >>
> >> However, if you have computed diffs over a large number of items, it
> >> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> >> database, but you may find it's just too slow. Or you can set
> >> MemoryDiffStorage to cap the number of diffs it stores.
> >>
> >> None of these algorithms involve a user profile.
> >>
> >> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
> >> > Hi there:
> >> >
> >> > I've successfully used a Hadoop program to calculate the diff-matrix
> >> > and stored the data in my HDFS...
> >> >
> >> > But now I'm confused: how can I read the users' profiles as well as the
> >> > diff-matrix at the same time (they are at different locations in my
> >> > HDFS) to predict a specific user's ratings?
> >> >
> >> > I've already checked the Mahout implementation of SlopeOne with Hadoop,
> >> > but that one only calculates the diff-matrix; no prediction part is
> >> > included...
> >> >
> >> > Can anyone help me? How do I read two kinds of data in a Hadoop program
> >> > at the same time?
> >> >
> >> >
> >> > --
> >> > Name: Ke Xie   Eddy
> >> > Research Group of Information Retrieval
> >> > State Key Laboratory of Intelligent Technology and Systems
> >> > Tsinghua University
> >> >
> >
> >
> >
> > --
> > Name: Ke Xie   Eddy
> > Research Group of Information Retrieval
> > State Key Laboratory of Intelligent Technology and Systems
> > Tsinghua University
> >
> >
>



-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
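
For anyone following the thread, here is one way the prediction step asked about above might be laid out. This is a rough sketch in plain Python, not Mahout code and not the Hadoop API. It assumes the diff-matrix rows look like (target item, rated item, average diff, count) and the user profiles look like (user, item, rating); those record layouts and all function names are made up for illustration. The idea is a reduce-side join: both inputs are keyed by the item the user has rated, the first reducer pairs every rating with every diff for that item, and a second pass sums the weighted partial predictions per (user, target item). In Hadoop, reading two input locations in one job is commonly done with MultipleInputs, one mapper per input.

```python
from collections import defaultdict

def join_map(record):
    """Pass 1 mapper: key both kinds of record by the item the user has rated."""
    if record[0] == "diff":
        # Hypothetical layout: ("diff", target_item, rated_item, avg_diff, count),
        # avg_diff = average of (rating of target_item - rating of rated_item).
        _, target, rated, avg_diff, count = record
        yield rated, ("diff", target, avg_diff, count)
    else:
        # Hypothetical layout: ("rating", user, item, rating)
        _, user, item, rating = record
        yield item, ("rating", user, rating)

def join_reduce(item, values):
    """Pass 1 reducer: pair each rating of `item` with each diff against `item`."""
    values = list(values)
    ratings = [(v[1], v[2]) for v in values if v[0] == "rating"]
    diffs = [(v[1], v[2], v[3]) for v in values if v[0] == "diff"]
    for user, rating in ratings:
        for target, avg_diff, count in diffs:
            # Weighted slope-one partial: numerator and denominator contributions.
            yield (user, target), ((rating + avg_diff) * count, count)

def predict_reduce(key, partials):
    """Pass 2 reducer: combine the partials for one (user, target item) pair."""
    numerator = sum(p[0] for p in partials)
    denominator = sum(p[1] for p in partials)
    return key, numerator / denominator

def run(records):
    """Simulate the two passes in memory, with dicts standing in for the shuffle."""
    by_item = defaultdict(list)
    for record in records:
        for key, value in join_map(record):
            by_item[key].append(value)
    by_pair = defaultdict(list)
    for item, values in by_item.items():
        for key, value in join_reduce(item, values):
            by_pair[key].append(value)
    return dict(predict_reduce(key, partials) for key, partials in by_pair.items())

sample = [
    ("rating", "u1", "A", 4.0),
    ("rating", "u1", "B", 2.0),
    ("diff", "C", "A", 0.5, 2),   # C is rated 0.5 higher than A on average, 2 co-raters
    ("diff", "C", "B", 1.0, 2),   # C is rated 1.0 higher than B on average, 2 co-raters
]
print(run(sample))  # {('u1', 'C'): 3.75}
```

Two caveats on the sketch. The first reducer buffers every value for an item in memory, which a real job over a 160GB diff-matrix would probably need to avoid, for example by sorting the ratings ahead of the diffs within each key. It also emits a prediction for every (user, target item) pair it can form, so for a test set you would likely filter to the pairs you actually need.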
