Well, I know it's not in Mahout, and I'll certainly make it public if I manage to implement one.

The problem is how to predict ratings for users with map-reduce. I mean, what would the mapper and reducer classes look like? I can't read the diff-matrix and the user profiles at the same time, and even if I could, I'm not sure how to write the mapper and reducer.

I'm re-reading chapter 6 of your book; perhaps I can find some clues there.

Any suggestions?

On Mon, Apr 11, 2011 at 3:41 PM, Sean Owen <[email protected]> wrote:

> You can certainly use map-reduce. But Mahout does not yet have an
> implementation for you, for slope-one. If you implement slope-one,
> which isn't terribly hard and would resemble the item-based
> recommender implementation you see in the project already, consider
> contributing it to the Mahout project.
>
> 120GB is a huge amount of RAM... just use the MemoryDiffStorage
> constructor that lets you limit the number of diffs it actually stores
> in RAM, and you can fit about half the diffs into memory. That should
> work 99% as well as using all of them.
>
> On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
> > Actually, I want to do SlopeOne on the KDD-MUSIC dataset...
> > As you know, it's really big, and the diff-matrix is 160GB in size. Though I
> > have a 120GB RAM machine, that's not enough. Now I'm going to predict the
> > ratings for the users in the test set, so I think I need to import the
> > user-profile.
> >
> > I wonder, can't I use a map-reduce program to calculate the predictions? If I
> > can, would you please give me some hints?
> > Thank you.
> >
> > On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
> >>
> >> There is no distributed slope-one implementation at this time. You
> >> need to copy the resulting diffs output off HDFS to a local disk. Then
> >> you simply use it as input to a MemoryDiffStorage for
> >> SlopeOneRecommender.
> >>
> >> However, if you have computed diffs over a large number of items, it
> >> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> >> database, but you may find it's just too slow. Or you can set
> >> MemoryDiffStorage to cap the number of diffs it stores.
> >>
> >> None of these algorithms involve a user profile.
> >>
> >> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
> >> > Hi there:
> >> >
> >> > I've successfully used a hadoop program to calculate the diff-matrix,
> >> > and stored the data in my HDFS...
> >> >
> >> > But now I'm confused: how can I read the users' profiles as well as the
> >> > diff-matrix at the same time (they are at different locations in my HDFS)
> >> > to predict a specific user's ratings?
> >> >
> >> > I've already checked the mahout implementation of Slopeone with hadoop,
> >> > but that one just did the calculation of the diff-matrix, and no
> >> > prediction part is included...
> >> >
> >> > Can anyone help me? How do I read two kinds of data in a Hadoop program
> >> > at the same time?
> >> >
> >> > --
> >> > Name: Ke Xie Eddy
> >> > Research Group of Information Retrieval
> >> > State Key Laboratory of Intelligent Technology and Systems
> >> > Tsinghua University
> >
> > --
> > Name: Ke Xie Eddy
> > Research Group of Information Retrieval
> > State Key Laboratory of Intelligent Technology and Systems
> > Tsinghua University

--
Name: Ke Xie Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
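
For reference, one possible shape for the prediction step asked about above is a reduce-side join followed by a summing pass. The sketch below is a plain-Python, in-memory simulation, not Mahout or Hadoop code. The record layouts, the `run_job` helper, and all function names are hypothetical, and it assumes each diff record stores the average of (target item rating minus rated item rating) together with the number of users behind that average. Records from the two sources are tagged so one reducer can tell them apart; in Hadoop itself, the `MultipleInputs` class is one way to attach a separate mapper to each HDFS path.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Tiny in-memory stand-in for one map-reduce pass (map, shuffle, reduce)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Pass 1: join ratings and diffs on the item the user has already rated.
def join_mapper(record):
    if record[0] == "rating":   # assumed layout: ("rating", user, item, rating)
        _, user, item, rating = record
        yield item, ("R", user, rating)
    else:                       # assumed layout: ("diff", rated_item, target_item, avg_diff, count)
        _, rated_item, target_item, avg_diff, count = record
        yield rated_item, ("D", target_item, avg_diff, count)


def join_reducer(rated_item, values):
    ratings = [v[1:] for v in values if v[0] == "R"]
    diffs = [v[1:] for v in values if v[0] == "D"]
    for user, rating in ratings:
        for target_item, avg_diff, count in diffs:
            # One weighted slope-one term for the pair (user, target_item).
            yield (user, target_item), ((rating + avg_diff) * count, count)


# Pass 2: sum the weighted terms per (user, target_item) and divide.
def sum_mapper(record):
    yield record                # each record is already a (key, value) pair


def sum_reducer(key, values):
    numerator = sum(v[0] for v in values)
    denominator = sum(v[1] for v in values)
    yield key, numerator / denominator


# Made-up toy data: u1 rated A and B; C averages 1.0 above A (2 users)
# and 0.5 above B (1 user).
records = [
    ("rating", "u1", "A", 4.0),
    ("rating", "u1", "B", 3.0),
    ("diff", "A", "C", 1.0, 2),
    ("diff", "B", "C", 0.5, 1),
]
partial_terms = run_job(records, join_mapper, join_reducer)
predictions = run_job(partial_terms, sum_mapper, sum_reducer)
print(predictions)  # [(('u1', 'C'), 4.5)]
```

On a dataset of this size the first pass would emit a term for every (user, target item) pair reachable through the diffs, so a real job would presumably keep only the pairs that appear in the test set before the second pass.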
