Thank you, Sean, for your response. I will read the chapter again and try to implement one. I hope I can make a contribution to the community.
I really appreciate your warm and quick answers!

Best wishes

On Mon, Apr 11, 2011 at 3:58 PM, Sean Owen <[email protected]> wrote:
> This is much the same problem as in an item-based recommender. The
> simplest thing would be to read the whole item-item similarity matrix
> into memory, but it would never fit. Instead the algorithm has to be
> reshaped entirely so that only perhaps one column is in memory at a
> time.
>
> It really should look a lot like the item-based recommender. It's
> just that the step that predicts the rating looks different. Diffs
> aren't weights to be multiplied onto real ratings, but deltas to be
> added to real ratings, to reach a prediction.
>
> On Mon, Apr 11, 2011 at 8:50 AM, ke xie <[email protected]> wrote:
> > Well, I know it's not in Mahout, and I'll surely make it public if I
> > can implement one.
> > The problem here is, how can I predict ratings for users using
> > map-reduce? I mean, what would the mapper and reducer classes look
> > like? I cannot read the diff-matrix and the user profiles at the same
> > time. Even if I could, I'm not sure how to write the mapper and
> > reducer.
> > I'm re-reading chapter 6 of your book; perhaps I can find some clues.
> > Any suggestions?
> >
> > On Mon, Apr 11, 2011 at 3:41 PM, Sean Owen <[email protected]> wrote:
> >>
> >> You can certainly use map-reduce. But Mahout does not yet have an
> >> implementation for you, for slope-one. If you implement slope-one,
> >> which isn't terribly hard and would resemble the item-based
> >> recommender implementation you see in the project already, consider
> >> contributing it to the Mahout project.
> >>
> >> 120GB is a huge amount of RAM... just use the MemoryDiffStorage
> >> constructor that lets you limit the number of diffs it actually stores
> >> in RAM, and you can fit about half the diffs into memory. That should
> >> work 99% as well as using all of them.
> >>
> >> On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
> >> > Actually, I want to run SlopeOne on the KDD-MUSIC dataset...
> >> > As you know, it's really big, and the diff-matrix is 160GB in size.
> >> > Though I have a machine with 120GB of RAM, that's not enough. Now
> >> > I'm going to predict the ratings for the users in the test set, so
> >> > I think I need to import the user profiles.
> >> >
> >> > I wonder, can't I use a map-reduce program to calculate the
> >> > predictions? If I can, would you please give me some hints?
> >> > Thank you.
> >> >
> >> > On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
> >> >>
> >> >> There is no distributed slope-one implementation at this time. You
> >> >> need to copy the resulting diffs output off HDFS to a local disk.
> >> >> Then you simply use it as input to a MemoryDiffStorage for
> >> >> SlopeOneRecommender.
> >> >>
> >> >> However, if you have computed diffs over a large number of items, it
> >> >> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> >> >> database, but you may find it's just too slow. Or you can set
> >> >> MemoryDiffStorage to cap the number of diffs it stores.
> >> >>
> >> >> None of these algorithms involve a user profile.
> >> >>
> >> >> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
> >> >> > Hi there:
> >> >> >
> >> >> > I've successfully used a Hadoop program to calculate the
> >> >> > diff-matrix, and stored the data in my HDFS...
> >> >> >
> >> >> > But now I'm confused: how can I read the users' profiles as well
> >> >> > as the diff-matrix at the same time (they are at different
> >> >> > locations in my HDFS) to predict a specific user's ratings?
> >> >> >
> >> >> > I've already checked the Mahout implementation of slope-one with
> >> >> > Hadoop, but that one only calculates the diff-matrix; no
> >> >> > prediction part is included...
> >> >> >
> >> >> > Can anyone help me? How do I read two kinds of data in a Hadoop
> >> >> > program at the same time?
> >> >> >
> >> >> > --
> >> >> > Name: Ke Xie Eddy
> >> >> > Research Group of Information Retrieval
> >> >> > State Key Laboratory of Intelligent Technology and Systems
> >> >> > Tsinghua University

--
Name: Ke Xie Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
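
For reference, the prediction step discussed in this thread (diffs used as deltas added to a user's real ratings, with only one diff column held in memory at a time) might be sketched as follows. This is only an illustration in plain Python under stated assumptions, not Mahout's API and not an existing implementation: the names `predict_by_columns`, `diff_columns` and `user_ratings` are hypothetical, each diff column is assumed to hold the average difference (rating of target item minus rating of source item) together with the number of users who rated both, and the count-weighted form of slope-one is assumed.

```python
from collections import defaultdict


def predict_by_columns(diff_columns, user_ratings):
    """Count-weighted slope-one predictions, streaming one diff column at a time.

    diff_columns: iterable of (source_item, column), where column maps a
        target item to (avg_diff, count). avg_diff is assumed to be the
        average of (rating of target - rating of source) over the `count`
        users who rated both items.
    user_ratings: dict mapping user -> {item: rating}.

    Returns a dict mapping (user, target_item) -> predicted rating.
    """
    numer = defaultdict(float)  # sum of count * (known rating + avg_diff)
    denom = defaultdict(int)    # sum of count

    # Only the current column needs to be in memory; the running sums
    # per (user, target item) are the only state carried across columns.
    for source_item, column in diff_columns:
        for user, ratings in user_ratings.items():
            if source_item not in ratings:
                continue  # this column says nothing about this user
            known = ratings[source_item]
            for target_item, (avg_diff, count) in column.items():
                if target_item in ratings:
                    continue  # already rated, nothing to predict
                # The diff is a delta added to the real rating, not a weight.
                numer[(user, target_item)] += count * (known + avg_diff)
                denom[(user, target_item)] += count

    return {key: numer[key] / denom[key] for key in numer}


if __name__ == "__main__":
    # Tiny made-up example: one user, three items.
    columns = [
        ("A", {"B": (0.5, 3), "C": (1.0, 2)}),
        ("B", {"C": (0.5, 1)}),
    ]
    print(predict_by_columns(columns, {"u": {"A": 3.0, "B": 4.0}}))
```

In a map-reduce setting, one possible arrangement (again an assumption, not something confirmed in this thread) is for the mappers to emit the partial `count * (known + avg_diff)` and `count` values keyed by (user, target item), and for the reducer to sum them and divide.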
