You can certainly use map-reduce, but Mahout does not yet have a distributed slope-one implementation for you. Writing one isn't terribly hard, and it would resemble the item-based recommender implementation already in the project. If you do implement it, consider contributing it to the Mahout project.
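For reference, the per-user arithmetic that such a job would have to perform is the weighted slope-one prediction. Below is a minimal plain-Java sketch of that step, not Mahout code: the class SlopeOneSketch, the Diff holder, and the key() helper are hypothetical names, and the diffs sit in an ordinary in-memory map purely for illustration. It assumes each diff is stored as the average of (target item rating minus rated item rating) together with the number of users who rated both items.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; none of these names come from Mahout.
final class SlopeOneSketch {

    /** Average rating difference (target minus rated item) and its co-rating count. */
    static final class Diff {
        final double average;
        final int count;

        Diff(double average, int count) {
            this.average = average;
            this.count = count;
        }
    }

    /** Map key for the ordered item pair (targetItem, ratedItem). */
    static String key(long targetItem, long ratedItem) {
        return targetItem + ":" + ratedItem;
    }

    /**
     * Weighted slope-one estimate of one user's rating for targetItem.
     * Returns NaN when no stored diff links the target to any rated item.
     */
    static double predict(Map<Long, Double> userRatings,
                          Map<String, Diff> diffs,
                          long targetItem) {
        double weightedSum = 0.0;
        long totalCount = 0;
        for (Map.Entry<Long, Double> rating : userRatings.entrySet()) {
            Diff diff = diffs.get(key(targetItem, rating.getKey()));
            if (diff == null) {
                continue; // no co-ratings for this pair, or the diff was not kept
            }
            // Each rated item votes "my rating plus the average difference",
            // weighted by how many users the difference was computed from.
            weightedSum += (rating.getValue() + diff.average) * diff.count;
            totalCount += diff.count;
        }
        return totalCount == 0 ? Double.NaN : weightedSum / totalCount;
    }

    public static void main(String[] args) {
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(1L, 4.0);
        ratings.put(2L, 2.0);

        Map<String, Diff> diffs = new HashMap<>();
        diffs.put(key(3L, 1L), new Diff(0.5, 2));
        diffs.put(key(3L, 2L), new Diff(1.0, 1));

        // ((4.0 + 0.5) * 2 + (2.0 + 1.0) * 1) / 3 = 4.0
        System.out.println(predict(ratings, diffs, 3L));
    }
}
```

Skipping missing pairs means the same method still returns an estimate when only a subset of the diffs is kept; the estimate is then based on fewer item pairs.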
120GB is a huge amount of RAM. Just use the MemoryDiffStorage constructor that lets you limit the number of diffs it actually stores in RAM; you can fit about half the diffs into memory. That should work 99% as well as using all of them.

On Mon, Apr 11, 2011 at 8:39 AM, ke xie <[email protected]> wrote:
> Actually, I want to do SlopeOne on KDD-MUSIC dataset...
> As you know, it's really big, and the diff-matrix is 160GB size. Though I
> have a 120GB RAM machine, that's not enough. Now I'm going to predict the
> rating for the users in the test set, so I think I need to import the
> user-profile.
>
> I wonder can't I use a map-reduce program to calculate the predictions? If I
> can, would you please give me some hints?
> Thank you.
>
> On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <[email protected]> wrote:
>>
>> There is no distributed slope-one implementation at this time. You
>> need to copy the resulting diffs output off HDFS to a local disk. Then
>> you simply use it as input to a MemoryDiffStorage for
>> SlopeOneRecommender.
>>
>> However, if you have computed diffs over a large number of items, it
>> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
>> database, but you may find it's just too slow. Or you can set
>> MemoryDiffStorage to cap the number of diffs it stores.
>>
>> None of these algorithms involve a user profile.
>>
>> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <[email protected]> wrote:
>> > Hi there:
>> >
>> > I've successfully used a hadoop program to calculate the diff-matrix,
>> > and stored the data in my HDFS...
>> >
>> > But now I'm confused: how can I read the users' profile as well as the
>> > diff-matrix at the same time (they are at different locations in my
>> > HDFS) to predict a specific user's ratings?
>> >
>> > I've already checked the mahout implementation of Slopeone with hadoop,
>> > but that one just did the calculation of diff-matrix.. and no
>> > prediction part is included...
>> >
>> > Anyone can help me? How to read two kinds of data in Hadoop program at
>> > the same time?
>> >
>> >
>> > --
>> > Name: Ke Xie Eddy
>> > Research Group of Information Retrieval
>> > State Key Laboratory of Intelligent Technology and Systems
>> > Tsinghua University
>
>
> --
> Name: Ke Xie Eddy
> Research Group of Information Retrieval
> State Key Laboratory of Intelligent Technology and Systems
> Tsinghua University
