I see, yes, the latter is actually distributed. They are very different algorithms anyway.
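
To make the difference concrete, here is a toy cost model of the pseudo-distributed pattern. It is not Mahout code. It assumes the pseudo job splits the users across workers and that each worker builds its own copy of the non-distributed recommender before serving its slice. All function names and cost numbers are hypothetical and in arbitrary units.

```python
# Toy cost model only; names and numbers are hypothetical, not Mahout's API.

def split_users(user_ids, num_workers):
    """Partition users round-robin, one slice per worker."""
    return [user_ids[i::num_workers] for i in range(num_workers)]


def pseudo_distributed_cost(num_users, num_workers, train_cost, per_user_cost):
    """Every worker trains the full model itself, then serves only its slice."""
    slices = split_users(list(range(num_users)), num_workers)
    per_worker = [train_cost + len(s) * per_user_cost for s in slices]
    wall_clock = max(per_worker)  # workers run in parallel
    total_work = sum(per_worker)  # compute spent across the whole cluster
    return wall_clock, total_work


def single_machine_cost(num_users, train_cost, per_user_cost):
    """One process trains once and serves every user."""
    return train_cost + num_users * per_user_cost


if __name__ == "__main__":
    # 6040 users and 12 workers as in the thread; the costs are made up.
    wall, total = pseudo_distributed_cost(6040, 12, train_cost=100, per_user_cost=1)
    single = single_machine_cost(6040, train_cost=100, per_user_cost=1)
    print(wall, total, single)
```

Under these assumptions only the per-user part shrinks as workers are added. The training cost is paid again on every worker, so when training dominates, the wall-clock time barely improves.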

On Fri, Nov 19, 2010 at 11:24 PM, Sanjib Kumar Das <[email protected]> wrote:

> It takes 14 hrs to run the *pseudo*.RecommenderJob with the
> SVDRecommender.
> Ran the following command:
> hadoop jar recommender.jar
> org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
> -Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=outputBR
> --recommenderClassName
> org.apache.mahout.cf.taste.example.bucky.BuckyRecommender
>
> Here BuckyRecommender is SVDRecommender(30,50)
>
> It takes 38 minutes if I run the *item*.RecommenderJob with the following
> command:
> hadoop jar recommender.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=output
>
> item.RecommenderJob is very different from pseudo.RecommenderJob (in terms
> of the distributed implementation), hence the difference in timings, I
> guess.
>
> On Fri, Nov 19, 2010 at 4:04 PM, Sean Owen <[email protected]> wrote:
>
> > That result sounds confusing. It should take about the same number of
> > wall-clock hours either way. I don't see why it would take 14 hours --
> > that sounds wrong. If anything it should take 38 / N minutes where N is
> > the number of recommenders you ran.
> >
> > SVDRecommender is not distributed at all, no.
> >
> > On Fri, Nov 19, 2010 at 9:34 PM, Sanjib Kumar Das <[email protected]>
> > wrote:
> >
> > > Hi All,
> > >
> > > I wanted to run a distributed RecommenderJob with the SVDRecommender
> > > implementation.
> > > So I ran the pseudo.RecommenderJob with an
> > > SVDRecommender(numFeatures=30,trainingSteps=50) on the 1M Movielens
> > > data (6040 users). So this generated 10 recommendations for each of
> > > the 6040 users but took 14 hours to do so! My hadoop cluster had 12
> > > machines. So I guess it just ran multiple instances of the
> > > non-distributed SVD implementation and each of these instances did
> > > the same thing again and again.
> > > So unless the implementation of the recommender is distributed, we
> > > don't get any special benefit with the pseudo.RecommenderJob.
> > >
> > > But the item.RecommenderJob does the same 10 recommendations each for
> > > the 6040 users in 38 minutes. This is because it has an underlying
> > > distributed implementation.
> > >
> > > So my doubt is do we have a distributed SVDRecommender implementation?
> > > If not, how should I go about writing one? Can I use the new
> > > LanczosSolver to achieve this?
> > >
> > > Thanks,
> > > Sanjib
