That result is confusing. It should take about the same number of wall-clock hours either way. I don't see why it would take 14 hours -- that sounds wrong. If anything, it should take 38 / N minutes, where N is the number of recommenders you ran.
SVDRecommender is not distributed at all, no.

On Fri, Nov 19, 2010 at 9:34 PM, Sanjib Kumar Das <[email protected]> wrote:

> Hi All,
>
> I wanted to run a distributed RecommenderJob with the SVDRecommender
> implementation.
> So i ran the pseudo.RecommenderJob with an
> SVDRecommender(numFeatures=30,trainingSteps=50) on the 1M Movielens
> data(6040 users). So this generated 10 recommendations for each of the 6040
> users but took 14 hours to do so! My hadoop cluster had 12 m/cs. So i guess
> it just ran multiple instances of the non-distributed SVD implementation and
> each of these instances did the same thing again and again. So unless the
> implementation of the recommender is distributed, we dont get any special
> benefit with the pseudo.RecommenderJob.
>
> But the item.RecommenderJob does the same 10 recommendations each for the
> 6040 users in 38 minutes. This is because it has an underlying distributed
> implementation.
>
> So my doubt is do we have a distributed SVDRecommender implementation? If
> not, how should i go about writing one? Can I use the new LanczosSolver to
> achieve this?
>
> Thanks,
> Sanjib
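
To make the hypothesis in the quoted message concrete, here is a rough back-of-the-envelope sketch in Python. It is not Mahout code and not a measurement: it just assumes that in a pseudo-distributed job every worker trains its own full copy of a non-distributed model and then serves only its share of the users. The function names and all parameters (train_minutes, per_user_minutes) are made up for illustration.

```python
def pseudo_job_wall_clock(train_minutes, per_user_minutes, num_users, num_workers):
    """Estimated wall-clock minutes, ASSUMING each worker trains the full
    model once and then recommends for its slice of the users."""
    users_per_worker = -(-num_users // num_workers)  # ceiling division
    # Training is repeated on every worker in parallel, so it never shrinks;
    # only the per-user recommendation work is divided among workers.
    return train_minutes + users_per_worker * per_user_minutes


def total_cpu_minutes(train_minutes, per_user_minutes, num_users, num_workers):
    """Total work across the cluster under the same assumption."""
    # Every worker pays the full training cost; per-user work is done once.
    return num_workers * train_minutes + num_users * per_user_minutes


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the effect.
    for workers in (1, 12):
        wall = pseudo_job_wall_clock(60, 0.01, 6040, workers)
        cpu = total_cpu_minutes(60, 0.01, 6040, workers)
        print(workers, "workers:", round(wall, 2), "wall-clock min,",
              round(cpu, 2), "CPU min")
```

Under that assumption, adding machines only speeds up the per-user part, so when training dominates the wall-clock time barely improves while total CPU time grows with the number of workers. Whether this matches what actually happened in the 14-hour run would need to be checked against the job logs.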
