There is a JIRA for it: https://issues.apache.org/jira/browse/SPARK-3066
The easiest case is when one side is small. If both sides are large,
this is a super-expensive operation. We can do block-wise cross
product and then find top-k for each user.

Best,
Xiangrui

On Thu, Nov 6, 2014 at 4:51 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> model.recommendProducts can only be called from the master then ? I have a
> set of 20% users on whom I am performing the test...the 20% users are in a
> RDD...if I have to collect them all to master node and then call
> model.recommendProducts, that's a issue...
>
> Any idea how to optimize this so that we can calculate MAP statistics on
> large samples of data ?
>
>
> On Thu, Nov 6, 2014 at 4:41 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> ALS model contains RDDs. So you cannot put `model.recommendProducts`
>> inside a RDD closure `userProductsRDD.map`. -Xiangrui
>>
>> On Thu, Nov 6, 2014 at 4:39 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>> > I reproduced the problem in mllib tests ALSSuite.scala using the
>> > following functions:
>> >
>> >     val arrayPredict = userProductsRDD.map{case(user,product) =>
>> >       val recommendedProducts = model.recommendProducts(user, products)
>> >       val productScore = recommendedProducts.find{x=>x.product == product}
>> >       require(productScore != None)
>> >       productScore.get
>> >     }.collect
>> >
>> >     arrayPredict.foreach { elem =>
>> >       if (allRatings.get(elem.user, elem.product) != elem.rating)
>> >         fail("Prediction APIs don't match")
>> >     }
>> >
>> > If the usage of model.recommendProducts is correct, the test fails with
>> > the same error I sent before...
>> >
>> > org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> > 0 in stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in
>> > stage 316.0 (TID 79, localhost): scala.MatchError: null
>> >
>> > org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
>> >
>> > org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)
>> >
>> > It is a blocker for me and I am debugging it. I will open up a JIRA if
>> > this is indeed a bug...
>> >
>> > Do I have to cache the models to make userFeatures.lookup(user).head to
>> > work ?
>> >
>> >
>> > On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> Was "user" presented in training? We can put a check there and return
>> >> NaN if the user is not included in the model. -Xiangrui
>> >>
>> >> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.da...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am testing MatrixFactorizationModel.predict(user: Int, product: Int)
>> >> > but the code fails on userFeatures.lookup(user).head
>> >> >
>> >> > In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has
>> >> > been called and in all the test-cases that API has been used...
>> >> >
>> >> > I can perhaps refactor my code to do the same but I was wondering
>> >> > whether people test the lookup(user) version of the code..
>> >> >
>> >> > Do I need to cache the model to make it work ? I think right now
>> >> > default is STORAGE_AND_DISK...
>> >> >
>> >> > Thanks.
>> >> > Deb
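
For reference, here is a minimal sketch of the "one side is small" case
from the top of this reply. It is not the approach tracked in SPARK-3066
and not an MLlib API: `TopKSketch`, `topK` and `recommendForUsers` are
hypothetical names, and it assumes the product factors are small enough
to collect on the driver and broadcast to every executor. It only uses
the model's public `userFeatures` and `productFeatures` RDDs, so nothing
calls `model.recommendProducts` inside a closure.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.x
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration; not part of MLlib.
object TopKSketch {

  // Dot product of two factor vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Score every product for one user and keep the k highest scores.
  def topK(
      user: Int,
      userFactor: Array[Double],
      products: Array[(Int, Array[Double])],
      k: Int): Array[Rating] = {
    products
      .map { case (product, pf) => Rating(user, product, dot(userFactor, pf)) }
      .sortBy(r => -r.rating)
      .take(k)
  }

  // Top-k products for each test user, computed on the executors.
  def recommendForUsers(
      model: MatrixFactorizationModel,
      testUsers: RDD[Int],
      k: Int): RDD[(Int, Array[Rating])] = {
    val sc = testUsers.sparkContext
    // Assumption: the product side is the small one and fits in memory.
    val productsBc = sc.broadcast(model.productFeatures.collect())
    model.userFeatures
      .join(testUsers.distinct().map(u => (u, ()))) // keep only test users
      .map { case (user, (uf, _)) =>
        (user, topK(user, uf, productsBc.value, k))
      }
  }
}
```

Because of the inner join, users that were not present in training are
dropped from the result, so they need to be accounted for separately
when computing MAP. If both sides are large this sketch does not apply,
and the block-wise cross product is still needed.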