I reproduced the problem in the MLlib test suite ALSSuite.scala using the
following code:
val arrayPredict = userProductsRDD.map { case (user, product) =>
  val recommendedProducts = model.recommendProducts(user, products)
  val productScore = recommendedProducts.find { x => x.product
So model.recommendProducts can only be called from the master? I am running
the test on a set of 20% of the users, and those users are in an RDD. If I
have to collect them all to the master node and then call
model.recommendProducts, that's an issue...
Any idea how to optimize this so that
There is a JIRA for it: https://issues.apache.org/jira/browse/SPARK-3066
The easiest case is when one side is small. If both sides are large,
this is a super-expensive operation. We can do block-wise cross
product and then find top-k for each user.
Best,
Xiangrui
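
For illustration, here is a minimal sketch of the "score a block, then keep
the top-k per user" idea described above. It is not the SPARK-3066
implementation. It assumes the latent factors are available as
(id, Array[Double]) pairs, which is the shape MLlib's model.userFeatures and
model.productFeatures use, and it uses plain Scala collections in place of RDD
blocks. The names TopKSketch, topKInBlock and topKAcrossBlocks are
hypothetical.

```scala
import scala.collection.mutable

object TopKSketch {
  // (id, latent feature vector); assumed shape, mirroring MLlib's factor RDD elements.
  type Factor = (Int, Array[Double])

  // Dot product of two equal-length feature vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Score one user against one block of products and keep the k best
  // (productId, score) pairs, highest score first.
  def topKInBlock(userVec: Array[Double], block: Seq[Factor], k: Int): Seq[(Int, Double)] = {
    // Ordering by negated score makes the queue head the weakest kept candidate,
    // so dequeue() evicts the lowest score once more than k items are held.
    val heap = mutable.PriorityQueue.empty[(Int, Double)](
      Ordering.by[(Int, Double), Double](p => -p._2))
    block.foreach { case (productId, productVec) =>
      heap.enqueue((productId, dot(userVec, productVec)))
      if (heap.size > k) heap.dequeue()
    }
    heap.toSeq.sortBy(p => -p._2)
  }

  // Block-wise variant: take the top k inside each product block, then merge
  // the per-block winners and keep the overall top k.
  def topKAcrossBlocks(userVec: Array[Double], blocks: Seq[Seq[Factor]], k: Int): Seq[(Int, Double)] =
    blocks.flatMap(block => topKInBlock(userVec, block, k)).sortBy(p => -p._2).take(k)
}
```

When the product side is small, one possible way to use this in Spark (an
untested assumption, not something confirmed in this thread) is to collect
model.productFeatures, broadcast the result, and call topKInBlock inside a
map over the user factors of the test users. That avoids calling
model.recommendProducts inside an RDD transformation.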
On Thu, Nov 6, 2014 at 4:51 PM,