You can use rdd.cartesian and then find the top-k by key, which distributes the work to the executors. There is a trick to boost performance: blockify the user/product features first, then use native matrix-matrix multiplication on the blocks. There is a relevant PR from Deb: https://github.com/apache/spark/pull/3098

-Xiangrui
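To make the blockify idea above concrete, here is a minimal sketch in plain Python rather than Spark. It is an illustration under stated assumptions, not the code from the PR: the names `blockify`, `score_block_pair`, `top_k_per_user` and the `block_size` parameter are hypothetical, and the feature lists stand in for the model's user and product feature RDDs. In a real job the loop over block pairs would correspond to rdd.cartesian on RDDs of blocks, and the per-pair dot products would be replaced by one native matrix-matrix multiply per block pair.

```python
import heapq


def blockify(features, block_size):
    """Group (id, vector) pairs into blocks of at most block_size rows."""
    return [features[i:i + block_size]
            for i in range(0, len(features), block_size)]


def score_block_pair(user_block, product_block):
    """Score every user in one block against every product in another.

    Stand-in for a native matrix-matrix multiply on the two blocks.
    Yields (user_id, (score, product_id)).
    """
    for user_id, user_vec in user_block:
        for product_id, product_vec in product_block:
            score = sum(a * b for a, b in zip(user_vec, product_vec))
            yield user_id, (score, product_id)


def top_k_per_user(user_features, product_features, k, block_size=2):
    """Return {user_id: [(product_id, score), ...]} with the k best products."""
    user_blocks = blockify(user_features, block_size)
    product_blocks = blockify(product_features, block_size)

    best = {}  # user_id -> min-heap holding at most k (score, product_id) pairs
    # Cartesian product of blocks (hypothetical analogue of rdd.cartesian).
    for user_block in user_blocks:
        for product_block in product_blocks:
            for user_id, scored in score_block_pair(user_block, product_block):
                heap = best.setdefault(user_id, [])
                if len(heap) < k:
                    heapq.heappush(heap, scored)
                else:
                    # Push the new pair, then drop the current smallest.
                    heapq.heappushpop(heap, scored)

    # Highest score first for each user.
    return {user_id: [(pid, score) for score, pid in sorted(heap, reverse=True)]
            for user_id, heap in best.items()}


# Made-up example data: (id, feature vector) pairs.
example_users = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
example_products = [(10, [2.0, 0.0]), (20, [0.0, 3.0]), (30, [0.5, 0.5])]
example_top2 = top_k_per_user(example_users, example_products, k=2)
```

Keeping only a k-sized heap per user means that no step has to hold the full user-by-product score matrix, which is the reason for taking the top-k by key instead of collecting all scores.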

On Mon, Feb 23, 2015 at 4:53 AM, Erlend Hamnaberg <erl...@hamnaberg.net> wrote:
> Hi.
>
> We are using the ALS model, and would like to get all users and items
> scored.
>
> Currently we have these methods:
>
> https://gist.github.com/hamnis/e396854f4654bd46ebe0
>
> We want to be able to distribute the calculations to the slaves so we don't
> have to do this on the master.
>
> Is there an efficient and distributed way of doing this?
>
> I suppose we could collect all items in the product features and send them
> in a broadcast, but that needs all items on the master, and we want to
> avoid that.
>
> Regards
>
> Erlend