You can use rdd.cartesian and then take the top-k scores by key to distribute
the work to the executors. One trick to boost performance is to blockify the
user/product features and then use native matrix-matrix multiplication. There
is a relevant PR from Deb:
https://github.com/apache/spark/pull/3098 .
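
In case it helps, here is a minimal sketch of the cartesian + top-k idea,
assuming an already-trained MatrixFactorizationModel; the method name
recommendAllUsers and the groupByKey-based top-k are only illustrative, and
the blockified matrix-matrix version in the PR above should be much faster.

import org.apache.spark.SparkContext._
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

// Score every (user, product) pair and keep the top k products per user.
def recommendAllUsers(model: MatrixFactorizationModel,
                      k: Int): RDD[(Int, Seq[(Int, Double)])] = {
  model.userFeatures
    .cartesian(model.productFeatures)
    .map { case ((user, uFeatures), (product, pFeatures)) =>
      // The predicted rating is the dot product of the latent-factor vectors.
      val score = uFeatures.zip(pFeatures).map { case (u, p) => u * p }.sum
      (user, (product, score))
    }
    // groupByKey is the simplest way to keep the top k per user; for large
    // data, aggregateByKey with a bounded priority queue avoids shuffling
    // every score.
    .groupByKey()
    .mapValues(_.toSeq.sortBy(-_._2).take(k))
}

-Xiangrui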

On Mon, Feb 23, 2015 at 4:53 AM, Erlend Hamnaberg <erl...@hamnaberg.net> wrote:
> Hi.
>
> We are using the ALS model and would like to score all users against all
> items.
>
> Currently we have these methods:
>
> https://gist.github.com/hamnis/e396854f4654bd46ebe0
>
> We want to be able to distribute the calculations to the slaves so we don't
> have to do this on the master.
>
> Is there an efficient and distributed way of doing this?
>
>
> I suppose we could collect all of the product features and send them out as
> a broadcast, but that requires all items on the master, and we want to
> avoid that.
>
> Regards
>
> Erlend
