For a large object, broadcasting is more efficient: a variable referenced
directly in a closure is serialized and shipped with every task, while a
broadcast variable is sent to each executor once. If your array is small it
won't really matter. How many centers do you have? Unless you are seeing
very large tasks (Spark prints a warning when that happens), it is probably
fine to just reference the array directly.
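
In case it helps, here is a minimal sketch of the two options, assuming
PySpark. The helper names (closest_center, assign_by_closure,
assign_by_broadcast, approx_size_kb) are made up for illustration and are not
part of Spark; only sc.broadcast(...) and .value are the actual API. The size
estimate uses plain pickle, so treat it as a rough guide rather than what
Spark really ships.

```python
import pickle


def closest_center(point, centers):
    """Return the index of the center nearest to `point` (squared Euclidean)."""
    best_index, best_dist = -1, float("inf")
    for i, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def assign_by_closure(points_rdd, centers):
    # `centers` is captured by the lambda, so it is serialized and
    # shipped along with every task.
    return points_rdd.map(lambda p: closest_center(p, centers))


def assign_by_broadcast(sc, points_rdd, centers):
    # `centers` is registered once as a broadcast variable; tasks read
    # it through `.value` instead of carrying their own copy.
    centers_bc = sc.broadcast(centers)
    return points_rdd.map(lambda p: closest_center(p, centers_bc.value))


def approx_size_kb(centers):
    # Rough estimate only: Spark's own serialization may differ.
    return len(pickle.dumps(centers)) / 1024.0
```

With a live SparkContext sc, assign_by_broadcast(sc, sc.parallelize(points),
centers).collect() would return one cluster index per point. Since the centers
change at each step, you would create a new broadcast per iteration.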


On Wed, Aug 20, 2014 at 1:18 AM, Julien Naour <julna...@gmail.com> wrote:

> Hi,
>
> I have a question about broadcast. I'm working on a clustering algorithm
> close to KMeans. It seems that KMeans broadcasts the cluster centers at each
> step. For the moment I just keep my centers in an Array that I reference
> directly in my map at each step. Would it be more efficient to use a
> broadcast instead of a plain variable?
>
> Cheers,
>
> Julien Naour
>
