Just noticed an error in my wording.

Should be " I'm assuming it's not immediately aggregating on the driver
each time I call the += on the Accumulator."

On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet <cjno...@gmail.com> wrote:

> What are the limitations of using Accumulators to get a union of a bunch
> of small sets?
>
> Let's say I have an RDD[Map{String,Any} and i want to do:
>
> rdd.map(accumulator += Set(_.get("entityType").get))
>
>
> What implication does this have on performance? I'm assuming it's not
> immediately aggregating each time I call the += on the Accumulator. Is it
> doing a local combine and then occasionally sending the results on the
> current partition back to the driver?
>
>
>
>

Reply via email to