Hi Areg,
Check out
http://spark.apache.org/docs/latest/programming-guide.html#accumulators
val sum = sc.accumulator(0) // accumulator created from an initial value in the driver
The accumulator variable is created in the driver. Tasks running on the
cluster can then add to it. However, they cannot read its value; only the
driver program can read it, using its value method.
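Here's a minimal, self-contained sketch of that pattern with the 1.x API
(the app name, master, and numbers are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object SumSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sum-sketch").setMaster("local[2]"))
    val sum = sc.accumulator(0)                      // created in the driver
    sc.parallelize(1 to 10).foreach(x => sum += x)   // tasks add; they cannot read it
    println(sum.value)                               // only the driver reads the value: 55
    sc.stop()
  }
}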
If you use parallelize, the data is distributed across the available nodes;
a partial sum is computed within each partition, and the partial results are
later merged. The driver manages the entire process. Is my understanding
correct? Can someone please correct me if I am wrong?
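That matches how the 1.x API behaves, as far as I understand it: each task
accumulates into a task-local value, and Spark folds the per-task results
into the driver-side value using your addInPlace. A rough sketch of the
custom-addInPlace case from the original question; the (count, sum) pair
type and all names here are illustrative, not from the thread:

import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

// Illustrative param that merges (count, sum) pairs. addInPlace is the
// associative merge Spark applies both inside tasks and on the driver.
object PairParam extends AccumulatorParam[(Long, Long)] {
  def zero(initial: (Long, Long)): (Long, Long) = (0L, 0L)
  def addInPlace(a: (Long, Long), b: (Long, Long)): (Long, Long) =
    (a._1 + b._1, a._2 + b._2)
}

object CustomAccumSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("custom-accum").setMaster("local[2]"))
    val acc = sc.accumulator((0L, 0L))(PairParam)            // driver side
    sc.parallelize(1 to 100).foreach(x => acc.add((1L, x.toLong)))
    println(acc.value)  // (100, 5050) once all tasks have finished
    sc.stop()
  }
}

So in myRdd.map(x => sum += x), each += runs in the executor against a
task-local value, and that value is merged into the driver's copy when the
task completes. Also note that map is lazy, so nothing accumulates until an
action runs; foreach is the usual way to force the side effect.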
On Fri, Oct 10, 2014 at 9:37 AM,
Hello,
I was wondering what the Spark accumulator does under the covers. I’ve
implemented my own associative addInPlace function for the accumulator;
where is this function being run? Let’s say you call something like
myRdd.map(x => sum += x): is “sum” being accumulated locally in any way,
or is every update sent to the driver?