I'm running into similar problems with accumulators failing to serialize
properly. Are there any examples of accumulators being used in more
complex environments than simply initializing one and then using it in a
.foreach() on an RDD defined a few lines later in the same class?

From the error above, it looks like adding any Scala complexity at all
causes the closure cleaner to freak out with accumulators...
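
For concreteness, here is a minimal sketch of the kind of pattern I mean
(the Counter class, countAll method, and app name are hypothetical names,
and this assumes the Spark 1.x sc.accumulator API): the accumulator is
created in the driver but incremented from a foreach inside a separate
class.

import org.apache.spark.{Accumulator, SparkConf, SparkContext}

// Hypothetical helper class: the accumulator is passed in rather than
// created here, and the RDD work happens in a method of this class.
class Counter(acc: Accumulator[Int]) {
  def countAll(sc: SparkContext): Unit = {
    // Copy the accumulator into a local val so the foreach closure
    // captures only the accumulator, not the enclosing Counter instance.
    val localAcc = acc
    sc.parallelize(Array(1, 2, 3, 4)).foreach(x => localAcc += x)
  }
}

object ComplexAccumulatorApp {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("ComplexAccumulatorApp"))
    val accum = sc.accumulator(0)       // Spark 1.x accumulator API
    new Counter(accum).countAll(sc)     // used outside the scope that created it
    println("total = " + accum.value)
    sc.stop()
  }
}

Copying the accumulator into a local val is the usual trick to keep the
closure from dragging in the enclosing instance, but it's setups of roughly
this shape where I'm seeing the serialization failures.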

On Fri, Nov 7, 2014 at 12:12 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> This may be due in part to Scala allocating an anonymous inner class in
> order to execute the for loop. I would expect that if you change it to a
> while loop like
>
> var i = 0
> while (i < 10) {
>   sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
>   i += 1
> }
>
> then the problem may go away. I am not super familiar with the closure
> cleaner, but I believe that we cannot prune beyond one layer of references,
> so the extra layer of nesting may be screwing something up. If this is the
> case, then I would also expect replacing the accumulator with any other
> reference to the enclosing scope (such as a broadcast variable) would have
> the same result.
>
> On Fri, Nov 7, 2014 at 12:03 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> Could you provide the complete code that reproduces the bug? Here
>> is my test code:
>>
>> import org.apache.spark._
>> import org.apache.spark.SparkContext._
>>
>> object SimpleApp {
>>
>>   def main(args: Array[String]) {
>>     val conf = new SparkConf().setAppName("SimpleApp")
>>     val sc = new SparkContext(conf)
>>
>>     val accum = sc.accumulator(0)
>>     for (i <- 1 to 10) {
>>       sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
>>     }
>>     sc.stop()
>>   }
>> }
>>
>> It works fine in both client and cluster mode. Since this is a
>> serialization bug, the outer class does matter. Could you provide it? Is
>> there a SparkContext field in the outer class?
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2014-10-28 0:28 GMT+08:00 octavian.ganea <octavian.ga...@inf.ethz.ch>:
>>
>>> I am also using Spark 1.1.0 and I ran it on a cluster of nodes (it works
>>> if I run it in local mode!)
>>>
>>> If I put the accumulator inside the for loop, everything works fine. I
>>> guess the bug is that an accumulator can be applied to JUST one RDD.
>>>
>>> Still another undocumented 'feature' of Spark that none of the people
>>> who maintain Spark are willing to fix, or at least to tell us about ...
>>>
>>>
>>>
>>
>


-- 

  -jake
