This may be due in part to Scala allocating an anonymous inner class in order to execute the for loop. I would expect that if you change it to a while loop like this:
var i = 0
while (i < 10) {
  sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
  i += 1
}

then the problem may go away. I am not super familiar with the closure cleaner, but I believe we cannot prune beyond one layer of references, so the extra level of nesting may be screwing something up. If this is the case, then I would also expect that replacing the accumulator with any other reference to the enclosing scope (such as a broadcast variable) would have the same result (see the sketch at the end of this message).

On Fri, Nov 7, 2014 at 12:03 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Could you provide the complete code that reproduces the bug? Here is
> my test code:
>
> import org.apache.spark._
> import org.apache.spark.SparkContext._
>
> object SimpleApp {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("SimpleApp")
>     val sc = new SparkContext(conf)
>
>     val accum = sc.accumulator(0)
>     for (i <- 1 to 10) {
>       sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
>     }
>     sc.stop()
>   }
> }
>
> It works fine in both client and cluster mode. Since this is a
> serialization bug, the outer class does matter. Could you provide it? Is
> there a SparkContext field in the outer class?
>
> Best Regards,
> Shixiong Zhu
>
> 2014-10-28 0:28 GMT+08:00 octavian.ganea <octavian.ga...@inf.ethz.ch>:
>
>> I am also using Spark 1.1.0 and I ran it on a cluster of nodes (it works
>> if I run it in local mode!).
>>
>> If I put the accumulator inside the for loop, everything works fine. I
>> guess the bug is that an accumulator can be applied to JUST one RDD.
>>
>> Still another undocumented 'feature' of Spark that no one who maintains
>> Spark is willing to fix, or at least to tell us about ...
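
A quick way to test that hypothesis would be to swap the accumulator for any other driver-side reference, e.g. a broadcast variable, and see whether the same serialization error shows up. Here is an untested sketch (the object name and the println are just for illustration):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro: if the closure cleaner cannot prune past one layer
// of references, then capturing *any* reference to the enclosing scope
// (here a broadcast variable instead of an accumulator) inside a for
// comprehension should fail the same way on a cluster.
object BroadcastRepro {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastRepro"))
    val bc = sc.broadcast(42)
    for (i <- 1 to 10) {
      // The for comprehension desugars into a closure passed to
      // Range.foreach, adding an extra level of nesting around this lambda.
      sc.parallelize(Array(1, 2, 3, 4)).foreach(x => println(x + bc.value))
    }
    sc.stop()
  }
}

If this fails with the same serialization error, that would point at the extra closure layer rather than at accumulators specifically.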