Thanks Erik. I saw the document too. That is why I am confused: according to the article, it should be fine as long as *foo* is serializable. However, what I have seen is that it works if *testing* is serializable, even if *foo* is not, as shown below. I don't know if there is something specific to Spark.
For example, the code example below works:

```
object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1,2,3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
```

On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:

> I think you have stumbled across this idiosyncrasy:
>
> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>
>
> ----- Original Message -----
> > I am not sure whether this is more of a question about Spark or about
> > Scala, but I am posting it here.
> >
> > The code snippet below shows an example of passing a reference to a
> > closure in rdd.foreachPartition:
> >
> > ```
> > object testing {
> >   object foo extends Serializable {
> >     val v = 42
> >   }
> >   val list = List(1,2,3)
> >   val rdd = sc.parallelize(list)
> >   def func = {
> >     val after = rdd.foreachPartition {
> >       it => println(foo.v)
> >     }
> >   }
> > }
> > ```
> >
> > When running this code, I get an exception:
> >
> > ```
> > Caused by: java.io.NotSerializableException:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> > Serialization stack:
> >     - object not serializable (class:
> >       $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> >       $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> >     - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> >       name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> >     - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> >       <function1>)
> > ```
> >
> > It looks like Spark needs to serialize the `testing` object. Why is it
> > serializing `testing` even though I only pass `foo` (another
> > serializable object) into the closure?
> >
> > A more general question is: how can I prevent Spark from serializing
> > the parent class where the RDD is defined, while still supporting
> > passing in functions defined in other classes?
> >
> > --
> > Chen Song

--
Chen Song
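[A minimal sketch of the workaround from Erik's "hygienic closures" post, for readers following the thread: copy what the closure needs into a local val so the generated function captures only that value, never the enclosing instance. Note the `$iwC$...$testing$` names in the stack trace come from the REPL's wrapper objects; compiled top-level objects are accessed statically and may not exhibit the problem, which is why the trick is demonstrated here with plain Java serialization instead of a Spark job. The names `Outer`, `Foo`, and `safeAdd` are illustrative, not from Spark.]

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object Outer {  // stand-in for `testing`; deliberately NOT Serializable
  object Foo extends Serializable { val v = 42 }

  def safeAdd: Int => Int = {
    val v = Foo.v        // copy the needed field into a local val
    (i: Int) => i + v    // the closure captures only the local Int
  }
}

object Demo extends App {
  // Serializing the function value succeeds because it captures only
  // an Int; referencing Foo.v directly inside the lambda in a REPL
  // session would instead drag the enclosing wrapper object along.
  val oos = new ObjectOutputStream(new ByteArrayOutputStream())
  oos.writeObject(Outer.safeAdd)
  oos.close()
  println(Outer.safeAdd(1))
}
```

The same pattern applies inside an RDD operation: bind `foo` (or just `foo.v`) to a local val in the method body before calling `rdd.foreachPartition`, so the closure cleaner has nothing but the local binding to serialize.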