Thanks Erik. I saw the document too. That is why I am confused: as per the
article, it should be fine as long as *foo* is serializable. However, what
I have seen is that it works if *testing* is serializable, even when foo is
not serializable, as shown below. I don't know if there is something
specific to Spark.

For example, the code example below works.

```
object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
```
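Incidentally, the usual way to keep the enclosing object out of the closure entirely (the pattern Erik's post describes) is to copy the value you need into a local val before the closure. A minimal sketch, assuming a Spark shell where `sc` is in scope; the name `localV` is just illustrative:

```scala
object testing {  // no longer needs to extend Serializable
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Copy the field into a local val: the closure now captures only
    // `localV` (a plain Int), not `foo` or the enclosing `testing` object,
    // so no $outer reference is dragged into serialization.
    val localV = foo.v
    rdd.foreachPartition { it =>
      println(localV)
    }
  }
}
```

With the value captured locally, the only thing Spark has to ship to the executors is an Int, and neither `foo` nor `testing` needs to be serializable.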

On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:

> I think you have stumbled across this idiosyncrasy:
>
>
> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>
> ----- Original Message -----
> > I am not sure this is more of a question for Spark or just Scala but I am
> > posting my question here.
> >
> > The code snippet below shows an example of passing a reference to a
> > closure in rdd.foreachPartition method.
> >
> > ```
> > object testing {
> >     object foo extends Serializable {
> >       val v = 42
> >     }
> >     val list = List(1,2,3)
> >     val rdd = sc.parallelize(list)
> >     def func = {
> >       val after = rdd.foreachPartition {
> >         it => println(foo.v)
> >       }
> >     }
> >   }
> > ```
> > When running this code, I got an exception
> >
> > ```
> > Caused by: java.io.NotSerializableException:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> > Serialization stack:
> > - object not serializable (class:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> > - field (class:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> > name: $outer, type: class
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> > - object (class
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> > <function1>)
> > ```
> >
> > It looks like Spark needs to serialize `testing` object. Why is it
> > serializing testing even though I only pass foo (another serializable
> > object) in the closure?
> >
> > A more general question is, how can I prevent Spark from serializing the
> > parent class where the RDD is defined, while still supporting passing in
> > functions defined in other classes?
> >
> > --
> > Chen Song
> >
>



-- 
Chen Song
