Reading that article and applying it to your observations of what happens
at runtime:

shouldn't the closure require serializing testing? The foo singleton object
is a member of testing, and you reference that foo value in the closure
inside func and again in the foreachPartition closure. So, following that
article, Scala will attempt to serialize the containing object/class
testing in order to reach the foo instance.
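For concreteness, here is a minimal, Spark-free sketch of that capture behavior. The names here (ClosureDemo, Testing, the isSerializable helper) are mine, standing in for the REPL's $iwC wrapper and for the serialization Spark must eventually perform on a closure; the point is that referencing foo.v directly makes the closure capture the enclosing instance, while copying the value into a local first keeps the closure clean:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureDemo {
  // Try to Java-serialize a value, roughly what Spark has to do
  // before shipping a closure to executors.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Stand-in for the non-serializable enclosing scope (in the REPL this
  // would be the $iwC wrapper around `testing`).
  class Testing {
    object foo extends Serializable { val v = 42 }

    // Referring to foo.v compiles to Testing.this.foo.v, so this
    // closure captures the whole (non-serializable) Testing instance.
    def leakyClosure: Int => Int = i => foo.v + i

    // Hygienic version: copy the needed value into a local first;
    // the closure then captures only that Int.
    def cleanClosure: Int => Int = {
      val v = foo.v
      i => v + i
    }
  }

  def main(args: Array[String]): Unit = {
    val t = new Testing
    println(s"leaky serializable? ${isSerializable(t.leakyClosure)}")
    println(s"clean serializable? ${isSerializable(t.cleanClosure)}")
  }
}
```

Running this, the leaky closure should fail the serialization check (the same NotSerializableException on the enclosing instance as in the stack trace below), while the hygienic one passes.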

On Thu, Jul 9, 2015 at 4:11 PM, Chen Song <chen.song...@gmail.com> wrote:

> Repost the code example,
>
> object testing extends Serializable {
>     object foo {
>       val v = 42
>     }
>     val list = List(1,2,3)
>     val rdd = sc.parallelize(list)
>     def func = {
>       val after = rdd.foreachPartition {
>         it => println(foo.v)
>       }
>     }
>   }
>
> On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <chen.song...@gmail.com> wrote:
>
>> Thanks Erik. I saw the document too. That is why I am confused: as per
>> the article, it should be fine as long as *foo* is serializable.
>> However, what I have seen is that it works only when *testing* is
>> serializable, even if foo is not, as shown below. I don't know if there
>> is something specific to Spark.
>>
>> For example, the code below works.
>>
>> object testing extends Serializable {
>>     object foo {
>>       val v = 42
>>     }
>>     val list = List(1,2,3)
>>     val rdd = sc.parallelize(list)
>>     def func = {
>>       val after = rdd.foreachPartition {
>>         it => println(foo.v)
>>       }
>>     }
>>   }
>>
>> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:
>>
>>> I think you have stumbled across this idiosyncrasy:
>>>
>>>
>>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> > I am not sure this is more of a question for Spark or just Scala but
>>> > I am posting my question here.
>>> >
>>> > The code snippet below shows an example of passing a reference to a
>>> > closure in the rdd.foreachPartition method.
>>> >
>>> > ```
>>> > object testing {
>>> >     object foo extends Serializable {
>>> >       val v = 42
>>> >     }
>>> >     val list = List(1,2,3)
>>> >     val rdd = sc.parallelize(list)
>>> >     def func = {
>>> >       val after = rdd.foreachPartition {
>>> >         it => println(foo.v)
>>> >       }
>>> >     }
>>> >   }
>>> > ```
>>> > When running this code, I got an exception
>>> >
>>> > ```
>>> > Caused by: java.io.NotSerializableException:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>>> > Serialization stack:
>>> > - object not serializable (class:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>>> > - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>> > name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>>> > - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>> > <function1>)
>>> > ```
>>> >
>>> > It looks like Spark needs to serialize the `testing` object. Why is
>>> > it serializing testing even though I only pass foo (another
>>> > serializable object) into the closure?
>>> >
>>> > A more general question: how can I prevent Spark from serializing the
>>> > parent class where the RDD is defined, while still being able to pass
>>> > in functions defined in other classes?
>>> >
>>> > --
>>> > Chen Song
>>> >
>>>
>>
>>
>>
>> --
>> Chen Song
>>
>>
>
>
> --
> Chen Song
>
>


--
*Richard Marscher*
Software Engineer
Localytics