In general, you can find out exactly what's not serializable by adding
-Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS.
Since a reference to the enclosing class instance (this) is often what's
causing the problem, a general workaround is to move the mapPartitions call
into a static method (in Scala, a method on the companion object), where
there is no this reference. This transforms this:
    class A {
      def f() = rdd.mapPartitions(iter => ...)
    }
into this:
    class A {
      def f() = A.helper(rdd)
    }

    object A {
      def helper(rdd: RDD[...]) = rdd.mapPartitions(iter => ...)
    }
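
To see the effect without a cluster, here is a minimal sketch that uses plain Java serialization as a stand-in for what Spark does when it ships a closure. It assumes Scala 2.12 or later. The names offset, capturing, detached, ClosureCheck and serializes are made up for illustration, and class A is deliberately left non-Serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a class that holds an RDD; deliberately not Serializable.
class A(val offset: Int) {
  // Reads the field `offset`, so the closure captures `this` (an instance of A).
  def capturing: Int => Int = x => x + offset

  // Delegates to the companion object and passes only the value that is needed.
  def detached: Int => Int = A.helper(offset)
}

object A {
  // No enclosing instance here: the closure captures only the Int `n`.
  def helper(n: Int): Int => Int = x => x + n
}

object ClosureCheck {
  // Returns true if Java serialization accepts the value, false if it
  // hits something that is not serializable.
  def serializes(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val a = new A(10)
    println(s"capturing serializes: ${serializes(a.capturing)}")
    println(s"detached serializes:  ${serializes(a.detached)}")
  }
}
```

The first closure should be rejected because serializing it drags the whole A instance along, while the one built in the companion object only carries an Int. The same reasoning applies to the function passed to mapPartitions.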