Hi!
I understand the usual "Task not serializable" issue that arises when
accessing a field or a method that is out of scope of a closure.
To fix it, I usually define a local copy of these fields/methods, which
avoids the need to serialize the whole class:
class MyClass(val myField: Any) {
def run() = {
val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
val myField = this.myField
println(f.map( _ + myField ).count)
}
}
===================
Now, if I define a nested function in the run method, it cannot be
serialized:
class MyClass() {
def run() = {
val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
def mapFn(line: String) = line.split(";")
val myField = this.myField
println(f.map( mapFn( _ ) ).count)
}
}
I don't understand since I thought "mapFn" would be in scope...
Even stranger, if I define mapFn to be a val instead of a def, then it
works:
class MyClass() {
def run() = {
val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
val mapFn = (line: String) => line.split(";")
println(f.map( mapFn( _ ) ).count)
}
}
Is this related to the way Scala represents nested functions?
What's the recommended way to deal with this issue ?
Thanks for your help,
Pierre
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Nested-method-in-a-class-Task-not-serializable-tp5869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.