Hey all, trying to set up a pretty simple streaming app and getting some weird behavior.
First, a non-streaming job that works fine: I'm trying to pull out lines of a log file that match a regex, for which I've set up a function: def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.findFirstIn(s).orNull } logs=sc.textFile(logfiles) logs.map(getRequestDoc).take(10) That works, but I want to run that on the same data, but streaming, so I tried this: val logs = ssc.socketTextStream("localhost",4444) logs.map(getRequestDoc).print() ssc.start() >From this code, I get: 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job 1399545128000 ms.0 org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext But if I do the map function inline instead of calling a separate function, it works: logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print() So why is it able to serialize my little function in regular spark, but not in streaming? Thanks, Diana