Hey Diana, Did you ever figure this out?
I’m running into the same exception, except in my case the function I’m calling is a KMeans model.predict(). In regular Spark it works, and Spark Streaming without the call to model.predict() also works, but when put together I get this serialization exception. I’m on 1.0.0. Nick On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com> wrote: > Hey all, trying to set up a pretty simple streaming app and getting some > weird behavior. > > First, a non-streaming job that works fine: I'm trying to pull out lines > of a log file that match a regex, for which I've set up a function: > > def getRequestDoc(s: String): > String = { "KBDOC-[0-9]*".r.findFirstIn(s).orNull } > logs=sc.textFile(logfiles) > logs.map(getRequestDoc).take(10) > > That works, but I want to run that on the same data, but streaming, so I > tried this: > > val logs = ssc.socketTextStream("localhost",4444) > logs.map(getRequestDoc).print() > ssc.start() > > From this code, I get: > 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job > 1399545128000 ms.0 > org.apache.spark.SparkException: Job aborted: Task not serializable: > java.io.NotSerializableException: > org.apache.spark.streaming.StreamingContext > > > But if I do the map function inline instead of calling a separate > function, it works: > > logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print() > > So why is it able to serialize my little function in regular spark, but > not in streaming? > > Thanks, > Diana > > >