I am not entirely sure off the top of my head, but a possible workaround (it usually works) is to define the function as a val instead of a def. For example,
def func(i: Int): Boolean = { true }

can be written as

val func = (i: Int) => { true }

Hope this helps for now.

TD

On Tue, Jul 15, 2014 at 9:21 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Hey Diana,
>
> Did you ever figure this out?
>
> I’m running into the same exception, except in my case the function I’m
> calling is a KMeans model.predict().
>
> In regular Spark it works, and Spark Streaming without the call to
> model.predict() also works, but when put together I get this
> serialization exception. I’m on 1.0.0.
>
> Nick
>
> On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> Hey all, trying to set up a pretty simple streaming app and getting some
>> weird behavior.
>>
>> First, a non-streaming job that works fine: I'm trying to pull out
>> lines of a log file that match a regex, for which I've set up a function:
>>
>> def getRequestDoc(s: String): String = {
>>   "KBDOC-[0-9]*".r.findFirstIn(s).orNull
>> }
>> val logs = sc.textFile(logfiles)
>> logs.map(getRequestDoc).take(10)
>>
>> That works, but I want to run that on the same data, but streaming, so I
>> tried this:
>>
>> val logs = ssc.socketTextStream("localhost", 4444)
>> logs.map(getRequestDoc).print()
>> ssc.start()
>>
>> From this code, I get:
>>
>> 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job
>> 1399545128000 ms.0
>> org.apache.spark.SparkException: Job aborted: Task not serializable:
>> java.io.NotSerializableException:
>> org.apache.spark.streaming.StreamingContext
>>
>> But if I do the map function inline instead of calling a separate
>> function, it works:
>>
>> logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()
>>
>> So why is it able to serialize my little function in regular Spark, but
>> not in streaming?
>>
>> Thanks,
>> Diana
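To make the workaround concrete: a def is a method on its enclosing object, so converting it to a function value captures that object; if the enclosing object is not serializable (in the shell, the REPL line object can hold the StreamingContext), the task fails. A val holding a function literal is a standalone function object with no such capture. The sketch below illustrates this with plain Java serialization; Holder and canSerialize are hypothetical names for illustration, not Spark API.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stands in for a non-serializable enclosing object (e.g. a REPL line object
// that also holds a StreamingContext).
class Holder {
  // def: a method on Holder; eta-expanding it captures the Holder instance.
  def isDocDef(s: String): Boolean = s.startsWith("KBDOC")

  // val: a function literal that references no members of Holder, so it
  // compiles to a standalone, serializable function object.
  val isDocVal = (s: String) => s.startsWith("KBDOC")
}

object SerializationDemo {
  // Returns true if `obj` survives Java serialization, false if a
  // NotSerializableException is thrown while writing it.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val holder = new Holder
    val fromDef: String => Boolean = holder.isDocDef _ // captures `holder`
    val fromVal: String => Boolean = holder.isDocVal   // captures nothing
    println(s"def-based: ${canSerialize(fromDef)}") // false: Holder is not Serializable
    println(s"val-based: ${canSerialize(fromVal)}") // true
  }
}
```

The same reasoning explains why Diana's inline lambda works: like the val, it captures nothing from the enclosing scope that would drag the StreamingContext into the closure.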