Hey Diana,

Did you ever figure this out?

I’m running into the same exception, except in my case the function I’m
calling is KMeansModel.predict().

In regular Spark it works, and Spark Streaming without the call to
model.predict() also works, but when I put them together I get this
serialization exception. I’m on Spark 1.0.0.
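
My working theory (not verified): the closure around predict() is pulling in
the enclosing object, which in the streaming case also holds the
StreamingContext, and that class isn’t serializable. A minimal sketch of the
workaround I’m going to try, assuming the model itself serializes fine
(model and stream here are stand-ins for my actual names):

// Copy the model into a local val so the closure captures only the model,
// not the enclosing scope that references the StreamingContext.
// model: an already-trained org.apache.spark.mllib.clustering.KMeansModel
// stream: a DStream[org.apache.spark.mllib.linalg.Vector]
val localModel = model
stream.map(v => localModel.predict(v)).print()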

Nick


On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com> wrote:

> Hey all, trying to set up a pretty simple streaming app and getting some
> weird behavior.
>
> First, a non-streaming job that works fine:  I'm trying to pull out lines
> of a log file that match a regex, for which I've set up a function:
>
> def getRequestDoc(s: String): String =
>   "KBDOC-[0-9]*".r.findFirstIn(s).orNull
>
> val logs = sc.textFile(logfiles)
> logs.map(getRequestDoc).take(10)
>
> That works, but I want to run that on the same data, but streaming, so I
> tried this:
>
> val logs = ssc.socketTextStream("localhost", 4444)
> logs.map(getRequestDoc).print()
> ssc.start()
>
> From this code, I get:
> 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job
> 1399545128000 ms.0
> org.apache.spark.SparkException: Job aborted: Task not serializable:
> java.io.NotSerializableException:
> org.apache.spark.streaming.StreamingContext
>
>
> But if I do the map function inline instead of calling a separate
> function, it works:
>
> logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()
>
> So why is it able to serialize my little function in regular Spark, but
> not in streaming?
>
> Thanks,
> Diana
>
>
>
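
Coming back to the original example: my guess (unverified) is that defining
getRequestDoc at the shell prompt makes it a method on an enclosing object
that also references ssc, so serializing the map closure drags in the whole
StreamingContext, while the inline version only captures the regex. A sketch
of a possible workaround, assuming the helper needs nothing else from the
enclosing scope:

// Hypothetical: putting the helper in a standalone serializable object means
// the closure references the object, not the scope that holds ssc.
object DocExtractor extends Serializable {
  def getRequestDoc(s: String): String =
    "KBDOC-[0-9]*".r.findFirstIn(s).orNull
}

val logs = ssc.socketTextStream("localhost", 4444)
logs.map(DocExtractor.getRequestDoc).print()
ssc.start()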
