I am not entirely sure off the top of my head, but a workaround that usually
works is to define the function as a val instead of a def. For example:

def func(i: Int): Boolean = { true }

can be written as

val func = (i: Int) => { true }
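
If I had to guess, a def defined in the shell is a method on its enclosing
object, so the closure drags that whole object (including the StreamingContext
it holds) into the task, whereas a val is a standalone function object that
serializes on its own. Applied to Diana's streaming code, the workaround would
look something like this (a sketch, reusing the regex and socket setup from
her mail):

val getRequestDoc = (s: String) => "KBDOC-[0-9]*".r.findFirstIn(s).orNull

val logs = ssc.socketTextStream("localhost", 4444)
// getRequestDoc is now a plain function value, so serializing the map closure
// should not pull in the enclosing scope (or the StreamingContext)
logs.map(getRequestDoc).print()
ssc.start()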

Hope this helps for now.

TD


On Tue, Jul 15, 2014 at 9:21 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Hey Diana,
>
> Did you ever figure this out?
>
> I’m running into the same exception, except in my case the function I’m
> calling is a KMeans model.predict().
>
> In regular Spark it works, and Spark Streaming without the call to
> model.predict() also works, but when put together I get this
> serialization exception. I’m on 1.0.0.
>
> Nick
>
>
> On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com>
> wrote:
>
>> Hey all, I'm trying to set up a pretty simple streaming app and getting some
>> weird behavior.
>>
>> First, a non-streaming job that works fine: I'm trying to pull out lines of
>> a log file that match a regex, for which I've set up a function:
>>
>> def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.findFirstIn(s).orNull }
>> val logs = sc.textFile(logfiles)
>> logs.map(getRequestDoc).take(10)
>>
>> That works, but I want to run it on the same data as a stream, so I tried
>> this:
>>
>> val logs = ssc.socketTextStream("localhost", 4444)
>> logs.map(getRequestDoc).print()
>> ssc.start()
>>
>> From this code, I get:
>> 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job
>> 1399545128000 ms.0
>> org.apache.spark.SparkException: Job aborted: Task not serializable:
>> java.io.NotSerializableException:
>> org.apache.spark.streaming.StreamingContext
>>
>>
>> But if I do the map function inline instead of calling a separate
>> function, it works:
>>
>> logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()
>>
>> So why is it able to serialize my little function in regular Spark, but
>> not in streaming?
>>
>> Thanks,
>> Diana
>>
>>
>>
>
