Below is my exception stack. The class Spark complains about is a
company-internal class.

org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException: com.mycompany.util.xxx
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler...
Which class is not Serializable?
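The exception message itself names the class that failed to serialize (here the internal com.mycompany.util.xxx). When it is not obvious how that class ends up inside the task, one way to investigate outside Spark is to serialize the suspect object graph directly with plain java.io. A minimal sketch; SuspectHolder is a made-up stand-in for whatever the closure captures:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Made-up stand-in for the object your Spark closure captures.
    static class SuspectHolder implements Serializable {
        Object helper = new Object();   // java.lang.Object is not Serializable
    }

    public static void main(String[] args) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(new SuspectHolder());
        } catch (NotSerializableException e) {
            // The message is the first non-serializable class reached in the graph,
            // the same name Spark puts in its "Task not serializable" error.
            System.err.println("Not serializable: " + e.getMessage());
        } finally {
            out.close();
        }
    }
}

On HotSpot JVMs, running with -Dsun.io.serialization.extendedDebugInfo=true is also reported to add the field path to such exceptions, which helps when the named class is buried several references deep in the object graph.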
I ran Shark 0.9 and got a similar exception:

java.io.NotSerializableException (java.io.NotSerializableException:
shark.execution.ReduceKeyReduceSide)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
I got an error:

org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException:

But the class it complains about is a Java library class that I depend on,
so I cannot change it to implement Serializable. Is there any way to work
around this? I am using Spark...
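One common workaround (a minimal sketch, not from this thread): do not capture the library object in the closure at all. Construct it inside the function, ideally once per partition via mapPartitions, so it only ever exists on the executors and never has to be serialized. LegacyParser below is a hypothetical stand-in for the non-serializable dependency, and the snippet assumes the older Spark Java API used elsewhere in this thread, where FlatMapFunction.call returns an Iterable.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

public class NonSerializableLibWorkaround {

    // Hypothetical non-serializable library class, defined here only so the
    // sketch is self-contained; in the real case it comes from the dependency.
    static class LegacyParser {
        String parse(String line) { return line.trim(); }
    }

    public static JavaRDD<String> parseAll(JavaRDD<String> lines) {
        return lines.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
            @Override
            public Iterable<String> call(Iterator<String> partition) {
                // Constructed on the executor, once per partition, so the parser
                // is never part of the serialized task closure.
                LegacyParser parser = new LegacyParser();
                List<String> out = new ArrayList<String>();
                while (partition.hasNext()) {
                    out.add(parser.parse(partition.next()));
                }
                return out;
            }
        });
    }
}

If the object is expensive to create, a static field or a transient, lazily initialized field on a small serializable helper achieves the same thing: each executor JVM builds its own instance instead of receiving one over the wire.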
David Thomas wrote:
I'm trying to copy a file from HDFS to a temp local directory within a map
function, using a static method of FileUtil, and I get the error below. Is
there a way to get around this?

org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException: org.apache.hadoop.fs.Path
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
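org.apache.hadoop.fs.Path is not Serializable, so a Path (or FileSystem, or Configuration) created on the driver and referenced inside the map function triggers exactly this error. A minimal sketch of one way around it (the class and method names are illustrative, not from the original mail): keep only plain Strings in the closure and build the Hadoop objects inside the function, on the executor.

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class CopyToLocal {

    public static JavaRDD<String> copyAll(JavaRDD<String> hdfsPaths, final String localDir) {
        // Only the String localDir is captured by the closure; everything
        // Hadoop-specific is constructed inside call(), per record.
        return hdfsPaths.map(new Function<String, String>() {
            @Override
            public String call(String hdfsPath) throws Exception {
                Configuration conf = new Configuration();
                Path src = new Path(hdfsPath);
                FileSystem fs = src.getFileSystem(conf);
                File dst = new File(localDir, src.getName());
                // Static helper, so nothing Hadoop-related needs to be serialized.
                FileUtil.copy(fs, src, dst, false, conf);
                return dst.getAbsolutePath();
            }
        });
    }
}

Hadoop's Configuration and FileSystem are likewise not Java-serializable, so the same rule applies to them: create them where they are used rather than capturing them from the driver.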
DoubleFlatMapFunction<Tuple2<Object, BSONObject>>() {

    @Override
    public Iterable<Double> call(Tuple2<Object, BSONObject> e) {
        // Each record's value is a BSON document; "data" holds a list of lists.
        BSONObject doc = e._2();
        BasicDBList vals = (BasicDBList) doc.get("data");

        // Keep only the first element of each inner list.
        List<Double> results = new ArrayList<Double>();
        for (int i = 0; i < vals.size(); i++)
            results.add((Double) ((BasicDBList) vals.get(i)).get(0));

        return results;
    }
});

logger.info("Take: {}", rdd2.take(100));
logger.info("Count: {}", rdd2.count());
Thanks, that would help. This would be consistent with there being a
reference to the SparkContext itself inside of the closure. I just want
to make sure that's not the case.

- Patrick
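For context, a minimal sketch of the pattern Patrick is describing (Engine and its methods are illustrative names, not from the thread, and it assumes the same Java API as the code above). With the compilers of that era, an anonymous inner class declared in an instance method keeps an implicit reference to its enclosing instance, so shipping the function to executors means serializing the whole driver-side object, including any SparkContext it holds.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class Engine {

    private final JavaSparkContext sc;   // not serializable

    public Engine(JavaSparkContext sc) {
        this.sc = sc;
    }

    // Problematic: the anonymous Function is an inner class of Engine, so it
    // carries a hidden Engine.this reference; serializing the task then tries
    // to serialize Engine and the SparkContext along with it.
    public JavaRDD<Integer> lengthsCapturingThis(JavaRDD<String> lines) {
        return lines.map(new Function<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }

    // One fix: build the function where there is no enclosing instance, for
    // example in a static method (a named static nested class works too).
    public JavaRDD<Integer> lengthsNoCapture(JavaRDD<String> lines) {
        return mapToLengths(lines);
    }

    private static JavaRDD<Integer> mapToLengths(JavaRDD<String> lines) {
        return lines.map(new Function<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }
}

People sometimes work around it instead by making the enclosing class Serializable and marking fields like the SparkContext transient, but defining the functions so that they have no enclosing-instance reference at all is the cleaner fix.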
On Sun, Nov 3, 2013 at 5:13 PM, Yadid Ayzenberg wrote:
I'm running in local[4] mode, so there are no slave machines. Full stack trace:

(run-main) org.apache.spark.SparkException: Job failed:
java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine
org.apache.spark.SparkException: Job failed:
java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine
On Sun, Nov 3, 2013 at 10:33 AM, Yadid Ayzenberg wrote:

Hi All,

My original RDD contains arrays of doubles. When [...] as expected.
However, when I run a map on the original RDD in order to generate a new RDD
with only the first element of each array, and try to apply count() to the
new generated RDD, I get the following exception:

19829 [run-main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at AnalyticsEngine.java:133
[error] (run-main) org.apache.spark.SparkException: Job failed:
java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine
org.apache.spark.SparkException: Job failed:
java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine
INFO [run-main] (org.apache.hadoop.mapred.FileInputFormat:199) - Total input paths to process : 1
20:50:27,760: INFO [run-main] (spark.SparkContext:31) - Starting job: foreach at webs.scala:40
[error] (run-main) spark.SparkException: Job failed: ResultTask(0, 1) failed: ExceptionFailure(j...