Re: java.io.NotSerializableException

2014-02-24 Thread yaoxin
My exception stack is at the end of this mail. The class Spark complains about is a company-internal class. org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: com.mycompany.util.xxx at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler

Re: java.io.NotSerializableException

2014-02-24 Thread leosand...@gmail.com
Which class is not Serializable? Running Shark 0.9, I hit a similar exception: java.io.NotSerializableException (java.io.NotSerializableException: shark.execution.ReduceKeyReduceSide) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183

java.io.NotSerializableException

2014-02-24 Thread yaoxin
I got an error: org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: But the class it complains about is a Java library class that I depend on, so I can't change it to be Serializable. Is there any way to work around this? I am using Spar
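A common workaround when the offending class comes from a library you cannot modify is to mark the field `transient` and re-create the object lazily after deserialization. A minimal sketch using plain `java.io` serialization, with hypothetical stand-in names (`LegacyClient` for the unmodifiable library class, `TaskWrapper` for the task that needs it) rather than any real Spark API:

```java
import java.io.*;

public class TransientWorkaround {

    // Stand-in for the third-party class that cannot be made Serializable.
    static class LegacyClient {
        String greet() { return "hello"; }
    }

    // Stand-in for the task/closure that needs the client on the worker.
    static class TaskWrapper implements Serializable {
        // transient: skipped by ObjectOutputStream, so the non-serializable
        // member is never serialized at all.
        private transient LegacyClient client;

        LegacyClient client() {
            if (client == null) {
                client = new LegacyClient(); // re-created after deserialization
            }
            return client;
        }
    }

    // Round-trip an object through Java serialization, roughly what Spark
    // does when shipping a task to an executor.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.flush();
        ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
        return new ObjectInputStream(bis).readObject();
    }

    public static String demo() throws Exception {
        TaskWrapper w = new TaskWrapper();
        w.client(); // populated on the "driver" side
        TaskWrapper copy = (TaskWrapper) roundTrip(w); // succeeds: transient field dropped
        return copy.client().greet(); // lazily rebuilt on the "worker" side
    }
}
```

The cost of this pattern is that the object is rebuilt once per deserialized copy, so it suits clients that are cheap to construct or can be cached per JVM.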

Re: Task not serializable (java.io.NotSerializableException)

2014-02-11 Thread David Thomas
David Thomas wrote: I'm trying to copy a file from HDFS to a temp local directory within a map function using a static method of FileUtil and I get the below error. Is there a way to get around this?

Re: Task not serializable (java.io.NotSerializableException)

2014-02-11 Thread Andrew Ash
function using a static method of FileUtil and I get the below error. Is there a way to get around this? org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: org.apache.hadoop.fs.Path at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Re: Task not serializable (java.io.NotSerializableException)

2014-02-11 Thread David Thomas
of FileUtil and I get the below error. Is there a way to get around this? org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: org.apache.hadoop.fs.Path at org.apache.spark.sche

Re: Task not serializable (java.io.NotSerializableException)

2014-02-11 Thread Andrew Ash
aborted: Task not serializable: java.io.NotSerializableException: org.apache.hadoop.fs.Path at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Task not serializable (java.io.NotSerializableException)

2014-02-11 Thread David Thomas
I'm trying to copy a file from HDFS to a temp local directory within a map function using a static method of FileUtil and I get the below error. Is there a way to get around this? org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableExce
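The usual fix for this particular trace is to avoid capturing the `org.apache.hadoop.fs.Path` in the closure at all: capture only the plain `String` and construct the `Path` inside the map function. A sketch of the principle using plain Java serialization, where `FakePath` and `SerFunction` are hypothetical stand-ins for `Path` and for Spark's serializable function interfaces:

```java
import java.io.*;

public class CaptureFix {

    // Stand-in for org.apache.hadoop.fs.Path, which is not Serializable here.
    static class FakePath {
        final String uri;
        FakePath(String u) { uri = u; }
    }

    // Spark's function interfaces extend Serializable; model that property.
    interface SerFunction<T, R> extends Serializable { R apply(T t); }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.flush();
        return bos.toByteArray();
    }

    // Anti-pattern: the closure captures the FakePath object itself, so
    // serializing the closure throws NotSerializableException.
    public static boolean brokenClosureFails() {
        FakePath p = new FakePath("hdfs://tmp/input");
        SerFunction<String, String> f = s -> p.uri + "/" + s; // captures p
        try { serialize(f); return false; }
        catch (NotSerializableException e) { return true; }
        catch (IOException e) { return false; }
    }

    // Fix: capture only the String; rebuild the path inside the closure body.
    public static boolean fixedClosureWorks() {
        String uri = "hdfs://tmp/input";
        SerFunction<String, String> f = s -> new FakePath(uri).uri + "/" + s;
        try { serialize(f); return true; }
        catch (IOException e) { return false; }
    }
}
```

In real Spark code the same move applies directly: pass `path.toString()` into the job and call `new Path(str)` inside the `map` function.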

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-07 Thread Patrick Wendell
DoubleFlatMapFunction>() { @Override public Iterable call(Tuple2 e) { BSONObject doc = e._2(); BasicDBList vals = (BasicDBList)doc.get("data"); List result

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Reynold Xin
st vals = (BasicDBList)doc.get("data"); List results = new ArrayList(); for (int i=0; i< vals.size(); i++) results.add((Double)((BasicDBList)vals.get(i)).get(0)); return results;

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Patrick Wendell
sicDBList)vals.get(i)).get(0)); return results; } }); logger.info("Take: {}", rdd2.take(100)); logger.info("Count: {}", rdd2.count()); } } On 11/3/1

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Yadid Ayzenberg
wrote: Thanks, that would help. This would be consistent with there being a reference to the SparkContext itself inside the closure. Just want to make sure that's not the case. On Sun, Nov 3, 2013 at 5:13 PM, Yadid Ayzenberg wrote: I'm running in local[4] mode -
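One way to check this hypothesis without guessing is to look for the synthetic `this$0` field that javac adds to anonymous and inner classes: if it is present, the closure drags its enclosing object (and anything that object holds, such as a SparkContext) into serialization. A small diagnostic sketch; `ClosureInspector` and `Outer` are hypothetical helper names, and the `this$0` naming is a javac convention rather than a guaranteed API:

```java
import java.lang.reflect.Field;

public class ClosureInspector {

    // Returns the hidden enclosing instance of an anonymous/inner-class
    // closure, or null if the closure carries no outer reference.
    public static Object enclosingInstance(Object closure) {
        for (Field f : closure.getClass().getDeclaredFields()) {
            if (f.getName().equals("this$0")) { // synthetic outer-instance field
                f.setAccessible(true);
                try { return f.get(closure); }
                catch (IllegalAccessException e) { return null; }
            }
        }
        return null;
    }

    // Demo: an anonymous class that calls an instance method must capture
    // its enclosing Outer instance.
    static class Outer {
        Runnable make() {
            return new Runnable() {
                public void run() { helper(); } // forces capture of Outer.this
            };
        }
        void helper() { }
    }

    public static boolean demo() {
        Outer o = new Outer();
        return enclosingInstance(o.make()) == o; // the hidden reference is Outer itself
    }
}
```

Running `enclosingInstance` on a suspect closure in the driver program tells you immediately whether the enclosing class (and thus the SparkContext) is along for the ride.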

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Patrick Wendell
Full stack trace: (run-main) org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bs

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Yadid Ayzenberg
I'm running in local[4] mode - so there are no slave machines. Full stack trace: (run-main) org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine org.apache.spark.SparkException: Job failed: java.io.NotSerializableException

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Patrick Wendell
- Patrick On Sun, Nov 3, 2013 at 10:33 AM, Yadid Ayzenberg wrote: Hi All, My original RDD contains arrays of doubles. when

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Yadid Ayzenberg
g exception: 19829 [run-main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at AnalyticsEngine.java:133 [error] (run-main) org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine org.apache.spark.SparkException: Job faile

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Patrick Wendell
as expected. However when I run a map on the original RDD in order to generate a new RDD with only the first element of each array, and try to apply count() to the new generated RDD, I get the following exception: 19829

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Yadid Ayzenberg
n: 19829 [run-main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at AnalyticsEngine.java:133 [error] (run-main) org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine org.apache.spark.SparkException: Job failed: java.io

Re: java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Patrick Wendell
try to apply count() to the new generated RDD I get the following exception: 19829 [run-main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at AnalyticsEngine.java:133 [error] (run-main) org.apache.spark.SparkException: Job failed: jav

java.io.NotSerializableException on RDD count() in Java

2013-11-03 Thread Yadid Ayzenberg
generated RDD I get the following exception: 19829 [run-main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at AnalyticsEngine.java:133 [error] (run-main) org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: edu.mit.bsense.AnalyticsEngine
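The typical root cause of a trace like this is that the anonymous function passed to `map` reads a field of the enclosing class (here `edu.mit.bsense.AnalyticsEngine`), so the compiler captures the whole enclosing instance and Spark then tries to serialize it. A sketch of both the failure and the standard fix, copying the needed value into a local variable first; `Driver`, `Context`, and `SerTask` are hypothetical stand-ins built on plain Java serialization, not the thread author's actual code:

```java
import java.io.*;

public class OuterCapture {

    static class Context { } // stand-in for SparkContext: not Serializable

    // Stand-in for the enclosing driver class (e.g. AnalyticsEngine).
    static class Driver {
        final Context ctx = new Context();
        final String label = "rdd-";

        // Anti-pattern: the anonymous class reads a field, so it captures
        // Driver.this, and Driver (with its Context) must be serialized too.
        SerTask badTask() {
            return new SerTask() {
                public String call() { return label; } // implicit Driver.this.label
            };
        }

        // Fix: copy the field into a local; the lambda captures only the String.
        SerTask goodTask() {
            final String local = label;
            return () -> local;
        }
    }

    interface SerTask extends Serializable { String call(); }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.flush();
        return bos.toByteArray();
    }

    public static boolean fieldReadFails() {
        try { serialize(new Driver().badTask()); return false; }
        catch (NotSerializableException e) { return true; } // names Driver, like the trace above
        catch (IOException e) { return false; }
    }

    public static boolean localCopyWorks() {
        try { return serialize(new Driver().goodTask()).length > 0; }
        catch (IOException e) { return false; }
    }
}
```

Other standard remedies are to move the function into a `static` nested class, or to make the enclosing class genuinely Serializable while marking heavyweight members `transient`.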

Run with java.io.NotSerializableException

2013-09-08 Thread Xiang Huo
0: INFO [run-main] (org.apache.hadoop.mapred.FileInputFormat:199) - Total input paths to process : 1 20:50:27,760: INFO [run-main] (spark.SparkContext:31) - Starting job: foreach at webs.scala:40 [error] (run-main) spark.SparkException: Job failed: ResultTask(0, 1) failed: ExceptionFailure(j