Do you expect to be able to use the SparkContext in the remote task?

If you do, that won't work. You'll need to rethink what it is you're
trying to do, since SparkContext is not serializable and it doesn't
make sense to make it so. If you don't, you could mark the field as
@transient.
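
For example, a minimal sketch (the method body is made up for
illustration; it assumes the context is only ever used in driver-side
code):

  class AAA(@transient val s: SparkContext) extends Serializable {
    // "s" is skipped during serialization, so it will be null if the
    // object is ever deserialized on an executor. Only call it from
    // the driver.
    def countMatches(path: String, n: Int): Long =
      s.textFile(path).filter(_ == n.toString).count()
  }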

But the two examples you posted shouldn't be creating a reference to
the "aaa" variable in the serialized task. You can use
-Dsun.io.serialization.extendedDebugInfo=true to see exactly which
object graph is pulling the non-serializable reference into the task.
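
For example, something like this should print the offending reference
chain alongside the exception (assuming your spark-shell forwards
--driver-java-options to the driver JVM, which is where tasks are
serialized):

  spark-shell --driver-java-options \
    "-Dsun.io.serialization.extendedDebugInfo=true"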


On Mon, Nov 24, 2014 at 10:15 AM, aecc <alessandroa...@gmail.com> wrote:
> Hello guys,
>
> I'm using Spark 1.0.0 and Kryo serialization.
> In the Spark shell, when I create a class that has the SparkContext as
> an attribute, like this:
>
> class AAA(val s: SparkContext) { }
> val aaa = new AAA(sc)
>
> and I execute any action using that attribute, like:
>
> val myNumber = 5
> aaa.s.textFile("FILE").filter(_ == myNumber.toString).count
> or
> aaa.s.parallelize(1 to 10).filter(_ == myNumber).count
>
> I get a NotSerializableException:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$AAA
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:770)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:713)
>         at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1176)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
> Any thoughts on how to solve this issue, or a possible workaround?
> I'm actually developing an API that will need to use this SparkContext
> several times in different places, so it needs to be accessible.
>
> Thanks a lot for your cooperation.
>
>
>



-- 
Marcelo
