Sorry, I meant CDH 5.2 with Spark 1.1.

On Wed, Nov 19, 2014, 17:41 Anson Abraham <anson.abra...@gmail.com> wrote:
> yeah CDH distribution (1.1).
>
> On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>> > yeah but in this case i'm not building any files. just deployed out
>> > config files in CDH5.2 and initiated a spark-shell to just read and
>> > output a file.
>>
>> In that case it is a little bit weird. Just to be sure, you are using
>> CDH's version of Spark, not trying to run an Apache Spark release on
>> top of CDH, right? (If that's the case, then we could probably move
>> this conversation to cdh-us...@cloudera.org, since it would be
>> CDH-specific.)
>>
>> > On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> Hi Anson,
>> >>
>> >> We've seen this error when incompatible classes are used in the driver
>> >> and executors (e.g., same class name, but the classes are different
>> >> and thus the serialized data is different). This can happen for
>> >> example if you're including some 3rd party libraries in your app's
>> >> jar, or changing the driver/executor class paths to include these
>> >> conflicting libraries.
>> >>
>> >> Can you clarify whether any of the above apply to your case?
>> >>
>> >> (For example, one easy way to trigger this is to add the
>> >> spark-examples jar shipped with CDH5.2 in the classpath of your
>> >> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
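[The incompatible-classes diagnosis above can be checked mechanically by fingerprinting the jars on every node and diffing the lists. A minimal sketch; the CDH parcel path in the usage comment is an assumption about the CDH 5.2 layout, and hostnames are placeholders:]

```shell
# Print an md5 fingerprint for every jar under a directory, sorted by
# path, so the output from two nodes can be diffed line by line.
fingerprint_jars() {
  # $1 = directory containing the Spark jars
  find "$1" -name '*.jar' -exec md5sum {} \; | sort -k2
}

# Example usage on each node (parcel path is an assumption):
#   fingerprint_jars /opt/cloudera/parcels/CDH/lib/spark/lib \
#       > /tmp/jars.$(hostname).txt
# then compare:
#   diff /tmp/jars.master.txt /tmp/jars.worker1.txt
```

[Any line that differs points at a jar whose bytes (not just name) differ between driver and executor, which is exactly the mismatch that produces different serialized forms of the "same" class.]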
>> >> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>> >> > I'm essentially loading a file and saving output to another location:
>> >> >
>> >> > val source = sc.textFile("/tmp/testfile.txt")
>> >> > source.saveAsTextFile("/tmp/testsparkoutput")
>> >> >
>> >> > when i do so, i'm hitting this error:
>> >> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>> >> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
>> >> >     java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>> >> >     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> >> >     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> >> >     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> >> >     java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> >> >     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>> >> >     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>> >> >     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>> >> >     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >     java.lang.Thread.run(Thread.java:744)
>> >> > Driver stacktrace:
>> >> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> >> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> >> >     at scala.Option.foreach(Option.scala:236)
>> >> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>> >> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>> >> >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> >> >     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> >> >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> >> >     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> >> >     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> >> >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> >> >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> >> >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> >> >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> >> >
>> >> > Can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1. The file I'm loading is literally just 7 MB. I thought it was a jar mismatch, but I did a compare and see they're all identical. And seeing as they were all installed through CDH parcels, I'm not sure how there could be a version mismatch between the worker nodes and the master. Oh yeah: 1 master node with 2 worker nodes, running standalone, not through YARN. Just in case, I also copied the jars from the master to the 2 worker nodes, and still hit the same issue. The weird thing is, the first time I installed and tested it, it worked, but now it doesn't.
>> >> >
>> >> > Any help here would be greatly appreciated.
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo
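[Since the jars themselves compare identical, one remaining suspect is the classpath each JVM actually assembles, where a single extra or reordered entry on one node can pull in a conflicting class. A minimal sketch for diffing classpaths between nodes; the `compute-classpath.sh` location in the usage comment is an assumption about the CDH 5.2 parcel layout:]

```shell
# Normalize a colon-separated classpath into one sorted entry per line,
# so the driver's and each worker's classpath can be diffed directly.
classpath_entries() {
  # $1 = classpath string, e.g. the output of Spark's classpath script
  printf '%s' "$1" | tr ':' '\n' | sort
}

# Example usage on each node (script path is an assumption):
#   classpath_entries "$(/opt/cloudera/parcels/CDH/lib/spark/bin/compute-classpath.sh)" \
#       > /tmp/cp.$(hostname).txt
# then:
#   diff /tmp/cp.master.txt /tmp/cp.worker1.txt
```

[Sorting discards entry order, which matters for class resolution, so this only flags entries present on one node but not another; if the lists match, comparing the unsorted output as well would catch ordering differences.]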