When the CDH cluster was first running, I did not set up the Spark role. When
I did set it up for the first time, it was working, i.e., the same load of the
test file gave me output. But in that case, how can there be different
versions? This was all done through Cloudera Manager parcels, so how does one
find out which version is installed? I did do an rsync from the master to the
worker nodes, and that did not help me much. And we're talking about the
spark-assembly jar files, correct? Or is there another set of jar files I
should be checking?
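
The closest I can think of is something like this from spark-shell (the parcel
path is only my guess at the parcel layout, so adjust it), but I'm not sure
it's the right check:

// Prints the Spark version the driver is actually running.
println(sc.version)

// Lists the spark-assembly jars under the (assumed) parcel lib dir and the
// Implementation-Version recorded in each jar's manifest.
import java.io.File
import java.util.jar.JarFile
val libDir = new File("/opt/cloudera/parcels/CDH/lib/spark/lib")
libDir.listFiles.filter(_.getName.startsWith("spark-assembly")).foreach { f =>
  val attrs = new JarFile(f).getManifest.getMainAttributes
  println(f.getName + " -> " + attrs.getValue("Implementation-Version"))
}

Running the same snippet on the master and on each worker should at least show
whether the assembly jars agree.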

On Tue Nov 18 2014 at 5:16:57 PM Ritesh Kumar Singh <
riteshoneinamill...@gmail.com> wrote:

> It can be a serialization issue.
> That happens when there are different versions installed on the same system.
> What do you mean by "the first time you installed and tested it out"?
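>
> One quick sanity check, as a sketch: in spark-shell, print which jar the
> SparkContext class was actually loaded from, and compare that across hosts:
>
> // prints the URL of the jar that provided the running SparkContext class
> println(classOf[org.apache.spark.SparkContext].getProtectionDomain.getCodeSource.getLocation)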
>
> On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham <anson.abra...@gmail.com>
> wrote:
>
>> I'm essentially loading a file and saving output to another location:
>>
>> val source = sc.textFile("/tmp/testfile.txt")
>> source.saveAsTextFile("/tmp/testsparkoutput")
>>
>> When I do so, I'm hitting this error:
>> 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
>>         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         java.lang.Thread.run(Thread.java:744)
>> Driver stacktrace:
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>         at scala.Option.foreach(Option.scala:236)
>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>>
>> Can't figure out what the issue is. I'm running CDH 5.2, with Spark at
>> version 1.1. The file I'm loading is literally just 7 MB. I thought it was
>> a jar-file mismatch, but I did a compare and they're all identical. And
>> seeing as they were all installed through CDH parcels, I'm not sure how
>> there could be a version mismatch between the worker nodes and the master.
>> Oh yeah: 1 master node with 2 worker nodes, running in standalone mode,
>> not through YARN. So, just in case, I copied the jars from the master to
>> the 2 worker nodes, and still the same issue.
>> Weird thing is, the first time I installed and tested it out, it worked,
>> but now it doesn't.
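>>
>> For reference, a rough way to verify the jars really are byte-identical,
>> runnable from spark-shell on each host (the parcel path and jar-name
>> prefix below are assumptions on my part, not the actual paths from this
>> cluster):
>>
>> import java.io.{File, FileInputStream}
>> import java.security.{DigestInputStream, MessageDigest}
>>
>> val jars = new File("/opt/cloudera/parcels/CDH/lib/spark/lib").listFiles.filter(_.getName.startsWith("spark-assembly"))
>> jars.foreach { jar =>
>>   // stream the jar through a digest; reading is what updates the hash
>>   val md = MessageDigest.getInstance("MD5")
>>   val in = new DigestInputStream(new FileInputStream(jar), md)
>>   val buf = new Array[Byte](8192)
>>   while (in.read(buf) != -1) {}
>>   in.close()
>>   println(jar.getName + "  " + md.digest.map("%02x".format(_)).mkString)
>> }
>>
>> If the checksums differ between the master and a worker, the copy didn't
>> actually make them identical.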
>>
>> Any help here would be greatly appreciated.
>>
>
>
