Sorry, I meant CDH 5.2 with Spark 1.1.

On Wed, Nov 19, 2014, 17:41 Anson Abraham <anson.abra...@gmail.com> wrote:
> yeah CDH distribution (1.1).
>
> On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>> > yeah but in this case i'm not building any files. just deployed out
>> > config files in CDH5.2 and initiated a spark-shell to just read and
>> > output a file.
>>
>> In that case it is a little bit weird. Just to be sure, you are using
>> CDH's version of Spark, not trying to run an Apache Spark release on
>> top of CDH, right? (If that's the case, then we could probably move
>> this conversation to cdh-us...@cloudera.org, since it would be
>> CDH-specific.)
>>
>> > On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> Hi Anson,
>> >>
>> >> We've seen this error when incompatible classes are used in the driver
>> >> and executors (e.g., same class name, but the classes are different
>> >> and thus the serialized data is different). This can happen for
>> >> example if you're including some 3rd party libraries in your app's
>> >> jar, or changing the driver/executor class paths to include these
>> >> conflicting libraries.
>> >>
>> >> Can you clarify whether any of the above apply to your case?
>> >>
>> >> (For example, one easy way to trigger this is to add the
>> >> spark-examples jar shipped with CDH5.2 in the classpath of your
>> >> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
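[The incompatible-classes diagnosis above can be checked mechanically by fingerprinting the jars on every node and diffing the lists. A minimal sketch; the CDH parcel path in the usage comment is an assumption about the CDH 5.2 layout, and hostnames are placeholders:]

```shell
# Print an md5 fingerprint for every jar under a directory, sorted by
# path, so the output from two nodes can be diffed line by line.
fingerprint_jars() {
  # $1 = directory containing the Spark jars
  find "$1" -name '*.jar' -exec md5sum {} \; | sort -k2
}

# Example usage on each node (parcel path is an assumption):
#   fingerprint_jars /opt/cloudera/parcels/CDH/lib/spark/lib \
#       > /tmp/jars.$(hostname).txt
# then compare:
#   diff /tmp/jars.master.txt /tmp/jars.worker1.txt
```

[Any line that differs points at a jar whose bytes (not just name) differ between driver and executor, which is exactly the mismatch that produces different serialized forms of the "same" class.]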
>> >> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>> >> > I'm essentially loading a file and saving output to another location:
>> >> >
>> >> > val source = sc.textFile("/tmp/testfile.txt")
>> >> > source.saveAsTextFile("/tmp/testsparkoutput")
>> >> >
>> >> > when i do so, i'm hitting this error:
>> >> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>> >> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
>> >> >     java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>> >> >     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> >> >     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> >> >     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> >> >     java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> >> >     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>> >> >     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>> >> >     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>> >> >     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >     java.lang.Thread.run(Thread.java:744)
>> >> > Driver stacktrace:
>> >> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> >> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> >> >     at scala.Option.foreach(Option.scala:236)
>> >> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>> >> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>> >> >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> >> >     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> >> >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> >> >     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> >> >     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> >> >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> >> >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> >> >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> >> >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> >> >
>> >> > Can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1. The file I'm loading is literally just 7 MB. I thought it was a jar mismatch, but I did a compare and see they're all identical. And seeing as they were all installed through CDH parcels, I'm not sure how there could be a version mismatch between the worker nodes and the master. Oh yeah: 1 master node with 2 worker nodes, running standalone, not through YARN. Just in case, I also copied the jars from the master to the 2 worker nodes, and still hit the same issue. The weird thing is, the first time I installed and tested it, it worked, but now it doesn't.
>> >> >
>> >> > Any help here would be greatly appreciated.
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo
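[Since the jars themselves compare identical, one remaining suspect is the classpath each JVM actually assembles, where a single extra or reordered entry on one node can pull in a conflicting class. A minimal sketch for diffing classpaths between nodes; the `compute-classpath.sh` location in the usage comment is an assumption about the CDH 5.2 parcel layout:]

```shell
# Normalize a colon-separated classpath into one sorted entry per line,
# so the driver's and each worker's classpath can be diffed directly.
classpath_entries() {
  # $1 = classpath string, e.g. the output of Spark's classpath script
  printf '%s' "$1" | tr ':' '\n' | sort
}

# Example usage on each node (script path is an assumption):
#   classpath_entries "$(/opt/cloudera/parcels/CDH/lib/spark/bin/compute-classpath.sh)" \
#       > /tmp/cp.$(hostname).txt
# then:
#   diff /tmp/cp.master.txt /tmp/cp.worker1.txt
```

[Sorting discards entry order, which matters for class resolution, so this only flags entries present on one node but not another; if the lists match, comparing the unsorted output as well would catch ordering differences.]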