Hi,

I encountered a weird bytecode incompatibility issue between the spark-core jar 
from the Maven repo and the official Spark pre-built binary.

Steps to reproduce:

1.     Download the official pre-built Spark 1.1.1 binary from 
http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz

2.     Launch the Spark cluster in pseudo-cluster mode

3.     Run a small Scala app that calls RDD.saveAsObjectFile(), built with the 
following sbt settings:

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1"
)



The app body is:

val sc = new SparkContext(args(0), "test") // args(0) is the Spark master URI
val rdd = sc.parallelize(List(1, 2, 3))
rdd.saveAsObjectFile("/tmp/mysaoftmp")
sc.stop()
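
For reference, here is the complete, self-contained version of the app (the 
object name is arbitrary):

import org.apache.spark.SparkContext

object SaveAsObjectFileTest {
  def main(args: Array[String]): Unit = {
    // args(0) is the Spark master URI of the pseudo cluster from step 2.
    val sc = new SparkContext(args(0), "test")
    val rdd = sc.parallelize(List(1, 2, 3))
    rdd.saveAsObjectFile("/tmp/mysaoftmp")
    sc.stop()
  }
}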



Running it throws the following exception:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com): java.lang.ClassCastException: scala.Tuple2 cannot be cast to scala.collection.Iterator
[error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
[error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
[error]         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
[error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
[error]         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
[error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
[error]         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
[error]         org.apache.spark.scheduler.Task.run(Task.scala:54)
[error]         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
[error]         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
[error]         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[error]         java.lang.Thread.run(Thread.java:701)

After investigation, I found that this is caused by a bytecode incompatibility 
between RDD.class in spark-core_2.10-1.1.1.jar (from the Maven repo) and 
RDD.class in the pre-built Spark assembly.
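
To double-check which copy of RDD.class a JVM actually loads, the following 
snippet can help (just a diagnostic sketch using plain JDK reflection, runnable 
on the driver or in the Spark shell):

// Print the jar that RDD.class was loaded from, to tell whether it came
// from the Maven spark-core jar or from the pre-built assembly.
val rddClassLocation = classOf[org.apache.spark.rdd.RDD[_]]
  .getProtectionDomain.getCodeSource.getLocation
println(s"RDD.class loaded from: $rddClassLocation")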

This issue also happens with Spark 1.1.0.

Is there anything wrong with my usage of Spark, or with the process of 
deploying the Spark module jars to the Maven repo?
