You should use the same binaries everywhere. The problem here is that anonymous functions can get compiled to different class names in different builds (RDD$$anonfun$13 in your trace is one such generated name), so when the application and the cluster run different builds, one function ends up being called where another was meant.
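One way to end up with the same binaries on both sides is to compile against spark-core but not ship or run it, and let the pre-built distribution supply the Spark classes at runtime. A minimal sketch of a build.sbt for that, assuming the application is launched with bin/spark-submit from the same 1.1.1 download that runs the cluster (the project name, main class, master URL and jar path below are hypothetical):

```scala
// build.sbt (sketch; the project name "test" is a placeholder)
name := "test"

scalaVersion := "2.10.4"

// "provided": compile against spark-core 1.1.1, but leave it off the
// runtime classpath so the Spark classes come from the pre-built
// assembly on both the driver and the executors.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"
)

// Package with `sbt package`, then launch through the pre-built binary
// rather than `sbt run` (hypothetical main class, master URL and jar path):
//   bin/spark-submit --class Test --master spark://<host>:7077 \
//     target/scala-2.10/test_2.10-0.1.jar spark://<host>:7077
```

With the dependency marked as provided, `sbt run` no longer has Spark on its classpath, so the job has to go through spark-submit. That is intended here, since the driver and the executors then both load RDD.class from the same assembly.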
On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> Hi,
>
> I encountered a weird bytecode incompatibility issue between spark-core jar
> from mvn repo and official spark prebuilt binary.
>
> Steps to reproduce:
>
> 1. Download the official pre-built Spark binary 1.1.1 at
>    http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
>
> 2. Launch the Spark cluster in pseudo cluster mode
>
> 3. A small scala APP which calls RDD.saveAsObjectFile()
>
>    scalaVersion := "2.10.4"
>
>    libraryDependencies ++= Seq(
>      "org.apache.spark" %% "spark-core" % "1.1.1"
>    )
>
>    val sc = new SparkContext(args(0), "test") //args[0] is the Spark master URI
>    val rdd = sc.parallelize(List(1, 2, 3))
>    rdd.saveAsObjectFile("/tmp/mysaoftmp")
>    sc.stop
>
> throws an exception as follows:
>
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost
> task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com):
> java.lang.ClassCastException: scala.Tuple2 cannot be cast to
> scala.collection.Iterator
> [error] org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error] org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> [error] org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> [error] org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error] org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error] org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> [error] org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> [error] org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> [error] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> [error] org.apache.spark.scheduler.Task.run(Task.scala:54)
> [error] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> [error] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> [error] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [error] java.lang.Thread.run(Thread.java:701)
>
> After investigation, I found that this is caused by bytecode incompatibility
> issue between RDD.class in spark-core_2.10-1.1.1.jar and the pre-built spark
> assembly respectively.
>
> This issue also happens with spark 1.1.0.
>
> Is there anything wrong in my usage of Spark? Or anything wrong in the
> process of deploying Spark module jars to maven repo?