@Rui do you mean the spark-core jar in the Maven Central repo is incompatible with the same version of the official pre-built Spark binary? That's really weird. I thought they were built from the same code.
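One quick way to check is to print which jar RDD.class is actually loaded from at runtime. This is just a sketch (the object name is made up); run it once with the Maven Central spark-core jar on the classpath and once inside the pre-built distribution, and compare the two locations:

    import org.apache.spark.rdd.RDD

    // Prints the jar (or directory) that RDD.class was actually loaded from,
    // so the Maven Central spark-core jar and the pre-built assembly can be
    // compared directly.
    object WhichJar {
      def main(args: Array[String]): Unit = {
        val location = Option(classOf[RDD[_]].getProtectionDomain.getCodeSource)
          .flatMap(src => Option(src.getLocation))
          .map(_.toString)
          .getOrElse("unknown code source")
        println(location)
      }
    }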
Best Regards,
Shixiong Zhu

2014-12-18 17:22 GMT+08:00 Sean Owen <so...@cloudera.com>:

> Well, it's always a good idea to use matched binary versions. Here it
> is more acutely necessary. You can use a pre-built binary -- if you
> use it to compile and also run. Why does it not make sense to publish
> artifacts?
>
> Not sure what you mean about core vs assembly, as the assembly
> contains all of the modules. You don't literally need the same jar
> file.
>
> On Thu, Dec 18, 2014 at 3:20 AM, Sun, Rui <rui....@intel.com> wrote:
> > Not using spark-submit. The app directly communicates with the Spark
> > cluster in standalone mode.
> >
> > If I mark the Spark dependency as 'provided', then a spark-core jar
> > elsewhere must be pointed to on the CLASSPATH. However, the pre-built
> > Spark binary only ships an assembly jar, not individual module jars,
> > so there is no way to point to a module jar that is the same binary
> > as the one inside the pre-built Spark distribution.
> >
> > Maybe the Spark distribution should contain not only the assembly jar
> > but also the individual module jars. Any opinion?
> >
> > From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
> > Sent: Thursday, December 18, 2014 2:20 AM
> > To: Sean Owen
> > Cc: Sun, Rui; user@spark.apache.org
> > Subject: Re: weird bytecode incompatability issue between spark-core jar
> > from mvn repo and official spark prebuilt binary
> >
> > Just to clarify, are you running the application using spark-submit
> > after packaging with sbt package? One thing that might help is to mark
> > the Spark dependency as 'provided', as then you shouldn't have the
> > Spark classes in your jar.
> >
> > Thanks
> > Shivaram
> >
> > On Wed, Dec 17, 2014 at 4:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > You should use the same binaries everywhere. The problem here is that
> > anonymous functions can get compiled to different names when you build
> > against different binaries, so you end up with one function being
> > called when another function is meant.
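For reference, a minimal sketch of the 'provided' setup suggested above, assuming sbt with Scala 2.10.4 and Spark 1.1.1 (the project name is a placeholder):

    name := "spark-repro"   // placeholder project name

    scalaVersion := "2.10.4"

    // "provided": spark-core is on the compile classpath but is not packaged
    // into the application jar; at runtime the pre-built spark-assembly
    // supplies these classes, so the bytecode on both sides matches.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"
    )

With this, 'sbt package' produces a jar containing only the application classes, and the job can be launched with the bin/spark-submit script from the same spark-1.1.1-bin-hadoop1 distribution instead of connecting to the master directly from the application.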
> > On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> >> Hi,
> >>
> >> I encountered a weird bytecode incompatibility issue between the
> >> spark-core jar from the Maven repo and the official Spark pre-built
> >> binary.
> >>
> >> Steps to reproduce:
> >>
> >> 1. Download the official pre-built Spark binary 1.1.1 at
> >> http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
> >>
> >> 2. Launch the Spark cluster in pseudo-cluster mode
> >>
> >> 3. Run a small Scala app which calls RDD.saveAsObjectFile():
> >>
> >> scalaVersion := "2.10.4"
> >>
> >> libraryDependencies ++= Seq(
> >>   "org.apache.spark" %% "spark-core" % "1.1.1"
> >> )
> >>
> >> import org.apache.spark.SparkContext
> >>
> >> val sc = new SparkContext(args(0), "test") // args(0) is the Spark master URI
> >> val rdd = sc.parallelize(List(1, 2, 3))
> >> rdd.saveAsObjectFile("/tmp/mysaoftmp")
> >> sc.stop()
> >>
> >> This throws an exception as follows:
> >>
> >> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> >> stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
> >> Lost task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com):
> >> java.lang.ClassCastException: scala.Tuple2 cannot be cast to
> >> scala.collection.Iterator
> >> [error]     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> >> [error]     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> >> [error]     org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> >> [error]     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> [error]     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> [error]     org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> [error]     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> [error]     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> [error]     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> >> [error]     org.apache.spark.scheduler.Task.run(Task.scala:54)
> >> [error]     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> >> [error]     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >> [error]     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> [error]     java.lang.Thread.run(Thread.java:701)
> >>
> >> After investigation, I found that this is caused by a bytecode
> >> incompatibility between RDD.class in spark-core_2.10-1.1.1.jar and the
> >> one in the pre-built Spark assembly.
> >>
> >> This issue also happens with Spark 1.1.0.
> >>
> >> Is there anything wrong in my usage of Spark? Or anything wrong in the
> >> process of deploying Spark module jars to the Maven repo?
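Putting the suggestions together, a minimal sketch of the same repro packaged as a standalone app (the object name is made up). Built against the 'provided' dependency shown earlier, packaged with 'sbt package', and launched with bin/spark-submit from the same spark-1.1.1-bin-hadoop1 distribution, the driver and the executors then load RDD.class from the same assembly jar:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical object name. The master URL comes from spark-submit
    // (--master spark://...) instead of being hard-coded, and the packaged
    // jar contains no Spark classes of its own.
    object SaveAsObjectFileRepro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("test")
        val sc   = new SparkContext(conf)
        val rdd  = sc.parallelize(List(1, 2, 3))
        rdd.saveAsObjectFile("/tmp/mysaoftmp")  // same output path as in the report
        sc.stop()
      }
    }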