I run Spark on Mesos by either running spark-submit in a Docker container via Marathon, or from one of the nodes in the Mesos cluster. I am on Mesos 0.21. I have tried both Spark 1.3.1 and 1.2.1, rebuilt against Hadoop 2.4 and above.
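A rough sketch of the plain spark-submit path described above (the master address, executor URI, class name, and jar path below are placeholders, not the actual values used):

```shell
# Sketch: submitting a job against a Mesos master from a cluster node.
# All addresses and paths here are placeholders.

# Spark's Mesos scheduler needs the native Mesos library.
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so

./bin/spark-submit \
  --master mesos://10.0.0.1:5050 \
  --conf spark.executor.uri=http://example.com/spark-1.3.1-bin-hadoop2.4.tgz \
  --class com.example.MyApp \
  my-app.jar
```

The spark.executor.uri setting tells each Mesos slave where to fetch the Spark distribution so it can launch executors; the Marathon/Docker variant wraps essentially the same command in a container definition.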
Some details on the configuration: I made sure that Spark uses IP addresses for all communication by defining spark.driver.host, SPARK_PUBLIC_DNS, SPARK_LOCAL_IP, and SPARK_LOCAL_HOST in the right places.

Hope this helps.

Yang.

On Fri, Apr 24, 2015 at 5:15 PM, Stephen Carman <scar...@coldlight.com> wrote:
> So I can't for the life of me get something even simple working for
> Spark on Mesos.
>
> I installed a 3-master, 3-slave Mesos cluster, which is all configured,
> but I can't for the life of me even get the spark-shell to work properly.
>
> I get errors like this:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 5
> in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage
> 0.0 (TID 23, 10.253.1.117): ExecutorLostFailure (executor
> 20150424-104711-1375862026-5050-20113-S1 lost)
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> I tried both Mesos 0.21 and 0.22, and they both produce the same error.
>
> My version of Spark is 1.3.1 with Hadoop 2.6; I just downloaded the
> pre-built package from the site. Or is that wrong, and I have to build it
> myself?
>
> I have MESOS_NATIVE_JAVA_LIBRARY, the Spark executor URI, and the Mesos
> master set in my spark-env.sh, and to the best of my abilities they seem
> correct.
>
> Does anyone have any insight into this at all? I'm running this on Red Hat 7
> with 8 CPU cores and 14 GB of RAM per slave, so 24 cores total and 42 GB of
> RAM total.
>
> Anyone have any idea at all what is going on here?
>
> Thanks,
> Steve
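The variables Yang and Steve mention can be sketched together in a single conf/spark-env.sh on each node (the IP address, library path, URI, and master address below are placeholders; spark.driver.host is a Spark property rather than an environment variable, so it is shown as a submit-time flag):

```shell
# conf/spark-env.sh -- sketch only; all values below are placeholders.

# Force IP-based addressing so executors and the driver do not rely on
# hostname resolution (a common source of ExecutorLostFailure on Mesos).
export SPARK_LOCAL_IP=10.253.1.117      # address this node binds to
export SPARK_PUBLIC_DNS=10.253.1.117    # address advertised to other nodes

# Mesos integration: native library, where slaves fetch Spark, and the master.
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://example.com/spark-1.3.1-bin-hadoop2.4.tgz
export MASTER=mesos://10.0.0.1:5050

# spark.driver.host is passed per application, e.g.:
#   spark-submit --conf spark.driver.host=10.253.1.117 ...
```

If the executors are lost immediately as in the trace above, the slave sandbox logs (stdout/stderr under the Mesos work directory) usually show whether the executor failed to fetch the URI, load libmesos, or connect back to the driver's address.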