I'm running Spark on Mesos in coarse-grained mode and hitting serious issues when running an application on Spark 1.6.1. The same app runs without issues on Spark 1.5.1 (we're trying to migrate to 1.6.1).
I'm running the mesos-external-shuffle service on all my slaves. My command args look like the following:

    /opt/spark-1.6.1/bin/spark-submit \
      --master "mesos://zk://prod-zookeeper-1:2181,prod-zookeeper-2:2181,prod-zookeeper-3:2181/mesos" \
      --conf spark.ui.port=31232 \
      --conf spark.mesos.coarse=true \
      --conf spark.mesos.constraints="rack:spark" \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.enabled=false \
      --conf spark.mesos.executor.memoryOverhead=4500 \
      --conf spark.shuffle.io.connectionTimeout=3600s \
      --class com.orchard.dataloader.library.originators.prosper.LoadTrade_Prosper \
      --total-executor-cores 48 \
      --driver-memory 14G \
      --executor-memory 15G \
      --jars config.jar target/scala-2.11/dataloader-library-a65139092664f386c317b3e5908bf009015477a2-assembled.jar

About 30 minutes after starting the job, during the final 3 stages, I start to see the following exceptions thrown:

    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Executor is not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075, execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)
    6/04/07 04:01:06 INFO DAGScheduler: Job 2 failed: first at Table.scala:49, took 966.989764 s
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 11 (first at Table.scala:49) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Executor is not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075, execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)

I've tried modifying a lot of the default settings, thinking the executors are being lost because of GC timeouts, missed heartbeats, etc.
The job above was run with the following defaults, which still made no difference:

    spark.executor.extraJavaOptions         -Duser.timezone=UTC
    spark.driver.extraJavaOptions           -Duser.timezone=UTC
    spark.akka.timeout                      300s
    spark.network.timeout                   300s
    spark.core.connection.ack.wait.timeout  300s
    spark.executor.heartbeatInterval        300s
    spark.files.fetchTimeout                120s
    spark.shuffle.service.port              31338
    spark.shuffle.compress                  true
    spark.shuffle.file.buffer               128k
    spark.shuffle.io.maxRetries             5
    spark.shuffle.io.numConnectionsPerPeer  3
    spark.shuffle.service.enabled           true
    spark.files.fetchTimeout                120s
    spark.akka.timeout                      250s
    spark.dynamicAllocation.enabled         true

Any other suggestions? I'm not sure what else to try at this point.

--
**Rodrick Brown** / Senior Systems Engineer
**Orchard Platform**