Hi, I am new to Spark and am trying it out on a standalone, 3-node (1 master, 2 workers) cluster.
From the Web UI at the master, I see that the workers are registered. But when I try running the SparkPi example from the master node, I get the following message and then an exception:

    14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
    14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

I searched a bit for the above warning and found that others have encountered this problem before, but did not see a clear resolution except for this link:
http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-tt8247.html#a8444

Based on the suggestion there I tried supplying the --executor-memory option to spark-submit, but that did not help. Any suggestions?

Here are the details of my setup:
- 3 nodes (each with 4 CPU cores and 7 GB memory)
- 1 node configured as master, and the other two configured as workers
- Firewall is disabled on all nodes, and network communication between the nodes is not a problem
- Edited conf/spark-env.sh on all nodes to set the following:
      SPARK_WORKER_CORES=3
      SPARK_WORKER_MEMORY=5G
- The Web UI as well as the logs on the master show that the workers were able to register correctly.
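For reference, here is a minimal sketch of what my conf/spark-env.sh edits look like (the values are from my setup above; the comments are just my reasoning for choosing them):

```shell
# conf/spark-env.sh -- identical on the master and both worker nodes.

# Offer 3 of the 4 cores per worker, leaving one for the OS and daemons.
SPARK_WORKER_CORES=3

# Offer 5G of the 7 GB per node, leaving headroom for the OS and the worker JVM.
SPARK_WORKER_MEMORY=5G
```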
Also, the Web UI correctly shows the aggregate available memory and CPU cores on the workers:

    URL: spark://vmsparkwin1:7077
    Workers: 2
    Cores: 6 Total, 0 Used
    Memory: 10.0 GB Total, 0.0 B Used
    Applications: 0 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

I tried running the SparkPi example first using run-example (which was failing) and later directly using spark-submit, as shown below:

    $ export MASTER=spark://vmsparkwin1:7077
    $ echo $MASTER
    spark://vmsparkwin1:7077

    azureuser@vmsparkwin1 /cygdrive/c/opt/spark-1.0.0
    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master spark://10.1.3.7:7077 --executor-memory 1G \
        --total-executor-cores 2 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10

The following is the full screen output:

    14/07/17 01:20:13 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/07/17 01:20:13 INFO SecurityManager: Changing view acls to: azureuser
    14/07/17 01:20:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(azureuser)
    14/07/17 01:20:14 INFO Slf4jLogger: Slf4jLogger started
    14/07/17 01:20:14 INFO Remoting: Starting remoting
    14/07/17 01:20:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
    14/07/17 01:20:14 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
    14/07/17 01:20:14 INFO SparkEnv: Registering MapOutputTracker
    14/07/17 01:20:14 INFO SparkEnv: Registering BlockManagerMaster
    14/07/17 01:20:14 INFO DiskBlockManager: Created local directory at C:\cygwin\tmp\spark-local-20140717012014-b606
    14/07/17 01:20:14 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
    14/07/17 01:20:14 INFO ConnectionManager: Bound socket to port 49842 with id = ConnectionManagerId(vmsparkwin1.cssparkwin.b1.internal.cloudapp.net,49842)
    14/07/17 01:20:14 INFO BlockManagerMaster: Trying to register BlockManager
    14/07/17 01:20:14 INFO BlockManagerInfo: Registering block manager vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49842 with 294.9 MB RAM
    14/07/17 01:20:14 INFO BlockManagerMaster: Registered BlockManager
    14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
    14/07/17 01:20:14 INFO HttpBroadcast: Broadcast server started at http://10.1.3.7:49843
    14/07/17 01:20:14 INFO HttpFileServer: HTTP File server directory is C:\cygwin\tmp\spark-6a076e92-53bb-4c7a-9e27-ce53a818146d
    14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
    14/07/17 01:20:15 INFO SparkUI: Started SparkUI at http://vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:4040
    14/07/17 01:20:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/07/17 01:20:16 INFO SparkContext: Added JAR file:/C:/opt/spark-1.0.0/./lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.1.3.7:49844/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1405560016316
    14/07/17 01:20:16 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
    14/07/17 01:20:16 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
    14/07/17 01:20:16 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 10 output partitions (allowLocal=false)
    14/07/17 01:20:16 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
    14/07/17 01:20:16 INFO DAGScheduler: Parents of final stage: List()
    14/07/17 01:20:16 INFO DAGScheduler: Missing parents: List()
    14/07/17 01:20:16 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
    14/07/17 01:20:16 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
    14/07/17 01:20:16 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
    14/07/17 01:20:31 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
    14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/07/17 01:20:56 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
    14/07/17 01:21:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/07/17 01:21:16 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    14/07/17 01:21:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
    14/07/17 01:21:16 INFO TaskSchedulerImpl: Cancelling stage 0
    14/07/17 01:21:16 INFO DAGScheduler: Failed to run reduce at SparkPi.scala:35
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
            at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
            at scala.Option.foreach(Option.scala:236)
            at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
            at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
            at akka.actor.ActorCell.invoke(ActorCell.scala:456)
            at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
            at akka.dispatch.Mailbox.run(Mailbox.scala:219)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
            at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-with-spark-submit-formatting-corrected-tp10102.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
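P.S. One detail I notice on re-reading my own output: the Web UI reports the master URL as spark://vmsparkwin1:7077, but my spark-submit used the IP form spark://10.1.3.7:7077. My understanding (which may well be wrong, so please correct me) is that the standalone master matches the submitted master URL string exactly, so I also plan to retry with the hostname form shown in the UI:

```shell
# Same spark-submit invocation as above, but with the master URL copied
# verbatim from the Web UI (hostname form) instead of the IP form.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://vmsparkwin1:7077 \
  --executor-memory 1G \
  --total-executor-cores 2 \
  ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10
```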