See this answer by Josh: http://stackoverflow.com/questions/26692658/cant-connect-from-application-to-the-standalone-cluster
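The usual culprit described in that answer is a master-URL mismatch: the shell has to connect with exactly the spark:// URL the master advertises. A minimal sketch of how to verify that, assuming default standalone-mode log locations (the xpan-biqa1 host and port 7077 are taken from your log):

```shell
# Confirm the exact URL the master advertises; spark-shell must use this
# exact host string (assuming default log locations under $SPARK_HOME/logs):
grep "Starting Spark master at" $SPARK_HOME/logs/spark-*Master*.out

# Pinning the master's bind host in conf/spark-env.sh keeps the advertised
# URL stable across restarts (SPARK_MASTER_IP is the Spark 1.x variable name):
#   export SPARK_MASTER_IP=xpan-biqa1

# Then launch the shell against that exact URL:
$SPARK_HOME/bin/spark-shell --master spark://xpan-biqa1:7077
```

If the hostname in the master's log differs from the one spark-shell uses (e.g. an IP vs. a hostname), the standalone master in 1.x will drop the connection, which matches the intermittent "Connecting to master ..." retries in your log.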
You may also find this post useful: http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3c7a889b1c-aa14-4cf2-8375-37f9cf827...@gmail.com%3E

Thanks
Best Regards

On Wed, Feb 11, 2015 at 10:11 AM, lakewood <pxy0...@gmail.com> wrote:
> Hi,
>
> I'm new to Spark. I have built a small Spark-on-YARN cluster that contains
> 1 master (20GB RAM, 8 cores) and 3 workers (4GB RAM, 4 cores each). When I
> run the command sc.parallelize(1 to 1000).count() through
> $SPARK_HOME/bin/spark-shell, sometimes the job is submitted successfully,
> and sometimes it fails with the exception below.
>
> I can confirm from the Spark web UI that all three workers are registered
> with the master. The memory-related parameters configured in the
> spark-env.sh file are: SPARK_EXECUTOR_MEMORY=2G, SPARK_DRIVER_MEMORY=1G,
> SPARK_WORKER_MEMORY=4G.
>
> Would anyone give me a hint on how to resolve this issue? I have not found
> anything helpful via a Google search.
>
> # bin/spark-shell
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
> 15/02/11 12:21:39 INFO SecurityManager: Changing view acls to: root,
> 15/02/11 12:21:39 INFO SecurityManager: Changing modify acls to: root,
> 15/02/11 12:21:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
> 15/02/11 12:21:39 INFO HttpServer: Starting HTTP Server
> 15/02/11 12:21:39 INFO Utils: Successfully started service 'HTTP class server' on port 28968.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
>       /_/
>
> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.6.0_24)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/02/11 12:21:43 INFO SecurityManager: Changing view acls to: root,
> 15/02/11 12:21:43 INFO SecurityManager: Changing modify acls to: root,
> 15/02/11 12:21:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
> 15/02/11 12:21:44 INFO Slf4jLogger: Slf4jLogger started
> 15/02/11 12:21:44 INFO Remoting: Starting remoting
> 15/02/11 12:21:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@xpan-biqa1:6862]
> 15/02/11 12:21:44 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@xpan-biqa1:6862]
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'sparkDriver' on port 6862.
> 15/02/11 12:21:44 INFO SparkEnv: Registering MapOutputTracker
> 15/02/11 12:21:44 INFO SparkEnv: Registering BlockManagerMaster
> 15/02/11 12:21:44 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150211122144-ed26
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'Connection manager for block manager' on port 40502.
> 15/02/11 12:21:44 INFO ConnectionManager: Bound socket to port 40502 with id = ConnectionManagerId(xpan-biqa1,40502)
> 15/02/11 12:21:44 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
> 15/02/11 12:21:44 INFO BlockManagerMaster: Trying to register BlockManager
> 15/02/11 12:21:44 INFO BlockManagerMasterActor: Registering block manager xpan-biqa1:40502 with 265.0 MB RAM
> 15/02/11 12:21:44 INFO BlockManagerMaster: Registered BlockManager
> 15/02/11 12:21:44 INFO HttpFileServer: HTTP File server directory is /tmp/spark-0a80ce6b-6a05-4163-a97d-07753f627ec8
> 15/02/11 12:21:44 INFO HttpServer: Starting HTTP Server
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'HTTP file server' on port 25939.
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 15/02/11 12:21:44 INFO SparkUI: Started SparkUI at http://xpan-biqa1:4040
> 15/02/11 12:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/02/11 12:21:46 INFO EventLoggingListener: Logging events to hdfs://xpan-biqa1:7020/spark/spark-shell-1423628505431
> 15/02/11 12:21:46 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
> 15/02/11 12:21:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 15/02/11 12:21:46 INFO SparkILoop: Created spark context..
> Spark context available as sc.
>
> scala> 15/02/11 12:22:06 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
>
> scala> sc.parallelize(1 to 1000).count()
> 15/02/11 12:22:24 INFO SparkContext: Starting job: count at <console>:13
> 15/02/11 12:22:24 INFO DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
> 15/02/11 12:22:24 INFO DAGScheduler: Final stage: Stage 0(count at <console>:13)
> 15/02/11 12:22:24 INFO DAGScheduler: Parents of final stage: List()
> 15/02/11 12:22:24 INFO DAGScheduler: Missing parents: List()
> 15/02/11 12:22:24 INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
> 15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(1088) called with curMem=0, maxMem=277842493
> 15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1088.0 B, free 265.0 MB)
> 15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(800) called with curMem=1088, maxMem=277842493
> 15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 800.0 B, free 265.0 MB)
> 15/02/11 12:22:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xpan-biqa1:40502 (size: 800.0 B, free: 265.0 MB)
> 15/02/11 12:22:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
> 15/02/11 12:22:24 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
> 15/02/11 12:22:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 15/02/11 12:22:26 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
> 15/02/11 12:22:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 15/02/11 12:22:46 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
> 15/02/11 12:22:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 15/02/11 12:22:46 INFO TaskSchedulerImpl: Cancelling stage 0
> 15/02/11 12:22:46 INFO DAGScheduler: Failed to run count at <console>:13
> 15/02/11 12:22:46 INFO SparkUI: Stopped Spark web UI at http://xpan-biqa1:4040
> 15/02/11 12:22:46 INFO DAGScheduler: Stopping DAGScheduler
> 15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Shutting down all executors
> 15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
> org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> scala> 15/02/11 12:22:47 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>
> Regards,
> Ryan
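Regarding the "Initial job has not accepted any resources" warning in the log above: besides the master-URL issue, that warning also appears when no worker can satisfy the requested executor memory. A quick sanity check, assuming default standalone-mode log locations (file paths and the 1g value below are illustrative, not from the original message):

```shell
# Check that each worker registered with the master and how much memory it
# offered (assuming default log locations under $SPARK_HOME/logs):
grep "Registering worker" $SPARK_HOME/logs/spark-*Master*.out

# If SPARK_EXECUTOR_MEMORY (2G in your spark-env.sh) exceeds what a worker
# currently has free, the app gets no executors. Lowering it is one thing
# to try, e.g. in conf/spark-env.sh:
#   export SPARK_EXECUTOR_MEMORY=1g
```

The workers showing as registered in the web UI does not guarantee they have enough free memory at submission time, which would explain why the same command sometimes succeeds and sometimes fails.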