Hi all,

I'm testing a very simple piece of code, and it runs fine in spark-shell (both in local mode and against the standalone cluster). I also packaged the same code as a job: with the master set to local it again runs without a problem, but when I try to run it on the cluster (standalone mode) it seems to start up and then quits with the following error messages:
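The job itself is essentially the skeleton below (a simplified sketch, not the exact code: the input path and the transformation are placeholders, and /opt/spark stands in for my Spark home; the master URL and jar are the ones that appear in the log):

import spark.SparkContext
import spark.SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    // With "local" as the master this works; the spark:// URL below is what fails.
    val sc = new SparkContext(
      "spark://hadoop-name001:7077",
      "Simple Project",
      "/opt/spark",  // placeholder for the Spark home on my machines
      Seq("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

    // Placeholder logic -- the real job is just as trivial.
    val lines = sc.textFile("/opt/spark-jobs/data.txt")
    println("Line count: " + lines.count())
  }
}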
[info] Set current project to Simple Project (in build file:/opt/spark-jobs/)
[info] Running SimpleJob
13/09/02 12:15:42 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/09/02 12:15:42 INFO spark.SparkEnv: Registering BlockManagerMaster
13/09/02 12:15:42 INFO storage.MemoryStore: MemoryStore started with capacity 971.5 MB.
13/09/02 12:15:42 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20130902121542-c534
13/09/02 12:15:42 INFO network.ConnectionManager: Bound socket to port 33575 with id = ConnectionManagerId(hadoop-jobtracker001,33575)
13/09/02 12:15:43 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/09/02 12:15:43 INFO storage.BlockManagerMaster: Registered BlockManager
13/09/02 12:15:43 INFO server.Server: jetty-7.6.8.v20121106
13/09/02 12:15:43 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:48410
13/09/02 12:15:43 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.10.10.21:48410
13/09/02 12:15:43 INFO spark.SparkEnv: Registering MapOutputTracker
13/09/02 12:15:43 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-317014cd-55bc-4ed6-a014-b990e4eb50ba
13/09/02 12:15:43 INFO server.Server: jetty-7.6.8.v20121106
13/09/02 12:15:43 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50734
13/09/02 12:15:43 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
13/09/02 12:15:43 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:40274
13/09/02 12:15:43 INFO storage.BlockManagerUI: Started BlockManager web UI at http://hadoop-jobtracker001:40274
13/09/02 12:15:43 INFO spark.SparkContext: Added JAR target/scala-2.9.3/simple-project_2.9.3-1.0.jar at http://10.10.10.21:50734/jars/simple-project_2.9.3-1.0.jar with timestamp 1378124143900
13/09/02 12:15:44 INFO client.Client$ClientActor: Connecting to master spark://hadoop-name001:7077
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20130902121544-0013
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/0 on worker-20130902105957-hadoop-task-data010-53221 (hadoop-task-data010) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/0 on host hadoop-task-data010 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/1 on worker-20130902105957-hadoop-task-data008-39605 (hadoop-task-data008) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/1 on host hadoop-task-data008 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/2 on worker-20130902105957-hadoop-task-data003-36917 (hadoop-task-data003) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/2 on host hadoop-task-data003 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/3 on worker-20130902105957-hadoop-task-data009-51272 (hadoop-task-data009) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/3 on host hadoop-task-data009 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/4 on worker-20130902105957-hadoop-task-data002-37491 (hadoop-task-data002) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/4 on host hadoop-task-data002 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/5 on worker-20130902105957-hadoop-task-data011-47945 (hadoop-task-data011) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/5 on host hadoop-task-data011 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/6 on worker-20130902105957-hadoop-task-data004-41650 (hadoop-task-data004) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/6 on host hadoop-task-data004 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/7 on worker-20130902105957-hadoop-task-data007-40802 (hadoop-task-data007) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/7 on host hadoop-task-data007 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/8 on worker-20130902105957-hadoop-task-data006-55127 (hadoop-task-data006) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/8 on host hadoop-task-data006 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/9 on worker-20130902105957-hadoop-task-data005-38800 (hadoop-task-data005) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/9 on host hadoop-task-data005 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 INFO client.Client$ClientActor: Executor added: app-20130902121544-0013/10 on worker-20130902105957-hadoop-task-data001-56057 (hadoop-task-data001) with 4 cores
13/09/02 12:15:44 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130902121544-0013/10 on host hadoop-task-data001 with 4 cores, 512.0 MB RAM
13/09/02 12:15:44 ERROR client.Client$ClientActor: Connection to master failed; stopping client
13/09/02 12:15:44 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/09/02 12:15:44 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
13/09/02 12:15:44 ERROR client.Client$ClientActor: Connection to master failed; stopping client
13/09/02 12:15:44 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/09/02 12:15:44 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
13/09/02 12:15:44 ERROR actor.ActorSystemImpl: Uncaught error from thread [spark-akka.actor.default-dispatcher-3]
13/09/02 12:15:44 ERROR actor.ActorSystemImpl: Uncaught error from thread [spark-akka.actor.default-dispatcher-4]
13/09/02 12:15:44 ERROR client.Client$ClientActor: Master removed our application: FINISHED; stopping client
13/09/02 12:15:44 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/09/02 12:15:44 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
13/09/02 12:15:44 ERROR client.Client$ClientActor: Connection to master failed; stopping client
13/09/02 12:15:44 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/09/02 12:15:44 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
13/09/02 12:15:44 ERROR actor.ActorSystemImpl: Uncaught error from thread [spark-akka.actor.default-dispatcher-1]
13/09/02 12:15:44 ERROR actor.ActorSystemImpl: Uncaught error from thread [spark-akka.actor.default-dispatcher-1]
13/09/02 12:15:45 INFO storage.MemoryStore: ensureFreeSpace(90796) called with curMem=0, maxMem=1018712555
13/09/02 12:15:45 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 88.7 KB, free 971.4 MB)
13/09/02 12:15:45 INFO network.ConnectionManager: Selector thread was interrupted!

I'm running it with the sbt run command. The work directories on the worker nodes are empty (the stderr and stdout files are empty as well), and the Spark master log has no other entries. Has anybody had a similar problem before?

Regards,
Marek
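P.S. For completeness, the build definition is essentially the stock one from the quick start guide, sketched from memory (the spark-core version below may be slightly off):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

// spark-core matching the Scala 2.9.3 build installed on the cluster
libraryDependencies += "org.spark-project" %% "spark-core" % "0.7.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"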
