Hi All, I am trying to run a simple word count using YARN as the cluster manager. I am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn spark-shell like below, it gets stuck in the ACCEPTED state forever.

    ./bin/spark-shell --master yarn --deploy-mode client

I set my log4j.properties in SPARK_HOME/conf to TRACE. This is what the client keeps printing:

    queue: "default" name: "Spark shell" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/" diagnostics: "" startTime: 1531056632496 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 0 num_reserved_containers: 0 used_resources { memory: 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
    18/07/08 06:32:22 INFO Client: Application report for application_1531056583425_0001 (state: ACCEPTED)
    18/07/08 06:32:22 DEBUG Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1531056632496
         final status: UNDEFINED
         tracking URL: http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
         user: xxx
    18/07/08 06:32:20 DEBUG Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1531056632496
         final status: UNDEFINED
         tracking URL: http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
         user: kantkodali
    18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Call -> /0.0.0.0:8032: getApplicationReport {application_id { id: 1 cluster_timestamp: 1531056583425 }}
    18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /0.0.0.0:8032 from kantkodali sending #136
    18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /0.0.0.0:8032 from kantkodali got value #136
    18/07/08 06:32:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
    18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Response <- /0.0.0.0:8032: getApplicationReport {application_report { applicationId { id: 1 cluster_timestamp: 1531056583425 } user: "xxx" queue: "default" name: "Spark shell" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/" diagnostics: "" startTime: 1531056632496 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 0 num_reserved_containers: 0 used_resources { memory: 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
    18/07/08 06:32:21 INFO Client: Application report for application_1531056583425_0001 (state: ACCEPTED)

I have read this link <https://stackoverflow.com/questions/32658840/spark-shell-stuck-in-yarn-accepted-state>, and here are the conf files that differ from the default settings.

*yarn-site.xml*

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>16384</value>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
      </property>
    </configuration>

*core-site.xml*

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

*hdfs-site.xml*

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

Every other config is untouched, so everything else has its default setting.

Finally, I have also looked for clues in the ResourceManager logs, but they don't seem to help in fixing the issue. I am a newbie to YARN, though, so please let me know if I missed something.

    2018-07-08 06:54:57,345 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 1
    2018-07-08 06:55:09,413 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
    2018-07-08 06:55:09,414 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 1 submitted by user xxx
    2018-07-08 06:55:09,415 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1531058076308_0001
    2018-07-08 06:55:09,416 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kantkodali IP=10.0.0.58 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1531058076308_0001
    2018-07-08 06:55:09,422 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1531058076308_0001 State change from NEW to NEW_SAVING on event=START
    2018-07-08 06:55:09,422 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1531058076308_0001
    2018-07-08 06:55:09,423 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1531058076308_0001 State change from NEW_SAVING to SUBMITTED on event=APP_NEW_SAVED
    2018-07-08 06:55:09,425 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1531058076308_0001 user: kantkodali leaf-queue of parent: root #applications: 1
    2018-07-08 06:55:09,425 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1531058076308_0001 from user: kantkodali, in queue: default
    2018-07-08 06:55:09,439 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1531058076308_0001 State change from SUBMITTED to ACCEPTED on event=APP_ACCEPTED
    2018-07-08 06:55:09,470 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1531058076308_0001_000001
    2018-07-08 06:55:09,471 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1531058076308_0001_000001 State change from NEW to SUBMITTED
    2018-07-08 06:55:09,481 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue, it is likely set too low. skipping enforcement to allow at least one application to start
    2018-07-08 06:55:09,481 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue for user, it is likely set too low. skipping enforcement to allow at least one application to start
    2018-07-08 06:55:09,481 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application application_1531058076308_0001 from user: xxx activated in queue: default
    2018-07-08 06:55:09,482 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1531058076308_0001 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@fdd759d, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
    2018-07-08 06:55:09,482 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Added Application Attempt appattempt_1531058076308_0001_000001 to scheduler from user kantkodali in queue default
    2018-07-08 06:55:09,484 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1531058076308_0001_000001 State change from SUBMITTED to SCHEDULED

Any help would be great! Thanks!
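
As a sanity check on the yarn-site.xml values above, here is a rough sketch of the container sizes Spark would likely request in client mode. It rests on several assumptions that are not in the post: Spark's documented defaults (spark.yarn.am.memory=512m, spark.executor.memory=1g), a memory overhead of max(384 MB, 10% of the heap), and a scheduler that rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb. The helper name container_mb is made up for illustration.

```python
import math

# Values taken from the yarn-site.xml in this post.
YARN_MIN_ALLOC_MB = 256    # yarn.scheduler.minimum-allocation-mb
YARN_MAX_ALLOC_MB = 8192   # yarn.scheduler.maximum-allocation-mb
NODE_MEMORY_MB = 16384     # yarn.nodemanager.resource.memory-mb


def container_mb(heap_mb, min_alloc_mb, overhead_factor=0.10, overhead_floor_mb=384):
    """Estimate the container size for a Spark process on YARN (hypothetical helper).

    Assumes overhead = max(floor, factor * heap) and that the scheduler rounds
    the request up to the next multiple of the minimum allocation.
    """
    overhead_mb = max(int(heap_mb * overhead_factor), overhead_floor_mb)
    requested_mb = heap_mb + overhead_mb
    return math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb


# Assumed Spark defaults; neither is set explicitly in the post.
am_mb = container_mb(512, YARN_MIN_ALLOC_MB)         # client-mode ApplicationMaster
executor_mb = container_mb(1024, YARN_MIN_ALLOC_MB)  # one executor

# A request can only be granted if it fits both limits.
fits = all(mb <= YARN_MAX_ALLOC_MB and mb <= NODE_MEMORY_MB
           for mb in (am_mb, executor_mb))

print(f"AM container: {am_mb} MB, executor container: {executor_mb} MB, fits: {fits}")
```

Under these assumptions the ApplicationMaster works out to 1024 MB and an executor to 1536 MB, both well below the 8192 MB maximum allocation and the 16384 MB configured for the NodeManager. If that holds, the container sizes alone would not explain the application staying in ACCEPTED, provided a NodeManager with that memory is actually registered with the ResourceManager.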