[ https://issues.apache.org/jira/browse/YARN-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wangda Tan resolved YARN-7065.
------------------------------
    Resolution: Duplicate

This is duplicated by YARN-7163, closing as dup.

> [RM UI] App status not getting updated in "All application" page
> ----------------------------------------------------------------
>
>                 Key: YARN-7065
>                 URL: https://issues.apache.org/jira/browse/YARN-7065
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yesha Vora
>         Attachments: Screen Shot 2017-09-08 at 4.14.51 PM.png, Screen Shot 2017-09-08 at 4.15.07 PM.png
>
>
> Scenario:
> 1) Run Spark Long Running application
> 2) Do RM and NN failover randomly
> 3) Validate App state in Yarn
> The Spark applications are finished. Yarn-cli returns correct status of yarn application.
> {code}
> [hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
> 17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1 xxx.xx.xx.x:10200
> 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
> 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
> Application Report :
> Application-Id : application_1503203977699_0014
> Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
> Application-Type : SPARK
> User : hrt_qa
> Queue : default
> Application Priority : null
> Start-Time : 1503215983532
> Finish-Time : 1503250203806
> Progress : 0%
> State : FAILED
> Final-State : FAILED
> Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014
> RPC Port : -1
> AM Host : N/A
> Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds
> Log Aggregation Status : SUCCEEDED
> Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_000020 exited with exitCode: 1
> For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014 Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e04_1503203977699_0014_20_000001
> Exit code: 1
> Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Shell output: main : command provided 1
> main : run as user is hrt_qa
> main : requested yarn user is hrt_qa
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_000001/container_e04_1503203977699_0014_20_000001.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> Unmanaged Application : false
> Application Node Label Expression : <Not set>
> AM container Node Label Expression : <DEFAULT_PARTITION>
> {code}
> However, RM UI "All application" page still shows the application in "RUNNING" State.
> https://host1:8090/cluster
> On clicking application_id (https://host1:8090/cluster/app/application_1503203977699_0014), it redirects to application page and there it shows correct application state = Failed.
> The App status is not getting updated on Yarn All Application page.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
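
The comparison described in the report (state from yarn-cli versus state shown on the "All application" page) could be checked mechanically along these lines. This is only a rough sketch and not part of YARN: the helper names `cli_states` and `is_stale` are hypothetical, the sample text is a shortened copy of the report quoted above, and the state shown in the UI is assumed to be copied in by hand.

```python
import re

# Matches "State : X" and "Final-State : X" lines from the quoted
# `yarn application -status` output, tolerating leading mail quote markers.
FIELD = re.compile(
    r"^[> \t]*(State|Final-State)[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def cli_states(report):
    """Return the State and Final-State fields found in a status report."""
    return {key: value for key, value in FIELD.findall(report)}


def is_stale(listed_state, report):
    """True if the state shown in the application list differs from the CLI state."""
    cli_state = cli_states(report).get("State")
    return cli_state is not None and cli_state != listed_state


# Shortened copy of the report quoted in this issue.
SAMPLE = """\
Application Report :
Application-Id : application_1503203977699_0014
State : FAILED
Final-State : FAILED
"""

if __name__ == "__main__":
    print(cli_states(SAMPLE))
    # "RUNNING" is the state the reporter saw on the "All application" page.
    print(is_stale("RUNNING", SAMPLE))
```

With the values from this issue, the list page state "RUNNING" does not match the CLI state "FAILED", so the check flags the entry as stale.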