Yesha Vora created YARN-8580:
--------------------------------

             Summary: yarn.resourcemanager.am.max-attempts is not respected for yarn services
                 Key: YARN-8580
                 URL: https://issues.apache.org/jira/browse/YARN-8580
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.1.1
            Reporter: Yesha Vora

1) Set the maximum number of AM attempts to 100 on all nodes (including the gateway).
{code}
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>100</value>
</property>
{code}
2) Start a YARN service application (HBase tarball).
3) Kill the AM 20 times.

Here, the app fails after the 20th AM failure with the diagnostics below, which report a global limit of 100 but a local limit of 20.
{code}
bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status application_1532481557746_0001
18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml
Application Report :
	Application-Id : application_1532481557746_0001
	Application-Name : hbase-tarball-lr
	Application-Type : yarn-service
	User : hbase
	Queue : default
	Application Priority : 0
	Start-Time : 1532481864863
	Finish-Time : 1532522943103
	Progress : 100%
	State : FAILED
	Final-State : FAILED
	Tracking-URL : https://xxx:8090/cluster/app/application_1532481557746_0001
	RPC Port : -1
	AM Host : N/A
	Aggregate Resource Allocation : 252150112 MB-seconds, 164141 vcore-seconds
	Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
	Log Aggregation Status : SUCCEEDED
	Diagnostics : Application application_1532481557746_0001 failed 20 times (global limit =100; local limit is =20) due to AM Container for appattempt_1532481557746_0001_000020 exited with exitCode: 137
Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed on request. Exit code is 137
[2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137.
[2018-07-25 12:49:03.045]Killed by external signal
For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on links to logs of each attempt.
. Failing the application.
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>
	TimeoutType : LIFETIME	ExpiryTime : 2018-07-25T22:26:15.419+0000	RemainingTime : 0seconds
{code}
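
The diagnostics above report both a global limit (100) and a local limit (20), and the application fails at the lower of the two. A minimal sketch of that relationship follows. It assumes the per-application value is used whenever it is positive and does not exceed the global value. The class and method names are hypothetical and are not taken from the YARN code base.

```java
// Hypothetical illustration only; not the actual ResourceManager code.
public class EffectiveAmAttempts {

    // Assumption: a per-application (local) limit that is positive and not
    // larger than the global limit takes precedence; otherwise the global
    // limit applies.
    static int effectiveMaxAttempts(int globalLimit, int localLimit) {
        if (localLimit <= 0 || localLimit > globalLimit) {
            return globalLimit;
        }
        return localLimit;
    }

    public static void main(String[] args) {
        int globalLimit = 100; // yarn.resourcemanager.am.max-attempts in this report
        int localLimit = 20;   // "local limit" shown in the diagnostics
        System.out.println("effective max attempts = "
                + effectiveMaxAttempts(globalLimit, localLimit));
    }
}
```

Under this assumption, raising only the global setting would not change the point at which the service fails, which matches the behavior reported here.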