[ https://issues.apache.org/jira/browse/YARN-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556321#comment-16556321 ]
Giovanni Matteo Fumarola commented on YARN-8580:
------------------------------------------------

Hi [~yeshavora],

YARN applies the minimum of the global and the local limit. Your global limit (yarn.resourcemanager.am.max-attempts) is set to 100, while the AM limit is set to 20, so the application fails after 20 attempts.

Closing this Jira as invalid.

Diagnostics : Application application_1532481557746_0001 failed 20 times (global limit =100; local limit is =20)

> yarn.resourcemanager.am.max-attempts is not respected for yarn services
> -----------------------------------------------------------------------
>
> Key: YARN-8580
> URL: https://issues.apache.org/jira/browse/YARN-8580
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Affects Versions: 3.1.1
> Reporter: Yesha Vora
> Priority: Major
>
> 1) Max AM attempts is set to 100 on all nodes (including gateway).
> {code}
> <property>
> <name>yarn.resourcemanager.am.max-attempts</name>
> <value>100</value>
> </property>{code}
> 2) Start a YARN service (HBase tarball) application.
> 3) Kill the AM 20 times.
> Here, the app fails with the diagnostics below.
> {code}
> bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status application_1532481557746_0001
> 18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml
> Application Report :
> Application-Id : application_1532481557746_0001
> Application-Name : hbase-tarball-lr
> Application-Type : yarn-service
> User : hbase
> Queue : default
> Application Priority : 0
> Start-Time : 1532481864863
> Finish-Time : 1532522943103
> Progress : 100%
> State : FAILED
> Final-State : FAILED
> Tracking-URL : https://xxx:8090/cluster/app/application_1532481557746_0001
> RPC Port : -1
> AM Host : N/A
> Aggregate Resource Allocation : 252150112 MB-seconds, 164141 vcore-seconds
> Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
> Log Aggregation Status : SUCCEEDED
> Diagnostics : Application application_1532481557746_0001 failed 20 times (global limit =100; local limit is =20) due to AM Container for appattempt_1532481557746_0001_000020 exited with exitCode: 137
> Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed on request. Exit code is 137
> [2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137.
> [2018-07-25 12:49:03.045]Killed by external signal
> For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on links to logs of each attempt.
> . Failing the application.
> Unmanaged Application : false
> Application Node Label Expression : <Not set>
> AM container Node Label Expression : <DEFAULT_PARTITION>
> TimeoutType : LIFETIME ExpiryTime : 2018-07-25T22:26:15.419+0000 RemainingTime : 0seconds
> {code}
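
For readers following the resolution, the rule stated in the comment (the smaller of the global and the local limit applies) can be sketched as below. This is a minimal illustration, not the ResourceManager's code: the function name `effective_max_attempts` is hypothetical, and the fallback to the global limit when the local value is unset or non-positive is an assumption that this thread does not confirm.

```python
def effective_max_attempts(global_limit: int, local_limit: int) -> int:
    """Return the AM attempt limit that applies to a single application.

    global_limit: cluster-wide yarn.resourcemanager.am.max-attempts.
    local_limit:  per-application limit (20 in the report above).

    Assumption (not stated in the thread): a local limit that is unset
    (<= 0) or larger than the global limit falls back to the global limit.
    Otherwise the local limit is the smaller value and is used.
    """
    if local_limit <= 0 or local_limit > global_limit:
        return global_limit
    return local_limit


if __name__ == "__main__":
    # Values taken from the diagnostics line in this issue.
    print(effective_max_attempts(global_limit=100, local_limit=20))  # prints 20
```

With the values from the diagnostics (global 100, local 20) the result is 20, which matches the attempt count at which the application was failed. Raising the global setting alone does not change that number while the per-application limit stays lower.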