[ https://issues.apache.org/jira/browse/YARN-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph resolved YARN-8773.
---------------------------------
    Resolution: Invalid

> Blacklisting support for scheduling AMs
> ----------------------------------------
>
>                 Key: YARN-8773
>                 URL: https://issues.apache.org/jira/browse/YARN-8773
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Wangda Tan
>            Priority: Major
>
> MapReduce jobs failed with both AM attempts failing on same node - the node
> had some issue. Both AM attempts are placed on same node as there is no
> blacklisting feature. Customer is expecting a fix for YARN-2005 + YARN-4389.
> Is it possible to backport it to HDP-2.2.9 and do we have any better
> workaround to avoid this issue.
> {code}
> "2018-08-18 11:32:57,855 WARN
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=edwaefrp
> OPERATION=Application Finished - Failed TARGET=RMAppManager
> RESULT=FAILURE DESCRIPTION=App failed with state: FAILED
> PERMISSIONS=Application application_1529242338015_465184 failed 2 times due
> to AM Container for appattempt_1529242338015_465184_000002 exited with
> exitCode: -1000
> For more detailed output, check application tracking
> page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then,
> click on links to logs of each attempt.
> Diagnostics: Error while running command to get file permissions :
> ExitCodeException exitCode=139:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>     at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
>     at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
>     at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
> Failing this attempt. Failing the application.
> APPID=application_1529242338015_465184","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
> "2018-08-18 11:32:57,855 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application
> application_1529242338015_465184 failed 2 times due to AM Container for
> appattempt_1529242338015_465184_000002 exited with exitCode: -1000
> For more detailed output, check application tracking
> page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then,
> click on links to logs of each attempt.
> Diagnostics: Error while running command to get file permissions :
> ExitCodeException exitCode=139:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>     at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
>     at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
>     at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
> Failing this attempt. Failing the
> application.","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
> {code}
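
The report describes both AM attempts landing on the same faulty node because nothing remembers where the first attempt failed. The idea behind AM blacklisting can be illustrated with a minimal sketch. This is not the code from YARN-2005 or YARN-4389: the class name, method names, node names, and the 0.8 cutoff are all hypothetical, chosen only to show the behaviour the reporter is asking for.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AmBlacklist:
    """Hypothetical per-application tracker of nodes where an AM attempt failed."""

    cluster_nodes: Set[str]
    # Assumed safety cutoff: once this fraction of the cluster is blacklisted,
    # stop excluding nodes so the application can still be scheduled somewhere.
    disable_threshold: float = 0.8
    failed_nodes: Set[str] = field(default_factory=set)

    def record_am_failure(self, node: str) -> None:
        """Remember the node on which an AM attempt failed."""
        self.failed_nodes.add(node)

    def excluded_nodes(self) -> Set[str]:
        """Return the nodes to avoid for the next AM attempt."""
        if not self.cluster_nodes:
            return set()
        known_failures = self.failed_nodes & self.cluster_nodes
        ratio = len(known_failures) / len(self.cluster_nodes)
        if ratio >= self.disable_threshold:
            return set()  # too much of the cluster is excluded; disable blacklisting
        return set(known_failures)

    def pick_node(self, candidates: List[str]) -> Optional[str]:
        """Pick the first candidate node that is not currently excluded."""
        excluded = self.excluded_nodes()
        for node in candidates:
            if node not in excluded:
                return node
        return None


# Usage: the first AM attempt failed on "node1", so the retry goes elsewhere.
blacklist = AmBlacklist(cluster_nodes={"node1", "node2", "node3"})
blacklist.record_am_failure("node1")
retry_node = blacklist.pick_node(["node1", "node2", "node3"])
print(retry_node)
```

Without such a tracker, the retry is free to pick "node1" again, which matches the failure pattern in the log above where the application failed 2 times.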