[ https://issues.apache.org/jira/browse/YARN-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph resolved YARN-8773.
---------------------------------
    Resolution: Invalid

> Blacklisting support for scheduling AMs 
> ----------------------------------------
>
>                 Key: YARN-8773
>                 URL: https://issues.apache.org/jira/browse/YARN-8773
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Wangda Tan
>            Priority: Major
>
> MapReduce jobs failed because both AM attempts ran on the same node, and that
> node had a problem. Both attempts were placed on the same node because there
> is no blacklisting feature. The customer is expecting a fix for YARN-2005 +
> YARN-4389. Is it possible to backport these to HDP-2.2.9, and is there a
> better workaround to avoid this issue?
> {code}
> "2018-08-18 11:32:57,855 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=edwaefrp    OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with  exitCode: -1000
> For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
> Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>         at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
>         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
>         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
> Failing this attempt. Failing the application.  APPID=application_1529242338015_465184","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
> "2018-08-18 11:32:57,855 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with  exitCode: -1000
> For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
> Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>         at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
>         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
>         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
> Failing this attempt. Failing the application.","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
> {code}
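
The report asks for AM blacklisting (YARN-2005) so that a failed AM attempt is not retried on the same bad node. The idea can be sketched as follows. This is a hypothetical illustration, not YARN's implementation: the class AmNodeBlacklist, its methods, the node names, and the 0.8 disable threshold are all assumptions. The threshold models the general concern that blacklisting too much of a small cluster would leave nowhere to run the AM; the YARN-2005 patch itself should be checked for how it actually handles this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not YARN code): remember nodes on which an AM attempt
 * failed and avoid them when choosing where to place the next attempt.
 */
public class AmNodeBlacklist {
    private final Set<String> failedNodes = new HashSet<>();
    // Assumed knob: if at least this fraction of the cluster is blacklisted,
    // the blacklist is ignored so the application is not starved.
    private final double disableThreshold;

    public AmNodeBlacklist(double disableThreshold) {
        this.disableThreshold = disableThreshold;
    }

    /** Record that an AM attempt failed on the given node. */
    public void recordAmFailure(String node) {
        failedNodes.add(node);
    }

    /** Returns the nodes the next AM attempt may be placed on. */
    public List<String> eligibleNodes(List<String> clusterNodes) {
        if (clusterNodes.isEmpty()) {
            return new ArrayList<>();
        }
        long blacklisted = clusterNodes.stream().filter(failedNodes::contains).count();
        double ratio = (double) blacklisted / clusterNodes.size();
        if (ratio >= disableThreshold) {
            // Too much of the cluster is blacklisted: fall back to every node.
            return new ArrayList<>(clusterNodes);
        }
        List<String> eligible = new ArrayList<>();
        for (String node : clusterNodes) {
            if (!failedNodes.contains(node)) {
                eligible.add(node);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        AmNodeBlacklist blacklist = new AmNodeBlacklist(0.8);
        List<String> cluster = List.of("node-a", "node-b", "node-c");
        // The first AM attempt failed on node-a (placeholder host name).
        blacklist.recordAmFailure("node-a");
        System.out.println(blacklist.eligibleNodes(cluster)); // [node-b, node-c]
    }
}
```

With one bad node out of three, the second attempt is steered away from it; if every node were blacklisted, the sketch would fall back to the full node list instead of leaving the AM unschedulable.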


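The two exit codes in the log can be read as sketched below. -1000 has the same value as YARN's ContainerExitStatus.INVALID, meaning no real process exit status was recorded; here it accompanies a failure during resource localization, before the AM container started. 139 follows the POSIX shell convention of 128 plus the signal number, so the command Hadoop spawned to read local-directory permissions was most likely killed by signal 11 (SIGSEGV), which suggests a problem on that node rather than in the job. The class and method names are hypothetical, and the -1000 constant is hard-coded so the sketch compiles without Hadoop on the classpath.

```java
/** Hypothetical helper for reading the exit codes that appear in the log. */
public class ExitCodeDecoder {
    // Same numeric value as YARN's ContainerExitStatus.INVALID; hard-coded here
    // so this sketch does not need Hadoop on the classpath.
    public static final int YARN_INVALID_EXIT_STATUS = -1000;

    /**
     * Shell convention: a status above 128 usually means the process was
     * killed by signal (status - 128). Returns -1 when that does not apply.
     */
    public static int signalFromShellExit(int exitCode) {
        return exitCode > 128 ? exitCode - 128 : -1;
    }

    public static String describe(int exitCode) {
        if (exitCode == YARN_INVALID_EXIT_STATUS) {
            return "-1000 (YARN INVALID): no process exit status was recorded";
        }
        int signal = signalFromShellExit(exitCode);
        if (signal == 11) {
            return "killed by SIGSEGV (signal 11)";
        }
        if (signal > 0) {
            return "killed by signal " + signal;
        }
        return "exited with status " + exitCode;
    }

    public static void main(String[] args) {
        System.out.println(describe(-1000)); // AM container exit code in the log
        System.out.println(describe(139));   // ExitCodeException from the permission check
    }
}
```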

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

