Ferenc Erdelyi created YARN-11709:
-------------------------------------

             Summary: NodeManager should be shut down or blacklisted when it cannot run program "/var/lib/yarn-ce/bin/container-executor"
                 Key: YARN-11709
                 URL: https://issues.apache.org/jira/browse/YARN-11709
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: container-executor
            Reporter: Ferenc Erdelyi

When the NodeManager encounters the "No such file or directory" error shown below for the container-executor binary, it should stop participating in the cluster (shut down or be blacklisted). In this state it cannot run any containers, so it only causes the jobs scheduled on it to fail.

{code:java}
2023-01-18 10:08:10,600 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_e159_1673543180101_9407_02_000014 startLocalizer is : -1
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:403)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1250)
Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
{code}
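
One way to picture the requested behaviour is a small standalone check, sketched below. It is not NodeManager code and is not taken from any patch for this issue: the class name `ContainerExecutorCheck` and both helper methods are hypothetical. The first helper tests whether the binary exists and is executable. The second walks an exception's cause chain looking for the "error=2" IOException seen in the log above. The `main` method prints a line starting with `ERROR` on failure, on the assumption that it is invoked from a wrapper configured as the NodeManager health-checker script, which treats such a line as a sign that the node is unhealthy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not part of Hadoop YARN.
public class ContainerExecutorCheck {

    // True only if the path points to an existing regular file that the
    // current user is allowed to execute.
    static boolean isUsable(Path executor) {
        return Files.isRegularFile(executor) && Files.isExecutable(executor);
    }

    // True if any exception in the cause chain is an IOException whose
    // message contains "error=2" (the "No such file or directory" case
    // from the log above).
    static boolean isMissingExecutable(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (c instanceof IOException && msg != null && msg.contains("error=2")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The default path is the one reported in this issue; pass another
        // path as the first argument to check a different location.
        Path executor = Paths.get(
            args.length > 0 ? args[0] : "/var/lib/yarn-ce/bin/container-executor");
        if (isUsable(executor)) {
            System.out.println("OK container-executor is runnable: " + executor);
        } else {
            // Assumption: a health-checker wrapper relays this line to the
            // NodeManager, which reads a leading "ERROR" as unhealthy.
            System.out.println("ERROR container-executor is not runnable: " + executor);
        }
    }
}
```

The proactive check (`isUsable`) would catch the problem before any container is launched, while the reactive check (`isMissingExecutable`) shows how the failure in the stack trace could be recognised at the point where it is thrown. Which of the two the NodeManager should use, and whether it should shut down or only report itself unhealthy, is left open by this issue.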