[ https://issues.apache.org/jira/browse/YARN-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Brennan resolved YARN-10477.
--------------------------------
    Resolution: Invalid

Closing this as invalid. The problem was only there in our internal version of container-executor. I should have checked the code in trunk before filing.

> runc launch failure should not cause nodemanager to go unhealthy
> ----------------------------------------------------------------
>
>                 Key: YARN-10477
>                 URL: https://issues.apache.org/jira/browse/YARN-10477
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.1, 3.4.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>
> We have observed some failures when launching containers with runc. We have
> not yet identified the root cause of those failures, but a side-effect of
> these failures was the Nodemanager marked itself unhealthy. Since these are
> rare failures that only affect a single launch, they should not cause the
> Nodemanager to be marked unhealthy.
> Here is an example RM log:
> {noformat}
> resourcemanager.log.2020-10-02-03.bz2:2020-10-02 03:20:10,255 [RM Event
> dispatcher] INFO rmnode.RMNodeImpl: Node node:8041 reported UNHEALTHY with
> details: Linux Container Executor reached unrecoverable exception
> {noformat}
> And here is an example of the NM log:
> {noformat}
> 2020-10-02 03:20:02,033 [ContainersLauncher #434] INFO
> runtime.RuncContainerRuntime: Launch container failed for
> container_e25_1601602719874_10691_01_001723
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
> ExitCodeException exitCode=24: OCI command has bad/missing local directories
> {noformat}
> The problem is that the runc code in container-executor is re-using exit code
> 24 (INVALID_CONFIG_FILE) which is intended for problems with the
> container-executor.cfg file, and those failures are fatal for the NM. We
> should use a different exit code for these.
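
For readers following the quoted description, here is a minimal sketch of the idea it proposes: keep exit code 24 (INVALID_CONFIG_FILE) fatal for the NM, and report per-launch runc failures under a separate code that is not fatal. This is an illustration only, not the container-executor or NodeManager code. The name OCI_LAUNCH_FAILED, its value 199, and the helper is_fatal_for_nodemanager are hypothetical; only the value 24 and the name INVALID_CONFIG_FILE come from the issue text. Per the resolution above, the code reuse was only present in an internal version, not in trunk.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    # Taken from the issue text: intended for container-executor.cfg problems.
    INVALID_CONFIG_FILE = 24
    # Hypothetical code (name and value are assumptions) for a failure that
    # affects a single runc container launch.
    OCI_LAUNCH_FAILED = 199


# Exit codes that should cause the NM to mark itself unhealthy.
FATAL_FOR_NM = frozenset({ExitCode.INVALID_CONFIG_FILE})


def is_fatal_for_nodemanager(exit_code: int) -> bool:
    """Return True if this exit code should take the whole node out of service."""
    return exit_code in FATAL_FOR_NM
```

With this split, a bad or missing local directory reported as OCI_LAUNCH_FAILED fails only that one container launch, while a broken container-executor.cfg still marks the node unhealthy.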