[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570774#comment-16570774
 ] 

Chandni Singh edited comment on YARN-8160 at 8/6/18 9:07 PM:
-------------------------------------------------------------

Attached are the logs of container 3, which fails to re-initialize. When a 
container is re-initialized, it is stopped and cleaned up. This causes the 
container to exit, but here it exits with code {{255}} instead of 
{{FORCE_KILLED}} or {{TERMINATED}}.

Because the container exits with a failure code ({{255}}), its state in the 
NM changes from {{REINITIALIZING_AWAITING_KILL}} to 
{{EXITED_WITH_FAILURE}}.
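
To make the state change above concrete, here is a minimal sketch of the decision as I understand it. It is not the NM's actual code. The numeric values 137 and 143 are the conventional 128 + signal exit codes for SIGKILL and SIGTERM, and I am assuming they correspond to {{FORCE_KILLED}} and {{TERMINATED}}. The class name and the {{RELAUNCH_PENDING}} state are hypothetical and exist only for illustration.

```java
/** Illustrative sketch only; NOT the NodeManager's actual implementation. */
class ReinitExitCodeSketch {
    // Assumed values: conventional 128 + signal number exit codes.
    static final int FORCE_KILLED = 137; // 128 + 9  (SIGKILL)
    static final int TERMINATED = 143;   // 128 + 15 (SIGTERM)

    // RELAUNCH_PENDING is a hypothetical name for "kill was expected, continue re-init".
    enum State { REINITIALIZING_AWAITING_KILL, RELAUNCH_PENDING, EXITED_WITH_FAILURE }

    /** Decides the next state when the container process exits during re-init. */
    static State onExit(State current, int exitCode) {
        if (current != State.REINITIALIZING_AWAITING_KILL) {
            throw new IllegalStateException("sketch only models the re-init wait state");
        }
        // Only an exit code that looks like a requested kill lets re-init proceed.
        boolean killedOnRequest = exitCode == FORCE_KILLED || exitCode == TERMINATED;
        return killedOnRequest ? State.RELAUNCH_PENDING : State.EXITED_WITH_FAILURE;
    }

    public static void main(String[] args) {
        System.out.println(onExit(State.REINITIALIZING_AWAITING_KILL, TERMINATED));
        System.out.println(onExit(State.REINITIALIZING_AWAITING_KILL, 255));
    }
}
```

Under these assumptions, an exit code of 255 falls into the failure branch, which matches the transition seen in the logs.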

Below are the relevant log statements:

1. Reinit of the container is triggered
{code:java}
 ctr005.log:2018-08-02 22:30:41,100 DEBUG container.ContainerImpl 
(ContainerImpl.java:handle(2080)) - Processing 
container_e02_1533231998644_0009_01_000003 of type REINITIALIZE_CONTAINER

ctr005.log:2018-08-02 22:30:41,101 INFO container.ContainerImpl 
(ContainerImpl.java:handle(2093)) - Container 
container_e02_1533231998644_0009_01_000003 transitioned from RUNNING to 
REINITIALIZING_AWAITING_KIL
{code}
2. Reinit triggers cleanup of the container
{code:java}
ctr005.log:2018-08-02 22:30:41,102 INFO launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,102 DEBUG recovery.NMLeveldbStateStoreService 
(NMLeveldbStateStoreService.java:storeContainerKilled(555)) - 
storeContainerKilled: containerId=container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(752)) - Marking container 
container_e02_1533231998644_0009_01_000003 as inactive
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(759)) - Getting pid for container 
container_e02_1533231998644_0009_01_000003 to kill from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_e02_1533231998644_0009_01_000003.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1084)) - Accessing pid for container 
container_e02_1533231998644_0009_01_000003 from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_e02_1533231998644_0009_01_000003.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_e02_1533231998644_0009_01_000003.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(103)) - Got pid 364708 from path 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_e02_1533231998644_0009_01_000003.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1096)) - Got pid 364708 for container 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:signalProcess(919)) - Sending signal to pid 364708 as 
user root for container container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,102 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
inspect docker-command=inspect format=\{{.State.Status}} 
name=container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,103 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_000003]
ctr005.log:2018-08-02 22:30:41,129 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_000003]
ctr005.log:2018-08-02 22:30:41,130 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:getContainerStatus(154)) - Container Status: 
running ContainerId: container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:41,131 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
stop docker-command=stop name=container_e02_1533231998644_0009_01_000003
{code}
3. After 10 seconds, the stop command sent to the executor completes and the 
container is removed
{code:java}
ctr005.log:2018-08-02 22:30:51,251 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --run-docker, 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/docker.container_e02_1533231998644_0009_01_0000038521705952835205058.cmd]
ctr005.log:2018-08-02 22:30:51,251 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(157)) - 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:51,251 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:signalProcess(927)) - Sent signal SIGTERM to pid 364708 
as user root for container container_e02_1533231998644_0009_01_000003, 
result=success

ctr005.log:2018-08-02 22:30:51,298 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
rm docker-command=rm name=container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:51,298 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --remove-docker-container, 
container_e02_1533231998644_0009_01_000003]

ctr005.log:2018-08-02 22:30:51,977 DEBUG nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:postComplete(963)) - 
container_e02_1533231998644_0009_01_000003 post complete
ctr005.log:2018-08-02 22:30:51,977 DEBUG resources.CGroupsHandlerImpl 
(CGroupsHandlerImpl.java:deleteCGroup(535)) - deleteCGroup: 
/sys/fs/cgroup/cpu/hadoop-yarn-tmp-ctr-e138-1518143905142-423707-01-000002.localhost/container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:51,997 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainerFiles(1876)) - cleanup container 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003
 files
ctr005.log:2018-08-02 22:30:51,998 INFO nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(815)) - Deleting absolute path : 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/launch_container.sh
ctr005.log:2018-08-02 22:30:51,998 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, nobody, root, 3, 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/launch_container.sh]
ctr005.log:2018-08-02 22:30:52,006 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - 
[/hadoop_dist/hadoop-yarn/bin/container-executor, nobody, root, 3, 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/launch_container.sh]
ctr005.log:2018-08-02 22:30:52,006 INFO nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(815)) - Deleting absolute path : 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_tokens
ctr005.log:2018-08-02 22:30:52,006 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, nobody, root, 3, 
/tmp/hadoop/yarn/local/usercache/root/appcache/application_1533231998644_0009/container_e02_1533231998644_0009_01_000003/container_tokens]
{code}
4. Meanwhile, the container exits with exit code 255
{code:java}
ctr005.log:2018-08-02 22:30:52,040 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(585)) - Exit code from container 
container_e02_1533231998644_0009_01_000003 is : 255
ctr005.log:2018-08-02 22:30:52,040 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(591)) - Exception from 
container-launch with container ID: container_e02_1533231998644_0009_01_000003 
and exit code: 255
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: Error: No such 
object: container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Could not inspect docker to get pid 
/usr/bin/docker inspect --format \{{.State.Pid}} 
container_e02_1533231998644_0009_01_000003.
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Error: No such object: 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Could not inspect docker to get pid 
/usr/bin/docker inspect --format \{{.State.Pid}} 
container_e02_1533231998644_0009_01_000003.
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Error: No such object: 
container_e02_1533231998644_0009_01_000003
ctr005.log:2018-08-02 22:30:52,041 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Could not inspect docker to get 
exitcode: /usr/bin/docker inspect --format \{{.State.ExitCode}} 
container_e02_1533231998644_0009_01_000003.
{code}
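
For reference, the gaps between the key events can be computed directly from the timestamps in the log lines above. The sketch below is only a helper for reading the logs; the class and method names are made up. It shows that the stop call took about 10 seconds to return (which equals the default grace period of {{docker stop}}, though these logs alone do not establish that as the cause) and that the exit code 255 was logged less than a second after the {{rm}} command was issued.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for measuring gaps between log4j-style timestamps. */
class ReinitTimelineSketch {
    // Matches timestamps such as "2018-08-02 22:30:41,131".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds elapsed between two log timestamps. */
    static long millisBetween(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS),
                                LocalDateTime.parse(to, LOG_TS)).toMillis();
    }

    public static void main(String[] args) {
        String stopIssued = "2018-08-02 22:30:41,131";   // "Running docker command: stop"
        String stopReturned = "2018-08-02 22:30:51,251"; // privileged operation completes
        String rmIssued = "2018-08-02 22:30:51,298";     // "Running docker command: rm"
        String exitLogged = "2018-08-02 22:30:52,040";   // "Exit code ... is : 255"

        System.out.println("stop took ms: " + millisBetween(stopIssued, stopReturned));
        System.out.println("rm to exit code ms: " + millisBetween(rmIssued, exitLogged));
    }
}
```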


> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8160
>                 URL: https://issues.apache.org/jira/browse/YARN-8160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>         Attachments: container_e02_1533231998644_0009_01_000003.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via {{reInitializeContainer}} api. 
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded 
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change
> - container ID. This is not created by the NM. It is provided to the NM, 
> and here the RM is not creating another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} to trigger the docker container launch 
> without pulling the image first; this could be controlled by a flag.
>     -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
>     -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.
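
The steps in the description can be sketched roughly as follows. This is only an illustration of the proposed flow, not an existing NM API: the class, the {{LaunchContext}} stand-in, and the {{pullImage}} parameter are all hypothetical names. The sketch keeps the same container ID throughout and only pulls the image when the flag asks for it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the re-initialization flow; every name here is hypothetical. */
class ReinitFlowSketch {
    /** Minimal stand-in for ContainerLaunchContext: carries only an image name. */
    static final class LaunchContext {
        final String image;
        LaunchContext(String image) { this.image = image; }
    }

    /** Returns the ordered steps performed for a re-initialization request. */
    static List<String> reInitialize(String containerId, LaunchContext newCtx, boolean pullImage) {
        List<String> steps = new ArrayList<>();
        steps.add("kill " + containerId);     // kill the existing process
        steps.add("cleanup " + containerId);  // clean up the container
        if (pullImage) {
            // Skipped when the images were already pulled at upgrade initialization.
            steps.add("pull " + newCtx.image);
        }
        // Launch again under the SAME container ID with the new launch context.
        steps.add("launch " + containerId + " image=" + newCtx.image);
        return steps;
    }

    public static void main(String[] args) {
        LaunchContext ctx = new LaunchContext("example/app:2");
        System.out.println(reInitialize("container_01", ctx, false));
        System.out.println(reInitialize("container_01", ctx, true));
    }
}
```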


