[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343403#comment-14343403 ]
Abin Shahab commented on YARN-3080:
-----------------------------------

[~vvasudev], thanks again for looking into this. I'm fine with the simpler implementation as long as it works. However, I'm convinced it does not kill the process:
{code}
root@10-10-10-101:~# bash -x ./bash_test.sh
+ echo 11163
+ exec bash -x ./docker_launch.sh
+ docker run -itd ubuntu bash -c 'sleep infinity'
48df7021c1c2402e77069fad8c9fced6fd74dfc00fc3b6d67b2b4fac86585c86
root@10-10-10-101:~# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
48df7021c1c2        ubuntu:14.04        "bash -c 'sleep infi   5 seconds ago       Up 4 seconds                            silly_lumiere
root@10-10-10-101:~# cat /tmp/pidfile
11163
root@10-10-10-101:~# kill -9 -11163
-bash: kill: (-11163) - No such process
root@10-10-10-101:~# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
48df7021c1c2        ubuntu:14.04        "bash -c 'sleep infi   27 seconds ago      Up 26 seconds                           silly_lumiere
root@10-10-10-101:~# docker inspect --format {{.State.Pid}} 48df7021c1c2
11171
root@10-10-10-101:~# pstree -ps 11171
init(1)---docker(6512)---sleep(11171)
root@10-10-10-101:~# kill -9 11171
root@10-10-10-101:~# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
{code}

> The DockerContainerExecutor could not write the right pid to container pidFile
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3080
>                 URL: https://issues.apache.org/jira/browse/YARN-3080
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Beckham007
>            Assignee: Abin Shahab
>         Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch
>
>
> The docker_container_executor_session.sh looks like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_000002` > \
>   /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp
> /bin/mv -f \
>   /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp \
>   /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid
> /usr/bin/docker run --rm --name container_1421723685222_0008_01_000002 \
>   -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 \
>   -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin \
>   -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 \
>   -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_000002 \
>   --memory=32M --cpu-shares=1024 \
>   -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002 \
>   -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002 \
>   -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 \
>   bash "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002/launch_container.sh"
> {quote}
> The DockerContainerExecutor runs docker inspect before docker run, so docker inspect cannot get the right pid for the container, and signalContainer() and NM restart fail.
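The terminal session in the comment shows that the pid that actually identifies the container's process is the one `docker inspect --format {{.State.Pid}}` reports (11171), not the launching shell's pid (11163), and that pid only exists after `docker run` has started the container. A minimal sketch of a session-script helper that respects that ordering is below. It is not the patch attached to this issue; the function name `write_pid_when_running`, the default of 50 tries and the 0.1 s poll interval are hypothetical. It keeps the existing write-to-`.tmp`-then-`/bin/mv -f` step so a reader of the pid file never sees a partial write.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the committed YARN-3080 fix): write the container's
# host pid to the pid file only once the container is actually running.
#
# write_pid_when_running INSPECT_CMD PID_FILE [TRIES]
#   INSPECT_CMD is a command or function that prints the container pid, e.g. a
#   wrapper around `docker inspect --format {{.State.Pid}} <name>`.
#   Returns 0 once a pid > 0 has been written, 1 if TRIES attempts all fail.
write_pid_when_running() {
  local inspect_cmd=$1 pid_file=$2 tries=${3:-50}
  local pid i
  for ((i = 0; i < tries; i++)); do
    # A container that does not exist yet or is not running typically makes
    # docker inspect fail or report 0, so keep polling until a real pid shows up.
    pid=$("$inspect_cmd" 2>/dev/null)
    if [[ $pid =~ ^[0-9]+$ ]] && (( pid > 0 )); then
      # Same atomic pattern as the session script: write .tmp, then rename.
      echo "$pid" > "$pid_file.tmp"
      /bin/mv -f "$pid_file.tmp" "$pid_file"
      return 0
    fi
    sleep 0.1
  done
  return 1
}
```

In a session script one would start the `docker run --rm --name ...` command in the background, call `write_pid_when_running` with a small function wrapping `docker inspect --format {{.State.Pid}}` for that container name, and then `wait` on the background job so the script still exits with `docker run`'s status. Polling is used here only because it needs nothing beyond `docker inspect`; other approaches are equally plausible.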