[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341761#comment-14341761 ]

Abin Shahab commented on YARN-3080:
-----------------------------------

[~chenchun],
Also, other threads in the NM may depend on the actual PID of the launched JVM, and I'm not going to preclude any future processes from depending upon this.
The pidfile is supposed to be populated right after the container is launched, not after the process has finished. The DockerContainerExecutor will not alter these fundamental APIs and expectations that the NM has of a container executor. Therefore, getting the actual PID from docker inspect is the most accurate and recommended way.

> The DockerContainerExecutor could not write the right pid to container pidFile
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3080
>                 URL: https://issues.apache.org/jira/browse/YARN-3080
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Beckham007
>            Assignee: Abin Shahab
>         Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch
>
>
> The docker_container_executor_session.sh is like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_000002` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp
> /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid
> /usr/bin/docker run --rm --name container_1421723685222_0008_01_000002 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_000002 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002/launch_container.sh"
> {quote}
> The DockerContainerExecutor runs docker inspect before docker run, so docker inspect cannot get the right PID for the container, and signalContainer() and NM restart fail as a result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
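
For illustration only, the ordering problem described in the issue (docker inspect running before docker run) could be avoided along the following lines. This is a minimal sketch and not necessarily what the attached YARN-3080.patch does. The function names `write_container_pid` and `launch_and_record` and the overridable `DOCKER` variable are hypothetical, introduced here for the example. It assumes the container is started detached with `docker run -d`, that `docker inspect --format '{{.State.Pid}}'` prints `0` or nothing when there is no running container process, and it replaces the `--rm` flag of the original script with an explicit `docker rm` after `docker wait`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: start the container first, then record its PID.
# DOCKER may be overridden (for example with a stub) when testing.
DOCKER="${DOCKER:-/usr/bin/docker}"

# write_container_pid NAME PIDFILE
# Ask docker for the PID of an already-started container and publish it
# atomically: write to PIDFILE.tmp, then rename onto PIDFILE.
write_container_pid() {
  local name="$1" pidfile="$2" pid
  pid="$("$DOCKER" inspect --format '{{.State.Pid}}' "$name")" || return 1
  # Empty output or 0 is treated as "no running container process".
  if [ -z "$pid" ] || [ "$pid" = "0" ]; then
    return 1
  fi
  echo "$pid" > "${pidfile}.tmp"
  /bin/mv -f "${pidfile}.tmp" "$pidfile"
}

# launch_and_record NAME PIDFILE IMAGE [COMMAND...]
# Everything after PIDFILE is passed through to "docker run".
launch_and_record() {
  local name="$1" pidfile="$2" rc
  shift 2
  # -d detaches, so the script can go on to inspect the running container.
  "$DOCKER" run -d --name "$name" "$@" > /dev/null || return 1
  write_container_pid "$name" "$pidfile" || return 1
  # Block until the container exits; "docker wait" prints its exit code.
  rc="$("$DOCKER" wait "$name")"
  "$DOCKER" rm "$name" > /dev/null
  return "${rc:-1}"
}
```

With this ordering the pidfile is populated right after the container is launched, as the comment above requires, and the script still blocks until the container finishes. A call might look like `launch_and_record container_1421723685222_0008_01_000002 /path/to/container.pid docker.oa.com:8080/library/centos7 bash launch_container.sh`, where `/path/to/container.pid` is a placeholder.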