[ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167078#comment-16167078 ]
Eric Badger commented on YARN-6495: ----------------------------------- Hey [~Jaeboo], sorry for taking so long to actually take a good look at this. I do have a question though. Just because the docker command exited with a non-zero exit code, does that necessarily mean that it failed due to a failed cgroup write? Wouldn't this allow a race condition with a possibly invalid docker command and the failed write to the cgroup by the container executor? I think it might be better to handle this by sending back the exit code of the docker command if it's non-0 and clearing the exit code if the docker command returned 0. Thoughts? > check docker container's exit code when writing to cgroup task files > -------------------------------------------------------------------- > > Key: YARN-6495 > URL: https://issues.apache.org/jira/browse/YARN-6495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Reporter: Jaeboo Jeong > Assignee: Jaeboo Jeong > Attachments: YARN-6495.001.patch > > > If I execute simple command like date on docker container, the application > failed to complete successfully. > for example, > {code} > $ yarn jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -num_containers 1 -timeout 3600000 > … > 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to > complete successfully > {code} > The error log is like below. > {code} > ... > Failed to write pid to file > /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process > ... > {code} > When writing pid to cgroup tasks, container-executor doesn’t check docker > container’s status. > If the container finished very quickly, we can’t write pid to cgroup tasks, > and it is not problem. > So container-executor needs to check docker container’s exit code during > writing pid to cgroup tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org