[jira] [Comment Edited] (YARN-6495) check docker container's exit code when writing to cgroup task files

Eric Badger (JIRA) Mon, 16 Apr 2018 09:34:34 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439690#comment-16439690
 ]


Eric Badger edited comment on YARN-6495 at 4/16/18 4:33 PM:
------------------------------------------------------------

Hey [~Jaeboo], thanks for the patch update. The patch doesn't apply for me on 
trunk, so I believe a rebase is required. In the meantime, here are my comments 
on the patch:
{noformat}
+      // write pid to cgroups
+      char* const* cgroup_ptr;
+      int docker_exit_code = 0;
+      for (cgroup_ptr = resources_values; cgroup_ptr != NULL &&
+           *cgroup_ptr != NULL; ++cgroup_ptr) {
+        if (strcmp(*cgroup_ptr, "none") != 0 &&
+          write_pid_to_cgroup_as_root(*cgroup_ptr, pid) != 0) {
+          docker_exit_code = check_docker_exit_code(docker_binary, container_id);
+          if (docker_exit_code != 0) {
+            exit_code = docker_exit_code;
+            goto cleanup;
+          } else {
+            exit_code = WRITE_CGROUP_FAILED;
+            goto cleanup;
+          }
+        }
+      }
{noformat}
This is semantically different from the previous version of the patch in that 
now failed cgroup writes will always cause an error. When the cgroup write 
fails due to {{no such process}}, but the docker exit code is 0, we want to 
continue on without error. 
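
A minimal sketch of that intended behavior, assuming the cgroup write has already failed with {{no such process}}. The helper name {{should_abort_after_failed_write()}} is hypothetical and is not part of the patch or of container-executor; it only illustrates which branch should go to cleanup and which should keep going.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): called after a cgroup write
 * failed because the pid no longer exists.
 *
 * Returns 1 if the caller should abort (the patch's "goto cleanup"),
 * or 0 if the caller should carry on with the remaining cgroups.
 */
static int should_abort_after_failed_write(int docker_exit_code,
                                           int *exit_code) {
  if (docker_exit_code != 0) {
    /* The container itself failed, so report its exit code. */
    if (exit_code != NULL) {
      *exit_code = docker_exit_code;
    }
    return 1;
  }
  /* The container exited cleanly before its pid could be written:
   * leave exit_code untouched and continue without error. */
  return 0;
}
```

In the loop above, the {{else}} branch would then fall through to the next cgroup instead of setting {{WRITE_CGROUP_FAILED}}.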

Additionally, {{write_pid_to_cgroup_as_root()}} currently has no way to 
differentiate between an error due to {{no such process}} and a different type 
of error (opening the files or changing the effective user). In the former 
case, we want to ignore the cgroup write error so long as the docker exit code 
is 0. In the latter case, we want to fail regardless of the docker outcome. 
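
One possible way to expose that distinction is sketched below. It assumes the {{no such process}} failure surfaces as {{errno == ESRCH}} on the {{write()}} call, which is what the "No such process" message in the issue's log suggests. The enum, {{classify_write_errno()}}, and {{write_pid_to_tasks_file()}} are hypothetical names for illustration only; the real {{write_pid_to_cgroup_as_root()}} also switches the effective user, which is omitted here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes; not part of container-executor. */
typedef enum {
  CGROUP_WRITE_OK = 0,
  CGROUP_WRITE_NO_SUCH_PROCESS, /* pid already gone (ESRCH) */
  CGROUP_WRITE_FATAL            /* open failure or any other write error */
} cgroup_write_result;

/* Map the errno left by a failed write() on the tasks file. */
static cgroup_write_result classify_write_errno(int err) {
  return err == ESRCH ? CGROUP_WRITE_NO_SUCH_PROCESS : CGROUP_WRITE_FATAL;
}

/* Sketch of a write helper that keeps the two failure kinds apart. */
static cgroup_write_result write_pid_to_tasks_file(const char *path,
                                                   long pid) {
  char buf[32];
  int len = snprintf(buf, sizeof(buf), "%ld", pid);
  if (len < 0 || (size_t)len >= sizeof(buf)) {
    return CGROUP_WRITE_FATAL; /* could not format the pid */
  }

  int fd = open(path, O_WRONLY);
  if (fd < 0) {
    return CGROUP_WRITE_FATAL; /* could not open the file */
  }

  ssize_t written = write(fd, buf, (size_t)len);
  int saved_errno = errno; /* close() may overwrite errno */
  close(fd);

  if (written == (ssize_t)len) {
    return CGROUP_WRITE_OK;
  }
  if (written < 0) {
    return classify_write_errno(saved_errno);
  }
  return CGROUP_WRITE_FATAL; /* short write */
}
```

A caller would then consult the docker exit code only for {{CGROUP_WRITE_NO_SUCH_PROCESS}} and fail immediately on {{CGROUP_WRITE_FATAL}}.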


> check docker container's exit code when writing to cgroup task files
> --------------------------------------------------------------------
>
>                 Key: YARN-6495
>                 URL: https://issues.apache.org/jira/browse/YARN-6495
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jaeboo Jeong
>            Assignee: Jaeboo Jeong
>            Priority: Major
>         Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the 
> application fails to complete successfully.
> For example:
> {code}
> $ yarn jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
>     -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
>     -shell_command "date" \
>     -jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -num_containers 1 -timeout 3600000
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to complete successfully
> {code}
> The error log is like below.
> {code}
> ...
> Failed to write pid to file /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t 
> check the docker container’s status.
> If the container finishes very quickly, the pid can’t be written to the 
> cgroup tasks file, and that is not a problem.
> So container-executor needs to check the docker container’s exit code when 
> writing the pid to the cgroup tasks file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

