[ https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shane Kumpf updated YARN-5366:
------------------------------
    Description: 
Several aspects of the Docker container lifecycle need to be improved when 
running Docker containers on YARN.

1) Provide the ability to keep a container on the NodeManager for a set period 
of time for debugging purposes.
2) Support sending signals to the process in the container to allow for 
triggering stack traces, heap dumps, etc.
3) Support Docker's live restore feature, which requires moving away from the 
use of {{docker wait}}. (YARN-5818)
4) Improve the resiliency of liveness checks (kill -0) by adding retries.
5) Improve the resiliency of container removal by adding retries.
6) Only attempt to stop, kill, and remove containers if the current container 
state allows for it.
7) Improve handling of short-lived containers that are stopped before the PID 
can be retrieved. (YARN-6305)

  was:Currently, completed and failed Docker containers are removed by 
container-executor. Add a job-level environment variable to 
DockerLinuxContainerRuntime to allow the user to toggle whether the container 
is deleted, and remove the deletion logic from container-executor.
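Items 2, 4, 5, and 6 of the updated description (signals, retried liveness checks, retried removal, and state-gated stop/kill/remove) can be illustrated with a small sketch. This is a hypothetical Python illustration, not the YARN code (which lives in the Java DockerLinuxContainerRuntime and the C container-executor); the helper names, the retry count and delay, and the mapping of Docker states to allowed actions are all assumptions. It only relies on the real `docker inspect --format`, `docker kill --signal`, and `docker rm` CLI commands and on `kill -0` semantics.

```python
import os
import subprocess
import time
from typing import Callable, List, Optional

# Hypothetical mapping of Docker states (as printed by
# `docker inspect --format '{{.State.Status}}'`) to the actions assumed safe in them.
ALLOWED_STATES = {
    "stop": {"running", "restarting"},
    "kill": {"running", "restarting", "paused"},
    "rm": {"created", "exited", "dead"},
}


def with_retries(op: Callable[[], bool], attempts: int = 3, delay_s: float = 0.5) -> bool:
    """Call op() until it returns True or the attempts run out (items 4 and 5)."""
    for i in range(attempts):
        if op():
            return True
        if i < attempts - 1:
            time.sleep(delay_s)  # back off before the next attempt
    return False


def action_allowed(action: str, state: Optional[str]) -> bool:
    """Only stop, kill, or remove when the current state permits it (item 6)."""
    return state is not None and state in ALLOWED_STATES.get(action, set())


def docker_argv(action: str, container: str, signal: Optional[str] = None) -> List[str]:
    """Build a docker CLI argument list; 'kill' may carry a signal (item 2)."""
    argv = ["docker", action]
    if action == "kill" and signal:
        argv.append(f"--signal={signal}")
    argv.append(container)
    return argv


def pid_alive(pid: int) -> bool:
    """Liveness check equivalent to `kill -0 <pid>`: signal 0 only probes existence."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def inspect_state(container: str) -> Optional[str]:
    """Return the container's state string, or None if docker inspect fails.

    Assumes the docker CLI is on PATH.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Status}}", container],
        capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def remove_container(container: str, attempts: int = 3) -> bool:
    """Remove a container only when its state allows it, retrying the whole check."""
    def attempt() -> bool:
        if not action_allowed("rm", inspect_state(container)):
            return False
        return subprocess.run(docker_argv("rm", container)).returncode == 0
    return with_retries(attempt, attempts)
```

Retrying the whole inspect-then-remove sequence, rather than only the `docker rm` call, re-reads the state on every attempt, so a container that finishes exiting between attempts becomes removable on a later try. For item 2, `docker_argv("kill", name, "SIGQUIT")` would, for example, ask a JVM running in the container to print a thread dump.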


> Improve handling of the Docker container life cycle
> ---------------------------------------------------
>
>                 Key: YARN-5366
>                 URL: https://issues.apache.org/jira/browse/YARN-5366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>              Labels: oct16-medium
>         Attachments: YARN-5366.001.patch, YARN-5366.002.patch, 
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch, 
> YARN-5366.006.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
