[ 
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249716#comment-16249716
 ] 

Eric Badger commented on YARN-5366:
-----------------------------------

bq. Would you be OK if I pursued a patch that didn't use --rm for now? We can 
revise that decision if the patch doesn't pan out?
Given your reasoning above, I'm ok with this. I still don't like it, but if we 
can't trust docker to actually clean up after itself then it seems that there's 
not much of an option. 

Do you want to handle YARN-7189 in this patch or should I put up a separate 
patch in YARN-7189? I had been waiting on putting up a patch until we resolved 
which direction we were going to go with container removal. 

bq. This also reminds me, we should probably document suggested kernel, docker, 
and docker backing storage configs that we've tested.
Yeah, this would be good. More than just suggested, there will be minimum 
versions of the kernel and docker that must be used or it won't work at all. 
This can be a real pain for first-time users to figure out, so documenting it 
would be good.

> Improve handling of the Docker container life cycle
> ---------------------------------------------------
>
>                 Key: YARN-5366
>                 URL: https://issues.apache.org/jira/browse/YARN-5366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>              Labels: oct16-medium
>         Attachments: YARN-5366.001.patch, YARN-5366.002.patch, 
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch, 
> YARN-5366.006.patch
>
>
> There are several paths that need to be improved with regard to the Docker 
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set 
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for 
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of 
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveliness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container 
> state allows for it.
> 7) Better handling of short lived containers when the container is stopped 
> before the PID can be retrieved. (YARN-6305)
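
For illustration only, items 5 and 6 above (retrying container removal, and 
attempting it only when the container state allows) could be sketched roughly 
as follows. This is not code from any YARN-5366 patch. The function name, the 
set of "removable" states, and the injected get_state/remove callables are all 
hypothetical; in a real setup they would presumably wrap something like 
{{docker inspect}} and {{docker rm}}.

```python
import time
from typing import Callable, Optional

# Assumption: the states in which removal is attempted. Docker reports
# statuses such as "created", "running", "paused", "exited" and "dead";
# which of them should permit removal is a policy choice, not a given.
REMOVABLE_STATES = {"created", "exited", "dead"}


def remove_with_retries(
    container_id: str,
    get_state: Callable[[str], Optional[str]],
    remove: Callable[[str], bool],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> bool:
    """Try to remove a container, retrying on failure.

    get_state returns the container status, or None if the container no
    longer exists. remove returns True when the removal succeeded.
    Returns True if the container is gone when this function returns.
    """
    for attempt in range(1, max_attempts + 1):
        state = get_state(container_id)
        if state is None:
            # Nothing left to remove.
            return True
        if state not in REMOVABLE_STATES:
            # The current state does not allow removal (e.g. still running).
            return False
        if remove(container_id):
            return True
        if attempt < max_attempts:
            # Back off before the next attempt.
            time.sleep(delay_seconds)
    return False
```

Passing the state lookup and the removal call in as parameters keeps the retry 
logic independent of how Docker is actually invoked, so it can be exercised 
without a Docker daemon.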



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
