[jira] [Commented] (YARN-4759) Fix signal handling for docker containers

Shane Kumpf (JIRA) Fri, 08 Sep 2017 05:24:15 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158542#comment-16158542
 ]


Shane Kumpf commented on YARN-4759:
-----------------------------------

Thanks for the follow-up, [~ebadger].

{quote}
I thought that you had decided that we didn't need to worry about this in your 
comment above?
{quote}

I'm actually saying the opposite. My initial thought was to allow the user to 
tell YARN the stop/kill signal when submitting the job. However, after more 
research I found the Dockerfile STOPSIGNAL instruction, which sets the signal 
Docker sends to the container when it is stopped. That means YARN doesn't need 
to handle this explicitly; the user can define the necessary signal in the 
Dockerfile. This only works if we use {{docker stop}}, though.
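
To make the {{docker stop}} dependency concrete, here is a minimal Python sketch (not what the patch does) of issuing a stop with an explicit grace period. The helper names {{build_stop_command}} and {{stop_container}} and the 10 second default are assumptions for illustration only. With this approach the first signal comes from the image's STOPSIGNAL (SIGTERM if unset), and Docker sends SIGKILL once the grace period expires.

```python
import subprocess
from typing import List


def build_stop_command(container_id: str, grace_seconds: int = 10) -> List[str]:
    """Build the argv for `docker stop` (hypothetical helper).

    `docker stop` sends the image's STOPSIGNAL (SIGTERM if none is set),
    waits up to `grace_seconds`, and then sends SIGKILL.
    """
    if grace_seconds < 0:
        raise ValueError("grace_seconds must be non-negative")
    return ["docker", "stop", f"--time={grace_seconds}", container_id]


def stop_container(container_id: str, grace_seconds: int = 10) -> int:
    """Run `docker stop` and return its exit code (hypothetical helper).

    A non-zero exit code is returned as-is, for example when Docker
    reports "No such container".
    """
    result = subprocess.run(
        build_stop_command(container_id, grace_seconds), check=False
    )
    return result.returncode
```

The caller never names a signal here, which is the point: the choice of signal stays in the Dockerfile.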

{quote}
How does docker stop solve the issue here? If the container doesn't exist yet, 
then docker stop will fail with "No such container" and stop trying. The 
documentation isn't very informative, but it doesn't appear to wait the grace 
period for the SIGKILL if it can't find the container in the first place.
{quote}

Sorry, I wasn't very clear before. I'm referring to a different situation: the 
container can exist, but the process inside the container may not be fully 
started, and/or Docker has not yet written the PID to the data structure used 
by {{docker inspect}}. We use {{docker run}}, which does a {{docker create}} 
and {{docker start}} behind the scenes. If the image doesn't exist locally, it 
is implicitly pulled during that time as well. You will often find that the 
Created and StartedAt times in {{docker inspect}} differ widely due to these 
additional background operations. I will concede that {{docker stop}} is less 
necessary here, as a container still in the Created state can be 
{{docker rm}}-ed (well, most of the time, but that's another discussion). 
However, the docker client is decoupled from YARN, so races can occur and 
containers can be leaked. {{docker stop}} may therefore still be useful in 
case the container has transitioned to running while we attempt to obtain the 
PID, etc.
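
As a rough illustration of the state check described above, the Python sketch below reads the JSON printed by {{docker inspect}} and picks a cleanup action. The function name {{cleanup_action}} and the policy it encodes are assumptions for discussion, not the behavior of the attached patches. It does not close the race by itself, since the state can change between the inspect and the follow-up command.

```python
import json

# Statuses for which a process may be alive inside the container.
_LIVE_STATUSES = {"running", "paused", "restarting"}


def cleanup_action(inspect_output: str) -> str:
    """Pick a cleanup action from `docker inspect <container>` JSON output.

    Hypothetical policy:
      "none" - inspect returned no entry (container not found)
      "stop" - a process may be running, so let `docker stop` signal it
      "rm"   - nothing is running (e.g. still in Created), just remove it
    """
    entries = json.loads(inspect_output)
    if not entries:
        return "none"
    state = entries[0].get("State", {})
    status = state.get("Status", "")
    pid = state.get("Pid", 0)
    # Pid stays 0 until Docker has started the process and recorded it.
    if status in _LIVE_STATUSES or pid > 0:
        return "stop"
    return "rm"
```

Because the container may start right after it was inspected as Created, a caller would likely still need to fall back to {{docker stop}} when the {{docker rm}} fails.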

> Fix signal handling for docker containers
> -----------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>             Fix For: 2.9.0, 3.0.0-alpha1
>
>         Attachments: YARN-4759.001.patch, YARN-4759.002.patch, 
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
