[ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208603#comment-15208603
 ] 

Shane Kumpf commented on YARN-4759:
-----------------------------------

We need to use docker client commands, rather than the OS kill command, to 
send signals to processes in containers.

docker stop sends SIGTERM to PID 1 and waits for the process to stop (10 
seconds by default, configurable). If the container hasn't stopped by the end 
of the timeout, SIGKILL is sent. docker kill, OTOH, has no delay and simply 
sends a signal to PID 1 of the container (SIGKILL by default, configurable).

Signals that trigger a graceful shutdown vary between processes. For instance, 
to shut down nginx gracefully (allowing outstanding requests to finish), 
SIGQUIT should be sent. For Apache HTTPD, SIGWINCH is used for graceful 
shutdown.

To complicate matters, the docker client sends signals to PID 1 in the 
container, so depending on whether exec form is used for CMD in the 
Dockerfile, the process we want to signal may be a subprocess of the shell 
running as PID 1. Users who require specific signals will need to understand 
this limitation.

We should allow for user configurable signals and timeouts. There are a couple 
of approaches to achieve this:

1) Only use docker kill and sleep in Java code. docker kill accepts the 
--signal argument, but does not support a wait timeout. The flow would be: send 
the signal, then sleep for 10 seconds by default or for the user-supplied value.

2) Use docker stop if the user has not specified a signal, with a timeout of 
10 seconds by default or the user-supplied timeout. Use docker kill if the 
user supplies a signal.
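
As a rough illustration of approach 2, the choice between the two commands 
could look like the sketch below. The class and method names are hypothetical 
and not part of YARN. The only docker options assumed are `-t` for docker stop 
and `--signal` for docker kill.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the YARN code base. */
final class DockerSignalCommands {
    /** docker stop waits 10 seconds by default before sending SIGKILL. */
    static final int DEFAULT_TIMEOUT_SECONDS = 10;

    private DockerSignalCommands() {}

    /**
     * Builds the docker client command line for stopping a container.
     * No signal supplied: "docker stop" with the default or user timeout.
     * Signal supplied: "docker kill" with that signal (no wait is possible).
     */
    static List<String> build(String containerId,
                              Optional<String> signal,
                              Optional<Integer> timeoutSeconds) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        if (signal.isPresent()) {
            // docker kill delivers the signal to PID 1 immediately.
            cmd.add("kill");
            cmd.add("--signal=" + signal.get());
        } else {
            // docker stop sends SIGTERM, waits, then sends SIGKILL.
            cmd.add("stop");
            cmd.add("-t");
            cmd.add(String.valueOf(timeoutSeconds.orElse(DEFAULT_TIMEOUT_SECONDS)));
        }
        cmd.add(containerId);
        return cmd;
    }
}
```

In this sketch a user-supplied timeout is ignored when a signal is also 
supplied, because docker kill has no wait option. Whether that is acceptable 
is part of the trade-off between the two approaches.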

The default behavior should be to send SIGTERM, sleep 10 seconds, and send 
SIGKILL if the container is still running. The signal and timeout should be 
configurable.
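
A minimal sketch of that default flow, assuming the sleep is done in Java code 
as in approach 1. The ContainerClient interface is a hypothetical stand-in for 
whatever wraps the docker client; it is not an existing YARN interface.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the signal, sleep, SIGKILL flow; not YARN code. */
final class GracefulStop {
    /** Hypothetical abstraction over the docker client. */
    interface ContainerClient {
        void sendSignal(String containerId, String signal);
        boolean isRunning(String containerId);
    }

    static final String DEFAULT_SIGNAL = "SIGTERM";
    static final long DEFAULT_TIMEOUT_SECONDS = 10;

    private GracefulStop() {}

    /**
     * Sends the given signal, waits for the timeout, and sends SIGKILL
     * if the container is still running. Returns true if SIGKILL was sent.
     */
    static boolean stop(ContainerClient client, String containerId,
                        String signal, long timeoutSeconds) {
        client.sendSignal(containerId, signal);
        try {
            TimeUnit.SECONDS.sleep(timeoutSeconds);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through to the liveness check.
            Thread.currentThread().interrupt();
        }
        if (client.isRunning(containerId)) {
            client.sendSignal(containerId, "SIGKILL");
            return true;
        }
        return false;
    }
}
```

A caller wanting the defaults would pass GracefulStop.DEFAULT_SIGNAL and 
GracefulStop.DEFAULT_TIMEOUT_SECONDS. A user needing a graceful nginx shutdown 
could pass "SIGQUIT" and a longer timeout instead.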

How the above impacts NM reacquisition is yet to be determined, but it may make 
sense to turn this into an umbrella issue and split out the required changes.

/cc [~sidharta-s] - thoughts on the above?

> Revisit signalContainer() for docker containers
> -----------------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
