[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208603#comment-15208603 ]
Shane Kumpf commented on YARN-4759:
-----------------------------------

We need to use docker client commands, rather than the OS kill command, to signal processes in containers. docker stop sends SIGTERM to PID 1 and waits 10 seconds (the default, configurable) for the process to stop; if the container hasn't stopped at the end of the timeout, SIGKILL is sent. docker kill, OTOH, has no delay and simply sends SIGKILL (the default, signal configurable) to PID 1 of the container.

Signals that invoke graceful shutdown vary between processes. For instance, to gracefully shut down nginx (allowing outstanding requests to finish), SIGQUIT should be sent. For Apache HTTPD, SIGWINCH is used for graceful shutdown. To complicate matters, the docker client sends signals to PID 1 in the container, so depending on whether the exec form is used for CMD in the Dockerfile, the process we want to signal may be a subprocess of the shell running as PID 1. Users that require specific signals will need to understand this limitation.

We should allow for user-configurable signals and timeouts. There are a couple of approaches to achieve this:

1) Only use docker kill and sleep in Java code. docker kill accepts the --signal argument, but does not support a wait timeout. The flow would be: send the signal, then sleep for 10 seconds by default or for the user-supplied sleep value.

2) Use docker stop if the user has not specified a signal, with the default timeout of 10 seconds or the user-supplied timeout. Use docker kill if the user supplies a signal.

The default behavior should be to send SIGTERM, sleep 10 seconds, and, if the container is still running, send SIGKILL. Signals and timeouts should be configurable.

How the above impacts NM reacquisition is yet to be determined, but it may make sense to make this an umbrella issue and split out the required changes.

/cc [~sidharta-s] - thoughts on the above?
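To make approach 2 above concrete, here is a minimal sketch of how the choice between docker stop and docker kill might look. The class name DockerSignalCommand, the build() helper, and its parameters are hypothetical and are not part of the existing DockerContainerRuntime; the only docker options assumed are --time for docker stop and --signal for docker kill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not existing YARN code: picks the docker client
// command for signalling a container, following approach 2.
final class DockerSignalCommand {
    // Matches the docker stop default grace period described above.
    static final int DEFAULT_TIMEOUT_SECONDS = 10;

    // signal == null (or empty) means the user did not configure a signal.
    // timeoutSeconds == null means the user did not configure a timeout.
    static List<String> build(String containerId, String signal, Integer timeoutSeconds) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        if (signal == null || signal.isEmpty()) {
            // No user signal: docker stop sends SIGTERM, waits, then SIGKILL.
            int timeout = timeoutSeconds != null ? timeoutSeconds : DEFAULT_TIMEOUT_SECONDS;
            cmd.add("stop");
            cmd.add("--time=" + timeout);
        } else {
            // User-supplied signal: docker kill delivers it to PID 1 with no wait.
            cmd.add("kill");
            cmd.add("--signal=" + signal);
        }
        cmd.add(containerId);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder container name, for illustration only.
        System.out.println(String.join(" ", build("example_container", null, null)));
        System.out.println(String.join(" ", build("example_container", "SIGQUIT", 30)));
    }
}
```

Note that on the docker kill path the timeout is not passed to docker at all; waiting and then escalating to SIGKILL would still have to happen in Java, as in approach 1, and is left out of this sketch.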
> Revisit signalContainer() for docker containers
> -----------------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>
> The current signal handling (in the DockerContainerRuntime) needs to be
> revisited for docker containers. For example, container reacquisition on NM
> restart might not work, depending on which user the process in the container
> runs as.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)