[ https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596637#comment-16596637 ]
Chandni Singh commented on YARN-8706:
-------------------------------------

{quote}I am not entirely sure about globally identical killing mechanism for all container type, is a sane approach to brute force container shutdown.
{quote}
I am not sure what you mean. The NM does a graceful shutdown for all types of containers: it first sends a {{SIGTERM}} and then, after a grace period, sends {{SIGKILL}}.

For Docker containers, the {{SIGTERM}} is handled by {{docker stop}}, which has the following problems:
1. The grace period can be specified only in whole seconds.
2. It combines {{SIGKILL}} with the stop. Docker first sends the {{STOPSIGNAL}} to the root process and then, after the grace period, sends {{SIGKILL}} to the root process. This is not what the NM wants from a stop, and {{docker stop}} gives no option to NOT send {{SIGKILL}}.

The change proposed by [~ebadger] will send only the {{STOPSIGNAL}}, which solves our problem.

{quote}10 seconds default is probably more sensible to give the container a chance to shutdown gracefully without causing corruption to data.
{quote}
Why is this specific to Docker containers? Other types of containers may be dealing with data as well, and if the default grace period of 250 milliseconds is too small, it can be changed with the config {{NM_SLEEP_DELAY_BEFORE_SIGKILL_MS}}. Maybe this should be something the application could specify as well, but that is a different discussion.

> DelayedProcessKiller is executed for Docker containers even though docker
> stop sends a KILL signal after the specified grace period
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8706
>                 URL: https://issues.apache.org/jira/browse/YARN-8706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace time used by docker stop
> [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By
> default this is set to {{250 milliseconds}}, so irrespective of the
> container type, it will always get executed.
>
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}}
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be
> the smallest value, which is 1 second, because we are forcing a kill
> after 250 ms anyway
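The two-phase stop discussed in this issue (send {{SIGTERM}}, wait a grace period, then send {{SIGKILL}}) can be illustrated with a minimal Java sketch. It is not NodeManager or Docker code: it uses plain {{java.lang.Process}} as a stand-in for a container process, and {{GracefulStopSketch}}, {{dockerStopGraceSeconds}} and {{stop}} are hypothetical names. It assumes a millisecond delay such as sleepDelayBeforeSigKill is converted to the whole seconds that docker stop expects by rounding up, with a floor of 1 second as the description proposes; the rounding-up choice is an assumption. It also relies on the behavior of OpenJDK on Unix, where {{Process.destroy()}} sends SIGTERM and {{Process.destroyForcibly()}} sends SIGKILL.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not NodeManager code): models the two-phase stop
 * discussed in YARN-8706 with plain java.lang.Process.
 */
public class GracefulStopSketch {

    /**
     * Converts a millisecond delay (e.g. sleepDelayBeforeSigKill) into whole
     * seconds for a docker-stop-style grace period. Assumption: round up so
     * the forced kill never happens earlier than configured, and never go
     * below 1 second.
     */
    static long dockerStopGraceSeconds(long sleepDelayBeforeSigKillMs) {
        long seconds = (sleepDelayBeforeSigKillMs + 999) / 1000; // ceiling division
        return Math.max(1, seconds);
    }

    /**
     * Two-phase stop: ask politely, wait up to graceMs, then force.
     * On OpenJDK for Unix, destroy() sends SIGTERM and destroyForcibly()
     * sends SIGKILL. Returns true if the process exited within the grace
     * period without needing the forced kill.
     */
    static boolean stop(Process p, long graceMs) throws InterruptedException {
        p.destroy();                                      // phase 1: SIGTERM
        if (p.waitFor(graceMs, TimeUnit.MILLISECONDS)) {
            return true;                                  // exited gracefully
        }
        p.destroyForcibly();                              // phase 2: SIGKILL
        p.waitFor();                                      // reap the process
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dockerStopGraceSeconds(250));    // prints 1
        System.out.println(dockerStopGraceSeconds(10_000)); // prints 10

        // Example target process; assumes a Unix "sleep" binary is available.
        Process p = new ProcessBuilder("sleep", "30").start();
        System.out.println("graceful exit: " + stop(p, 250));
    }
}
```

Unlike {{docker stop}}, this sketch keeps the two phases as separate calls, so a caller that wants only the {{SIGTERM}} phase could call {{destroy()}} alone and leave the forced kill to its own timer. That separation is what the comment asks for when it says the NM wants to send only the {{STOPSIGNAL}} and handle {{SIGKILL}} itself.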