[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626455#comment-16626455 ]
Chandni Singh edited comment on YARN-7644 at 9/24/18 9:14 PM:
--------------------------------------------------------------

For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and {{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates a task and submits it to the executor so that it runs in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, {{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions are performed in a blocking way:
{code:java}
launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and {{CLEANUP_CONTAINER_FOR_REINIT}} events non-blocking. It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for {{cleanupContainer()}} to complete synchronously; the cleanup is triggered by dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.

cc. [~ebadger] [~jlowe]
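To illustrate the difference between the two dispatch styles, here is a minimal standalone sketch. It does not use the actual NodeManager classes: {{CleanupDispatchSketch}}, {{slowCleanup}}, {{handleBlocking}}, {{handleNonBlocking}}, and {{compare}} are hypothetical names, and the sleep only stands in for a slow call such as a blocking {{docker stop}}. It measures how long the event-handling thread stays occupied in each case.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CleanupDispatchSketch {

    /** Hypothetical stand-in for a slow cleanup (e.g. a blocking docker stop). */
    static void slowCleanup(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Blocking style: the handler thread runs every cleanup itself, one after another. */
    static long handleBlocking(int events, long cleanupMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            slowCleanup(cleanupMillis);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Non-blocking style: the handler only submits tasks and returns immediately. */
    static long handleNonBlocking(int events, long cleanupMillis, ExecutorService pool) {
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            pool.submit(() -> slowCleanup(cleanupMillis));
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Runs both styles; returns {blockingHandlerMillis, nonBlockingHandlerMillis}. */
    static long[] compare(int events, long cleanupMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        long blocked = handleBlocking(events, cleanupMillis);
        long queued = handleNonBlocking(events, cleanupMillis, pool);
        pool.shutdown();
        try {
            // Wait for the submitted cleanups to finish before returning.
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {blocked, queued};
    }

    public static void main(String[] args) {
        long[] result = compare(5, 100);
        System.out.println("blocking handler occupied for ~" + result[0] + " ms");
        System.out.println("non-blocking handler occupied for ~" + result[1] + " ms");
    }
}
```

In the blocking case the handler is tied up for roughly the sum of all cleanup times, which is the backlog described in this issue. In the non-blocking case the handler is free again almost immediately and the cleanups proceed on the pool's threads.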
> NM gets backed up deleting docker containers
> --------------------------------------------
>
>                 Key: YARN-7644
>                 URL: https://issues.apache.org/jira/browse/YARN-7644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Eric Badger
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10
> seconds when we shut down a container. If the container does not stop after
> 10 seconds then we force kill it. However, the {{docker stop}} command is a
> blocking call. So in cases where lots of containers don't go down with the
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to
> return. This ties up the ContainerLaunch handler and so these kill events
> back up. It also appears to be backing up new container launches as well.