[ 
https://issues.apache.org/jira/browse/YARN-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219536#comment-16219536
 ] 

Eric Badger commented on YARN-7395:
-----------------------------------

Here are the relevant lines from the NM log:
{noformat}
2017-10-25 20:03:07,549 [Container Monitor] WARN monitor.ContainersMonitorImpl: 
Process tree for container: container_e126_1508911755032_0004_02_000001 has 
processes older than 1 iteration running over the configured limit. 
Limit=536870912, current usage = 585281536
2017-10-25 20:03:07,551 [Container Monitor] WARN monitor.ContainersMonitorImpl: 
Container [pid=29030,containerID=container_e126_1508911755032_0004_02_000001] 
is running beyond physical memory limits. Current usage: 558.2 MB of 512 MB 
physical memory used; 2.8 GB of 1.0 GB virtual memory used. Killing container.
Dump of the process-tree for container_e126_1508911755032_0004_02_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 29065 29030 29030 29030 (java) 6022 290 2962636800 142606 /bin/java 
-Djava.io.tmpdir=/tmp/yarn-local/usercache/ebadger/appcache/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/tmp
 -Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
-Dhadoop.root.logfile=syslog 
-XX:ErrorFile=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/hs_err_pid%p.log
 -XX:GCTimeLimit=50 -XX:ParallelGCThreads=4 -XX:NewRatio=8 
-Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/gc.log
 -Xmx1024m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
        |- 29030 29014 29030 29030 (bash) 3 2 9474048 285 /bin/bash -c 
/bin/java 
-Djava.io.tmpdir=/tmp/yarn-local/usercache/ebadger/appcache/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/tmp
 -Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
-Dhadoop.root.logfile=syslog 
-XX:ErrorFile=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/hs_err_pid%p.log
 -XX:GCTimeLimit=50 -XX:ParallelGCThreads=4 -XX:NewRatio=8 
-Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/gc.log
 -Xmx1024m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
1>/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/stdout
 
2>/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_000001/stderr
  

2017-10-25 20:03:07,551 [Container Monitor] INFO monitor.ContainersMonitorImpl: 
Removed ProcessTree with root 29030
2017-10-25 20:03:07,551 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container container_e126_1508911755032_0004_02_000001 
transitioned from RUNNING to KILLING
2017-10-25 20:03:07,552 [AsyncDispatcher event handler] INFO 
launcher.ContainerLaunch: Cleaning up container 
container_e126_1508911755032_0004_02_000001
2017-10-25 20:03:07,576 [AsyncDispatcher event handler] WARN 
nodemanager.LinuxContainerExecutor: Error in signalling container 29030 with 
SIGTERM; exit = 1
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Signal container failed
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:615)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:510)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:473)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:140)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:56)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:745)
2017-10-25 20:03:07,576 [AsyncDispatcher event handler] INFO 
nodemanager.ContainerExecutor: Using command stop 
'container_e126_1508911755032_0004_02_000001' 
2017-10-25 20:03:07,576 [AsyncDispatcher event handler] WARN 
launcher.ContainerLaunch: Exception when trying to cleanup container 
container_e126_1508911755032_0004_02_000001: java.io.IOException: Problem 
signalling container 29030 with SIGTERM; output: Using command stop 
'container_e126_1508911755032_0004_02_000001' 
 and exitCode: 1
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:521)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:473)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:140)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:56)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Signal container failed
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:615)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:510)
        ... 6 more
{noformat}
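
The numbers in the first two WARN lines can be cross-checked against the process-tree dump. The sketch below is only an arithmetic sanity check, not the actual ContainersMonitorImpl logic, and it assumes a 4096-byte page size, which the log does not state. Under that assumption, the RSSMEM_USAGE(PAGES) values of the java and bash processes add up to the reported "current usage" figure, which is above the 512 MB limit.

```python
# Assumption: 4096-byte pages (typical on Linux x86-64); the log does not state the page size.
PAGE_SIZE_BYTES = 4096


def rss_bytes(rss_pages, page_size=PAGE_SIZE_BYTES):
    """Sum per-process RSS page counts and convert the total to bytes."""
    return sum(rss_pages) * page_size


def is_over_limit(usage_bytes, limit_bytes):
    """Simple comparison of usage against the configured limit."""
    return usage_bytes > limit_bytes


# Values copied from the log above.
limit = 536870912            # Limit=536870912 (512 MB)
pages = [142606, 285]        # RSSMEM_USAGE(PAGES) for the java and bash processes

usage = rss_bytes(pages)
print(usage)                           # 585281536, matching "current usage" in the log
print(round(usage / 1024 ** 2, 1))     # 558.2, matching "558.2 MB of 512 MB"
print(is_over_limit(usage, limit))     # True
```

So the monitor's decision to kill the container is consistent with the dump; the problem is in the signalling step that follows.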

> NM fails to successfully kill tasks that run over their memory limit
> --------------------------------------------------------------------
>
>                 Key: YARN-7395
>                 URL: https://issues.apache.org/jira/browse/YARN-7395
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Eric Badger
>
> The NM correctly notes that the container is over its configured limit, but 
> then fails to kill the process. As a result, the Docker container AM stays 
> around and the job keeps running.


