Hi,

Your containers got killed by YARN for exceeding the virtual memory limit.
For some reason your container allocates a lot of virtual memory while
still having plenty of free physical memory.
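
For reference, the numbers in the NodeManager log quoted below are
consistent with YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1
applied to the 1 GB container. Here is a small Python sketch of that
arithmetic. It assumes the ratio is held as a single-precision float, which
is what makes the result match the logged Limit exactly; the function name
is only illustrative.

```python
import struct


def vmem_limit_bytes(pmem_bytes: int, ratio: float = 2.1) -> int:
    """Virtual memory limit for a container with the given physical limit."""
    # Assumption: the ratio is rounded to single precision before use.
    ratio_f32 = struct.unpack("f", struct.pack("f", ratio))[0]
    return int(pmem_bytes * ratio_f32)


# Values copied from the NodeManager log for container ..._000001.
pmem_limit = 1 * 1024**3        # "1 GB physical memory"
bash_vmem = 21409792            # VMEM_USAGE(BYTES) of pid 17872 (bash)
java_vmem = 2234486784          # VMEM_USAGE(BYTES) of pid 17879 (java)

limit = vmem_limit_bytes(pmem_limit)
usage = bash_vmem + java_vmem

print("limit :", limit)
print("usage :", usage)
print("excess:", usage - limit, "bytes")
```

This gives limit 2254857728 and usage 2255896576, the same values as in the
log, so the process tree was only about 1 MB over the limit when it was
killed.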

There are known gotchas with this kind of issue on CentOS, caused by
OS-specific aggressive virtual memory allocation: [1], [2]. They work
around it by disabling the YARN virtual memory checker.
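
A sketch of what that workaround looks like in yarn-site.xml on the
NodeManager host. The two property names are standard YARN settings; the
ratio value of 4 is only an illustrative assumption, not a recommendation.
Use one option or the other, and restart the NodeManager afterwards.

```xml
<!-- Option A: turn the virtual memory check off entirely -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

<!-- Option B: keep the check but allow more virtual memory per unit of
     physical memory (default is 2.1; 4 here is just an example value) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```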

Also, people on this mailing list recently reported that high virtual
memory consumption may be caused by some libraries.

Links:
[1] http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/,
section "Killing of Tasks Due to Virtual Memory Usage"
[2] https://www.mapr.com/blog/best-practices-yarn-resource-management,
section "3. Virtual/physical memory checker"

Regards,
Yury

2017-01-05 11:54 GMT+03:00 Sachin Goel <sachingoel0...@gmail.com>:

> Hey!
>
> I'm running locally under this configuration (copied from nodemanager logs):
> physical-memory=8192 virtual-memory=17204 virtual-cores=8
>
> Before starting a flink deployment, memory usage stats show 3.7 GB used on
> system, indicating lots of free memory for flink containers.
> However, after I submit using minimal resource requirements,
> ./yarn-session.sh -n 1 -tm 768, the cluster deploys successfully but then
> every application on system receives a sigterm and it basically kills the
> current user session, logging out of the system.
>
> The job manager and task manager logs contain just the information that a
> SIGTERM was received and that they shut down gracefully.
> All YARN and DFS processes also log the receipt of a SIGTERM.
>
> Here's the relevant log from nodemanager:
>
> 2017-01-05 17:00:06,089 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1483603191971_0002_01_000002 transitioned from LOCALIZED 
> to RUNNING
> 2017-01-05 17:00:06,092 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> launchContainer: [bash, 
> /opt/hadoop-2.7.3/tmp/nm-local-dir/usercache/kirk/appcache/application_1483603191971_0002/container_1483603191971_0002_01_000002/default_container_executor.sh]
> 2017-01-05 17:00:08,731 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Starting resource-monitoring for container_1483603191971_0002_01_000002
> 2017-01-05 17:00:08,744 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Memory usage of ProcessTree 17872 for container-id 
> container_1483603191971_0002_01_000001: 282.7 MB of 1 GB physical memory 
> used; 2.1 GB of 2.1 GB virtual memory used
> 2017-01-05 17:00:08,744 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Process tree for container: container_1483603191971_0002_01_000001 has 
> processes older than 1 iteration running over the configured limit. 
> Limit=2254857728, current usage = 2255896576
> 2017-01-05 17:00:08,745 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Container [pid=17872,containerID=container_1483603191971_0002_01_000001] is 
> running beyond virtual memory limits. Current usage: 282.7 MB of 1 GB 
> physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1483603191971_0002_01_000001 :
>       |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>       |- 17872 17870 17872 17872 (bash) 0 0 21409792 812 /bin/bash -c 
> /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M  
> -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log
>  -Dlogback.configurationFile=file:logback.xml 
> -Dlog4j.configuration=file:log4j.properties 
> org.apache.flink.yarn.YarnApplicationMasterRunner  
> 1>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.out
>  
> 2>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.err
>       |- 17879 17872 17872 17872 (java) 748 20 2234486784 71553 
> /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M 
> -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log
>  -Dlogback.configurationFile=file:logback.xml 
> -Dlog4j.configuration=file:log4j.properties 
> org.apache.flink.yarn.YarnApplicationMasterRunner
>
> 2017-01-05 17:00:08,745 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Removed ProcessTree with root 17872
> 2017-01-05 17:00:08,746 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1483603191971_0002_01_000001 transitioned from RUNNING 
> to KILLING
> 2017-01-05 17:00:08,746 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1483603191971_0002_01_000001
> 2017-01-05 17:00:08,779 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
> 2017-01-05 17:00:08,822 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_1483603191971_0002_01_000001 is : 143
> 2017-01-05 17:00:08,825 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_1483603191971_0002_01_000002 is : 143
>
>
> Is the memory available on my pc not enough or are there any known issues
> which might lead to this?
>
> Also, this doesn't occur every time I start a flink session.
>
> Thanks
> Sachin
>
