Hi Timur,

Which version of Flink are you using? Could you share the entire logs?
Thanks,
Max

On Mon, Apr 25, 2016 at 12:05 PM, Robert Metzger <rmetz...@apache.org> wrote:
> Hi Timur,
>
> The reason why we only allocate 570 MB for the heap is that you are
> allocating most of the memory as off-heap (direct byte buffers).
>
> In theory, the memory footprint of the JVM is limited to 570 (heap) + 1900
> (direct mem) = 2470 MB (which is below 2500). But in practice the JVM
> allocates more memory, causing these killings by YARN.
>
> I have to check the code of Flink again, because I would expect the safety
> boundary to be much larger than 30 MB.
>
> Regards,
> Robert
>
>
> On Fri, Apr 22, 2016 at 9:47 PM, Timur Fayruzov <timur.fairu...@gmail.com>
> wrote:
>>
>> Hello,
>>
>> The next issue in a string of things I'm solving is that my application
>> fails with the message 'Connection unexpectedly closed by remote task
>> manager'.
>>
>> The YARN log shows the following:
>>
>> Container [pid=4102,containerID=container_1461341357870_0004_01_000015] is
>> running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB
>> physical memory used; 9.0 GB of 12.3 GB virtual memory used. Killing
>> container.
>> Dump of the process-tree for container_1461341357870_0004_01_000015 :
>> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> |- 4102 4100 4102 4102 (bash) 1 7 115806208 715 /bin/bash -c
>> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
>> -XX:MaxDirectMemorySize=1900m
>> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
>> -Dlogback.configurationFile=file:logback.xml
>> -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
1>
>> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.out 2>
>> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.err
>> |- 4306 4102 4102 4102 (java) 172258 40265 9495257088 646460
>> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
>> -XX:MaxDirectMemorySize=1900m
>> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
>> -Dlogback.configurationFile=file:logback.xml
>> -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
>>
>> One thing that drew my attention is `-Xmx570m`. I expected it to be
>> TaskManagerMemory*0.75 (due to yarn.heap-cutoff-ratio). I run the
>> application as follows:
>>
>> HADOOP_CONF_DIR=/etc/hadoop/conf flink run -m yarn-cluster -yn 18 -yjm
>> 4096 -ytm 2500 eval-assembly-1.0.jar
>>
>> In the Flink logs I do see 'Task Manager memory: 2500'. When I look at
>> the YARN container logs on the cluster node, I see that it starts with
>> 570 MB, which puzzles me. When I look at the memory actually allocated to
>> a YARN container using 'top', I see 2.2 GB used. Am I interpreting these
>> parameters correctly?
>>
>> I have also set the following (it failed in the same way without it as
>> well):
>>
>> taskmanager.memory.off-heap: true
>>
>> Also, I don't understand why this happens at all. I assumed that Flink
>> won't overcommit allocated resources and will spill to disk when running
>> out of heap memory. I'd appreciate it if someone could shed light on this
>> too.
>>
>> Thanks,
>> Timur
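[Editor's note: the memory figures discussed in the thread can be cross-checked with quick arithmetic. The following is a minimal sketch, not a definitive account of Flink's sizing logic: the JVM flags and `-ytm` value come from the logs above, while the values for `yarn.heap-cutoff-ratio`, `yarn.heap-cutoff-min`, and `taskmanager.memory.fraction` are assumed plausible defaults for this era of Flink, not values confirmed anywhere in the thread.]

```python
# Sanity-checking the memory numbers from the thread.
container_mb = 2500      # -ytm 2500
heap_mb = 570            # -Xmx570m (from the container command line)
direct_mb = 1900         # -XX:MaxDirectMemorySize=1900m

# 1) Robert's budget: heap + direct leaves only 30 MB of slack under the
#    YARN physical-memory limit. The process RSS also includes memory these
#    two flags do not cap (metaspace, thread stacks, code cache, native
#    allocations), so exceeding the 2.5 GB limit is easy.
slack_mb = container_mb - (heap_mb + direct_mb)
print(slack_mb)  # 30

# 2) One way a ~570 MB heap can arise with off-heap memory enabled.
#    All three config values below are assumptions for illustration.
cutoff_ratio = 0.25      # assumed yarn.heap-cutoff-ratio
cutoff_min_mb = 600      # assumed yarn.heap-cutoff-min
managed_fraction = 0.7   # assumed taskmanager.memory.fraction

# Safety cutoff subtracted from the container before sizing the JVM:
cutoff_mb = max(cutoff_min_mb, int(container_mb * cutoff_ratio))  # 625
after_cutoff_mb = container_mb - cutoff_mb                        # 1875

# Timur's expectation: the whole post-cutoff remainder becomes heap.
expected_heap_mb = after_cutoff_mb                                # 1875

# With taskmanager.memory.off-heap: true, Flink's managed memory is
# allocated as direct buffers instead, so only the non-managed remainder
# is left for -Xmx -- in the same ballpark as the observed 570 MB:
managed_mb = int(after_cutoff_mb * managed_fraction)              # 1312
off_heap_heap_mb = after_cutoff_mb - managed_mb                   # 563

print(expected_heap_mb, off_heap_heap_mb)
```

Under these assumed defaults the sketch lands near the observed `-Xmx570m`, which would explain why the heap is far below `TaskManagerMemory * 0.75`: the cutoff ratio is applied first, and then most of the remainder is moved off-heap as managed memory.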