Hi,

有可能是堆外内存超用,可以参考最近中文社区的一篇投稿 《详解 Flink 容器化环境下的 OOM Killed》进行修改,建议先增大 jvm-overhead 
相关配置

[1] 
https://mp.weixin.qq.com/s?__biz=MzU3Mzg4OTMyNQ==&mid=2247490197&idx=1&sn=b0893a9bf12fbcae76852a156302de95

祝好
唐云
________________________________
From: Yang Peng <yangpengklf...@gmail.com>
Sent: Thursday, January 7, 2021 12:24
To: user-zh <user-zh@flink.apache.org>
Subject: Flink 1.11.2版本 实时任务运行 报错 is running beyond physical memory limits. 
Current usage: 25.0 GB of 25 GB physical memory used; 28.3 GB of 52.5 GB 
virtual memory used. Killing container

Hi,

 
大家好,咨询一个问题,我们有个实时任务运行在Flink1.11.2版本,使用rocksdbstatebackend,最近报警出现了物理内存超限被kill的异常信息,我们查看了监控taskmanager
heap使用量没有超限,direct内存使用量也维持在一个平稳的范围内没有超限,也没有报oom,这种情况是非堆内存异常是吗?完整报错信息如下:

Dump of the process-tree for container_e06_1603181034156_0137_01_000002 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
        |- 180421 180362 180362 180362 (java) 258262921 59979106 30306209792
6553277 /usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC
-Xmx11542724608 -Xms11542724608 -XX:MaxDirectMemorySize=1207959552
-XX:MaxMetaspaceSize=268435456
-Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=12750684160b -D
taskmanager.cpu.cores=1.0 -D
taskmanager.memory.task.heap.size=11408506880b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address=flink-cm8.jd.163.org -Dweb.port=0
-Dweb.tmpdir=/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c
-Djobmanager.rpc.port=33656 -Drest.address=flink-cm8.jd.163.org
-Dsecurity.kerberos.login.keytab=/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab
        |- 180362 180360 180362 180362 (bash) 0 2 116011008 353 /bin/bash -c
/usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC -Xmx11542724608
-Xms11542724608 -XX:MaxDirectMemorySize=1207959552
-XX:MaxMetaspaceSize=268435456
-Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=12750684160b -D
taskmanager.cpu.cores=1.0 -D
taskmanager.memory.task.heap.size=11408506880b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address='flink-cm8.jd.163.org' -Dweb.port='0'
-Dweb.tmpdir='/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c'
-Djobmanager.rpc.port='33656' -Drest.address='flink-cm8.jd.163.org'
-Dsecurity.kerberos.login.keytab='/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab'
1> 
/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.out
2> 
/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.err

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2021-01-07 11:51:00,781 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Source: 银河SDK原始日志 (18/90) (51ac2f29df472d001ce9b4307636ac1c) switched
from RUNNING to FAILED on
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@1aad00fa.
java.lang.Exception: Container
[pid=180362,containerID=container_e06_1603181034156_0137_01_000002] is
running beyond physical memory limits. Current usage: 25.0 GB of 25 GB
physical memory used; 28.3 GB of 52.5 GB virtual memory used. Killing
container.
Dump of the process-tree for container_e06_1603181034156_0137_01_000002 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
        |- 180421 180362 180362 180362 (java) 258262921 59979106 30306209792
6553277 /usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC
-Xmx11542724608 -Xms11542724608 -XX:MaxDirectMemorySize=1207959552
-XX:MaxMetaspaceSize=268435456
-Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=12750684160b -D
taskmanager.cpu.cores=1.0 -D
taskmanager.memory.task.heap.size=11408506880b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address=flink-cm8.jd.163.org -Dweb.port=0
-Dweb.tmpdir=/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c
-Djobmanager.rpc.port=33656 -Drest.address=flink-cm8.jd.163.org
-Dsecurity.kerberos.login.keytab=/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab
        |- 180362 180360 180362 180362 (bash) 0 2 116011008 353 /bin/bash -c
/usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC -Xmx11542724608
-Xms11542724608 -XX:MaxDirectMemorySize=1207959552
-XX:MaxMetaspaceSize=268435456
-Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=12750684160b -D
taskmanager.cpu.cores=1.0 -D
taskmanager.memory.task.heap.size=11408506880b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address='flink-cm8.jd.163.org' -Dweb.port='0'
-Dweb.tmpdir='/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c'
-Djobmanager.rpc.port='33656' -Drest.address='flink-cm8.jd.163.org'
-Dsecurity.kerberos.login.keytab='/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab'
1> 
/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.out
2> 
/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.err

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

回复