I have a SQL job that runs a dual-stream join with state retained for 12 to 24 hours. Checkpoints complete normally and the state size ranges from 300 MB to 4 GB, but when I manually trigger a savepoint the container is killed for exceeding its physical memory limit (the requested memory is 5 GB for a single-slot TaskManager).
My question is: when RocksDB takes a savepoint, does it read all of the on-disk state into memory and then write a full snapshot?
If so, is this optimized in later versions? Otherwise, whenever the on-disk state exceeds the managed memory, any manually triggered savepoint will get the job killed.
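For context, this is the kind of flink-conf.yaml adjustment I have been considering as a workaround, to leave more native-memory headroom for RocksDB during the snapshot. The keys are standard Flink memory options (some of them also appear in the launch command in the error dump below), but the values here are only guesses on my side, not verified settings:

    # illustrative values only -- not yet validated on this job
    taskmanager.memory.process.size: 6g          # give the container more room overall
    taskmanager.memory.managed.size: 2g          # shrink managed memory so RocksDB block cache / write buffers stay smaller
    taskmanager.memory.jvm-overhead.max: 1g      # extra headroom for native allocations outside the JVM heap
    taskmanager.memory.task.off-heap.size: 256m  # explicit off-heap reserve for the task
    state.backend.rocksdb.memory.managed: true   # keep RocksDB bounded by Flink's managed memory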
The error message is below.
2020-12-10 09:18:50
java.lang.Exception: Container
[pid=33290,containerID=container_e47_1594105654926_6890682_01_000002] is
running beyond physical memory limits. Current usage: 5.1 GB of 5 GB physical
memory used; 7.4 GB of 10.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_e47_1594105654926_6890682_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 33334 33290 33290 33290 (java) 787940 76501 7842340864 1337121
/usr/java/default/bin/java -Xmx1234802980 -Xms1234802980
-XX:MaxDirectMemorySize=590558009 -XX:MaxMetaspaceSize=268435456
-Dlog.file=/data6/yarn/container-logs/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=456340281b -D
taskmanager.memory.network.min=456340281b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=2738041755b -D taskmanager.cpu.cores=5.0 -D
taskmanager.memory.task.heap.size=1100585252b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address=shd177.yonghui.cn -Dpipeline.classpaths= -Dweb.port=0
-Dexecution.target=embedded
-Dweb.tmpdir=/tmp/flink-web-c415ad8e-c019-4398-869d-7c9e540c2479
-Djobmanager.rpc.port=44058
-Dpipeline.jars=file:/data1/yarn/nm/usercache/xiebo/appcache/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000001/yh-datacenter-platform-flink-1.0.0.jar
-Drest.address=shd177.yonghui.cn
-Dsecurity.kerberos.login.keytab=/data1/yarn/nm/usercache/xiebo/appcache/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000001/krb5.keytab
|- 33290 33288 33290 33290 (bash) 0 0 108679168 318 /bin/bash -c
/usr/java/default/bin/java -Xmx1234802980 -Xms1234802980
-XX:MaxDirectMemorySize=590558009 -XX:MaxMetaspaceSize=268435456
-Dlog.file=/data6/yarn/container-logs/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000002/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
-Dlog4j.configurationFile=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=456340281b -D
taskmanager.memory.network.min=456340281b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=2738041755b -D taskmanager.cpu.cores=5.0 -D
taskmanager.memory.task.heap.size=1100585252b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address='shd177.yonghui.cn' -Dpipeline.classpaths=''
-Dweb.port='0' -Dexecution.target='embedded'
-Dweb.tmpdir='/tmp/flink-web-c415ad8e-c019-4398-869d-7c9e540c2479'
-Djobmanager.rpc.port='44058'
-Dpipeline.jars='file:/data1/yarn/nm/usercache/xiebo/appcache/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000001/yh-datacenter-platform-flink-1.0.0.jar'
-Drest.address='shd177.yonghui.cn'
-Dsecurity.kerberos.login.keytab='/data1/yarn/nm/usercache/xiebo/appcache/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000001/krb5.keytab'
1>
/data6/yarn/container-logs/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000002/taskmanager.out
2>
/data6/yarn/container-logs/application_1594105654926_6890682/container_e47_1594105654926_6890682_01_000002/taskmanager.err
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143