Re:Re: Flink 1.10 on Yarn

chenkaibit Thu, 06 Aug 2020 22:45:14 -0700

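Hi xuhaiLong, judging from the log, the NullPointerException during checkpointing is a known issue; see the two JIRA tickets below for details.
Which JDK version are you using? So far, jdk8_40/jdk8_60 combined with flink-1.10 has been found to trigger the checkpoint
NullPointerException. Try upgrading the JDK and see whether the problem goes away.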
https://issues.apache.org/jira/browse/FLINK-18196
https://issues.apache.org/jira/browse/FLINK-17479
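
To answer the JDK question quickly, the update number can be read from the `java.version` system property of the JVM in question. Below is a minimal sketch only: the class name `JdkUpdateCheck` and its helper methods are hypothetical, and the check flags nothing beyond the two JDK 8 updates named in this message (it is not a complete list of affected versions).

```java
public class JdkUpdateCheck {
    // Extracts the update number from a legacy-style version string such as "1.8.0_60".
    // Returns -1 when there is no "_<update>" suffix (for example "11.0.2").
    static int updateOf(String javaVersion) {
        int idx = javaVersion.indexOf('_');
        if (idx < 0) {
            return -1;
        }
        String tail = javaVersion.substring(idx + 1);
        // Stop at the first non-digit so qualifiers like "-b09" are ignored.
        int end = 0;
        while (end < tail.length() && Character.isDigit(tail.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Integer.parseInt(tail.substring(0, end));
    }

    // JDK 8 reports its version as "1.8.x".
    static boolean isJava8(String javaVersion) {
        return javaVersion.startsWith("1.8.");
    }

    // True only for the two updates mentioned in this thread (8u40 and 8u60).
    static boolean mentionedInThread(String javaVersion) {
        int update = updateOf(javaVersion);
        return isJava8(javaVersion) && (update == 40 || update == 60);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version=" + version
                + " mentionedInThread=" + mentionedInThread(version));
    }
}
```

Note that this reports the JDK of the JVM it runs in, so it needs to be run with the same Java installation the YARN containers use.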





At 2020-08-07 12:50:23, "xuhaiLong" <xiagu...@163.com> wrote:

Sorry, I attached the wrong file.


Yes, taskmanager.memory.jvm-metaspace.size is at its default value.
On 8/7/2020 11:43, Yangze Guo <karma...@gmail.com> wrote:
The log did not come through. Is taskmanager.memory.jvm-metaspace.size currently at its default value?

Best,
Yangze Guo

On Fri, Aug 7, 2020 at 11:38 AM xuhaiLong <xiagu...@163.com> wrote:



Hi


Scenario: one TM with three slots, running three jobs.


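While the three jobs were running, a NullPointerException occurred during checkpointing (ck), which caused the jobs to restart continuously. Eventually the `Metaspace` filled up, resulting in 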
`java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has 
occurred. This can mean two things: either the job requires a larger size of 
JVM metaspace to load classes or there is a class loading leak. In the first 
case 'taskmanager.memory.jvm-metaspace.size' configuration option should be 
increased. If the error persists (usually in cluster after several job 
(re-)submissions) then there is probably a class loading leak which has to be 
investigated and fixed. The task executor has to be shutdown...
`


The attachment contains part of the exception output.


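Questions:
1. Why does the NullPointerException occur during checkpointing? (All three jobs read the same Kafka topic, and the jobs run normally when restored from a checkpoint, so it should not be a data problem.)
2. According to the log, the job can be restarted. Why does the problem persist after the automatic restart, so that the job keeps restarting?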
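
The error message quoted above distinguishes between a metaspace that is simply too small and a class loading leak. One way to tell them apart is to record metaspace usage over several restarts and see whether it keeps growing. The sketch below reads the value from the standard `java.lang.management` API. It assumes a HotSpot JVM, where the pool is named "Metaspace"; the class name `MetaspaceUsage` is hypothetical, and the code reports on the JVM it runs in, so for a TaskManager it would have to run inside that process or read the same MXBean remotely.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceUsage {
    // Returns the usage of the pool named "Metaspace", or null if this JVM
    // has no pool with that name (the name is a HotSpot convention).
    static MemoryUsage metaspace() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    // Bytes currently used by metaspace, or -1 if the pool was not found.
    static long usedBytes() {
        MemoryUsage usage = metaspace();
        return usage == null ? -1L : usage.getUsed();
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspace();
        if (usage == null) {
            System.out.println("No memory pool named Metaspace in this JVM");
            return;
        }
        // getMax() returns -1 when no maximum is defined for the pool.
        System.out.println("Metaspace used=" + usage.getUsed()
                + " committed=" + usage.getCommitted()
                + " max=" + usage.getMax());
    }
}
```

If the used value climbs after every restart and never drops, that points toward the leak case described in the error message rather than toward an undersized `taskmanager.memory.jvm-metaspace.size`.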


Thanks~~~
Cloud attachment sent from NetEase Mail Master
08-07error.txt (730.4 KB, expires 2020-08-22 11:37)