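Sometimes a job like this runs for more than 2 hours, so I have to cancel it. The cancellation never completes cleanly, though, and every time it brings the TaskManager down with the following error: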
2021-01-31 23:04:23,677 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Task did not exit gracefully within 180 + seconds.
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 18
With DEBUG-level logging enabled, I see messages like these:
2021-01-31 20:45:30,364 DEBUG org.apache.flink.runtime.io.network.partition.ResultPartitionManager [] - Released partition dc8a2804b6df6b0ceaee2610ccf6c6e5#312 produced by 448c5ac36dcda818f56ec5bbd728da10.
2021-01-31 20:45:30,392 DEBUG org.apache.flink.runtime.taskexecutor.T
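I run a periodic batch-mode job that extracts data from Hive and inserts it into MySQL, 10K to 20K rows per batch. Most runs finish in 10 to 20 seconds, but every so often a run takes much longer, sometimes more than 20 minutes, and still succeeds. The logs show no errors, and checking MySQL turned up no table locks. I suspected Hive metastore performance, but found nothing wrong there either.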
How should I go about analyzing this? Is there a way to see from Flink what the job is waiting on?
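One generic, JVM-level way to see what a slow task is waiting on is a thread dump of the TaskManager process, for example running the JDK's jstack <pid> a few times, some seconds apart, during a slow run and comparing the task threads' stacks. The sketch below is a minimal illustration of the same information gathered through the standard java.lang.management.ThreadMXBean API. It assumes it runs inside the JVM being inspected, and the class and method names (ThreadDumpSketch, formatThreadDump) are hypothetical, not part of Flink.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not a Flink API: formats a dump of every live thread
// in the JVM it runs in, similar in spirit to `jstack <pid>`.
public class ThreadDumpSketch {

    public static String formatThreadDump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true): also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info == null) {
                continue; // defensive: skip entries without information
            }
            // Header: thread name and state (RUNNABLE, BLOCKED, WAITING, ...).
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            // The lock or condition the thread is waiting on, if any.
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            // The thread currently holding that lock, if known.
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // Print the full stack; ThreadInfo.toString() cuts it at 8 frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatThreadDump());
    }
}
```

The sketch deliberately prints every thread instead of filtering by state: a thread stuck in a network read (for instance toward MySQL or the Hive metastore) is usually reported as RUNNABLE, so the repeated stack frames across several dumps are more informative than the state alone.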
--
Sent from: http://apache-flink.147419.n8.nabble.com/