Hi
You can increase the timeout by setting the configuration key
`task.cancellation.timeout` [1]. The code that implements this logic is here [2].
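
As a minimal sketch of the setting above, assuming it is placed in the cluster's flink-conf.yaml: the value 300000 is only an illustrative choice, not a recommendation from this thread. The option is given in milliseconds, and the documented default is 180000.

```yaml
# flink-conf.yaml (assumed location of the cluster configuration)
# Time in milliseconds after which a task cancellation is treated as timed out,
# which leads to a fatal TaskManager error. Default: 180000.
# 300000 (5 minutes) is an example value chosen for illustration only.
task.cancellation.timeout: 300000
```

According to the configuration docs in [1], a value of 0 deactivates the cancellation watchdog entirely, so raising the timeout is usually the more cautious choice.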
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#task-cancellation-timeout
[2]
Hi,
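Personally, I recommend option 2:
1. In some scenarios certain exceptions are recoverable: the job restarts automatically after the failure and keeps running.
2. From the alert notification to the intervention: if a person handles it, a delay of 20s is usually not a problem, and even minute-level latency is acceptable.
3. Calling a hook before the failure to notify the relevant parties would probably require modifying the JobManager code; for the details you would have to ask the experts.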
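At 2022-09-30 13:50:56, "casel.chen" wrote:
>When a Flink job fails, how can we send an alert to the relevant parties as quickly as possible? Existing approaches:
>Option 1: the REST provided by the Flink job itself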
Hi Vararu,
Flink ML has a custom implementation of WindowAssigner, called
EndOfStreamWindows, that might help solve your problem. Please check
whether it meets your requirements.
Hi,
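The earlier problem is still not solved, but the symptoms are a bit clearer now.
Version: flink-1.15.1
Scenario: when writing data to Hive, an exception occurs when the partition is committed after the write finishes.
Error log: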
Caused by: java.io.FileNotFoundException:
/tmp/jm_253c182f914fb67750844d2e71864a5a/blobStorage/job_615800b00c211de674f17e46938daeb7/blob_p-a813f094892f1c71b7884d0aec7972edbeae08e3-65d1205985504738577e6a7d90385f17