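A production Flink job failed when a savepoint was triggered manually. The job has two operators, one reading from Kafka and one writing to MongoDB, each with a parallelism of 48. After the failure I saw that, of the 48 tasks of the MongoDB sink operator, 45 had completed and the remaining 3 had timed out (the timeout is set to 3 minutes). Under normal conditions a checkpoint completes in about 4 seconds, and the state size is only 23.7 KB. After the savepoint failed, all of the job's periodic checkpoints failed as well (3 tasks of each operator timed out). The job uses the filesystem state backend (FsStateBackend), with Alibaba Cloud OSS as the DFS. What could be causing this? The job log captured after the failure is as follows.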
2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Failed to transfer file from TaskExecutor sqrc-session-prod-taskmanager-1-30.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]
Caused by: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
    ... 5 more
2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Unhandled exception.
org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]
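
For reference, the checkpoint setup described in the question (filesystem state backend, OSS as the DFS, 3-minute checkpoint timeout) would correspond roughly to a flink-conf.yaml fragment like the one below. This is only a sketch under assumptions: the bucket name, paths, and endpoint are placeholders that do not come from the actual job, and the same settings may instead be applied programmatically in the job code.

```yaml
# Sketch only: values in angle brackets are hypothetical placeholders.
state.backend: filesystem                           # FsStateBackend
state.checkpoints.dir: oss://<bucket>/flink/checkpoints
state.savepoints.dir: oss://<bucket>/flink/savepoints
execution.checkpointing.timeout: 3 min              # timeout stated in the question

# OSS filesystem access (requires the flink-oss-fs-hadoop plugin)
fs.oss.endpoint: <oss-endpoint>
fs.oss.accessKeyId: <access-key-id>
fs.oss.accessKeySecret: <access-key-secret>
```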