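Hi, it looks like this error occurs because the file used for output (STDOUT) cannot be found on the TaskExecutor. You could try adding the "taskmanager.log.path" configuration and running it again, so that you can track down the root cause of the task timeouts.
You could also try using a flame graph or jstack to see which method those few tasks are stuck in at the moment they time out.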
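For the jstack suggestion, here is a minimal Java sketch of the same idea. It uses the standard java.lang.management.ThreadMXBean API to print the full stack traces of live threads whose name contains a given fragment. It assumes that Flink task threads are named after the operator and subtask (something like "Sink: ... (46/48)#0"); the class name StuckThreadDump and the thread name used in main are hypothetical, for illustration only. In practice, running jstack against the TaskManager process a few times while the checkpoint is hanging gives the same information without any code changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class StuckThreadDump {

    /**
     * Returns a jstack-like text dump (name, state, full stack) of every live
     * thread whose name contains the given fragment.
     */
    public static List<String> dumpThreadsMatching(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> dumps = new ArrayList<>();
        // false, false: skip locked monitor / synchronizer details to keep the dump cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (!info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Build the frames manually: ThreadInfo.toString() truncates deep stacks
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            dumps.add(sb.toString());
        }
        return dumps;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a "stuck" task thread; the name is hypothetical.
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit quietly when interrupted
            }
        }, "Sink: hypothetical-mongo-sink (46/48)#0");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(200); // give the thread time to enter sleep()

        for (String dump : dumpThreadsMatching("Sink:")) {
            System.out.println(dump);
        }
    }
}
```

If consecutive dumps of the three timed-out sink threads keep showing the same frames (for example inside a MongoDB client call), that method is a likely candidate for what is holding up the checkpoint.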

--

    Best!
    Xuyang

On 2022-08-29 16:19:15, "casel.chen" <casel_c...@126.com> wrote:
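>A production Flink job failed when we manually triggered a savepoint. The job has two operators, one reading from Kafka and one writing to MongoDB, both with a parallelism of 48. After the failure, we saw that 45 of the 48 tasks of the MongoDB sink operator had completed, while the remaining 3 tasks timed out (the timeout is set to 3 minutes). Normally a checkpoint completes in 4 seconds, and the state size is only 23.7 KB. The job log after the failure is shown below. After the savepoint failed, all of the job's periodic checkpoints failed as well (with 3 tasks timing out in each operator). We use FileStateBackend, with Alibaba Cloud OSS as the DFS. What could be causing this failure?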
>
>
>2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Failed to transfer file from TaskExecutor sqrc-session-prod-taskmanager-1-30.
>java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>	at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
>	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
>	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
>	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]
>Caused by: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>	... 5 more
>2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Unhandled exception.
>org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>	at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
>	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
>	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
>	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]
