Hi

集群负载比较大的时候,下游一直收不到request的partition,就会导致PartitionNotFoundException,建议增大 
taskmanager.network.request-backoff.max [1][2] 以增大重试次数

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-network-request-backoff-max
[2] https://juejin.cn/post/6844904185347964942#heading-8


祝好
唐云
________________________________
From: 赵一旦 <hinobl...@gmail.com>
Sent: Monday, November 23, 2020 13:08
To: user-zh@flink.apache.org <user-zh@flink.apache.org>
Subject: Re: Flink任务启动偶尔报错PartitionNotFoundException,会自动恢复。

这个报错和kafka没有关系的哈,我大概理解是提交任务的瞬间,jobManager/taskManager机器压力较大,存在机器之间心跳超时什么的?
这个partition应该是指flink运行图中的数据partition,我感觉。没有具体细看,每次提交的瞬间可能遇到这个问题,然后会自动重试成功。

zhisheng <zhisheng2...@gmail.com> 于2020年11月18日周三 下午10:51写道:

> 是不是有 kafka 机器挂了?
>
> Best
> zhisheng
>
> hailongwang <18868816...@163.com> 于2020年11月18日周三 下午5:56写道:
>
> > 感觉还有其它 root cause,可以看下还有其它日志不?
> >
> >
> > Best,
> > Hailong
> >
> > At 2020-11-18 15:52:57, "赵一旦" <hinobl...@gmail.com> wrote:
> > >2020-11-18 16:51:37
> > >org.apache.flink.runtime.io
> .network.partition.PartitionNotFoundException:
> > >Partition
> > b225fa9143dfa179d3a3bd223165d5c5#3@3fee4d51f5a43001ef743f3f15e4cfb2
> > >not found.
> > >    at org.apache.flink.runtime.io.network.partition.consumer.
> > >RemoteInputChannel.failPartitionRequest(RemoteInputChannel.java:267)
> > >    at org.apache.flink.runtime.io.network.partition.consumer.
> >
> >
> >RemoteInputChannel.retriggerSubpartitionRequest(RemoteInputChannel.java:166)
> > >    at org.apache.flink.runtime.io.network.partition.consumer.
> > >SingleInputGate.retriggerPartitionRequest(SingleInputGate.java:521)
> > >    at org.apache.flink.runtime.io.network.partition.consumer.
> >
> >
> >SingleInputGate.lambda$triggerPartitionStateCheck$1(SingleInputGate.java:765
> > >)
> > >    at
> java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture
> > >.java:670)
> > >    at java.util.concurrent.CompletableFuture$UniAccept.tryFire(
> > >CompletableFuture.java:646)
> > >    at java.util.concurrent.CompletableFuture$Completion.run(
> > >CompletableFuture.java:456)
> > >    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> > >    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(
> > >ForkJoinExecutorConfigurator.scala:44)
> > >    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > >    at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool
> > >.java:1339)
> > >    at
> > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > >    at
> > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread
> > >.java:107)
> > >
> > >
> > >请问这是什么问题呢?
> >
>

回复