Hi Kamil,
When the producer receives a PartitionRequest from a downstream task, it
first checks whether the requested partition is already registered. If
not, it responds with a PartitionNotFoundException. Once the upstream task is
submitted and begins to run, it registers all of its partitions with the
ResultPartitionManager. So in your case the partition request arrived
before the partition was registered. Maybe the JobManager submitted the
upstream task late, or some logic delayed registering the task in the
NetworkEnvironment. You can debug the state of the upstream task at the
moment it responds with PartitionNotFound to track down the reason.
Looking forward to your further findings!
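
To make the ordering issue concrete, here is a minimal sketch of the race. It is only an illustration under simplifying assumptions: PartitionRaceSketch, PartitionRegistry, PartitionNotFound and requestOnce are hypothetical names invented for this example, not Flink's actual classes or API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the producer-side bookkeeping.
// None of these types are Flink classes; they only mimic the behaviour described above.
public class PartitionRaceSketch {

    /** Thrown when a consumer asks for a partition that is not registered yet. */
    static class PartitionNotFound extends RuntimeException {
        PartitionNotFound(String partitionId) {
            super("Partition " + partitionId + " not found");
        }
    }

    /** Minimal registry mapping a partition id to its (dummy) data. */
    static class PartitionRegistry {
        private final Map<String, String> partitions = new ConcurrentHashMap<>();

        /** Called by the upstream task once it has started running. */
        void register(String partitionId, String data) {
            partitions.put(partitionId, data);
        }

        /** Called when a partition request from a downstream task arrives. */
        String request(String partitionId) {
            String data = partitions.get(partitionId);
            if (data == null) {
                throw new PartitionNotFound(partitionId);
            }
            return data;
        }
    }

    /** Issues one request and reports "not-found" instead of propagating the exception. */
    static String requestOnce(PartitionRegistry registry, String partitionId) {
        try {
            return registry.request(partitionId);
        } catch (PartitionNotFound e) {
            return "not-found";
        }
    }

    public static void main(String[] args) {
        PartitionRegistry registry = new PartitionRegistry();

        // The request arrives before the upstream task has registered its partition.
        System.out.println(requestOnce(registry, "p-0")); // prints "not-found"

        // The upstream task starts and registers; the same request now succeeds.
        registry.register("p-0", "records");
        System.out.println(requestOnce(registry, "p-0")); // prints "records"
    }
}
```

The first request fails only because it came before the registration, which matches the pattern of a job that fails once and then runs fine after the restart.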
Cheers,
Zhijiang
------------------------------------------------------------------
From: Kamil Dziublinski <kamil.dziublin...@gmail.com>
Sent: Tuesday, April 4, 2017, 17:20
To: user <user@flink.apache.org>
Subject: PartitionNotFoundException on deploying streaming job
Hi guys,
When I run my streaming job, I almost always get a
PartitionNotFoundException initially. The job fails, then restarts and runs OK.
I wonder what is causing that and whether I can adjust some parameters to
avoid this initial failure.
I have a Flink session on YARN with 55 task managers, each with 4 cores and
4 GB. This setup uses 77% of my YARN cluster.
Any ideas?
Thanks,
Kamil.