Hi Ufuk,

Do you have any quick idea what could cause this problem in Flink 1.4.2?
Seems like one operator takes too long to deploy and downstream tasks error
out on partition not found. This only seems to happen when the job is
restored from state and in fact that operator has some keyed and operator
state as well.

Deploying the same job from empty state works well. We tried increasing
taskmanager.network.request-backoff.max, but that didn't help.
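
For reference, this is roughly what the change looks like in
flink-conf.yaml. The values below are only illustrative (assumptions for
the sake of the example, not the exact numbers from our cluster), and the
defaults in the comments are as far as I recall them:

```yaml
# Partition request backoff, in milliseconds. Values are illustrative only.
# Initial wait before a downstream task retries a partition request
# (default should be 100).
taskmanager.network.request-backoff.initial: 100
# Upper bound for the backoff between retries (default should be 10000).
# Raised here to give a slow-deploying producer more time to register
# its partition.
taskmanager.network.request-backoff.max: 60000
```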

It would be great if you have some pointers on where to look further; I
haven't seen this happening before.

Thank you!
Gyula

The error:
org.apache.flink.runtime.io.network.partition.: Partition 4c5e9cd5dd410331103f51127996068a@b35ef4ffe25e3d17c5d6051ebe2860cd not found.
    at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:77)
    at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:115)
    at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:159)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
