Zhilong Hong created FLINK-22017:
------------------------------------

             Summary: Regions may never be scheduled when there are cross-region blocking edges
                 Key: FLINK-22017
                 URL: https://issues.apache.org/jira/browse/FLINK-22017
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.0
            Reporter: Zhilong Hong
         Attachments: Illustration.jpg

For the topology with cross-region blocking edges, there are regions that may 
never be scheduled. The case is illustrated in the figure below.

!Illustration.jpg!

Let's name each vertex v<layer>_<number>. The edge that connects v2_2 and v3_2 
crosses between region 1 and region 2. Since region 1 has no incoming blocking 
edges from other regions, it is scheduled first. When v2_2 finishes, 
PipelinedRegionSchedulingStrategy triggers {{onExecutionStateChange}} for it.

One would expect region 2 to be scheduled at this point, since all the 
partitions it consumes should be consumable. In fact region 2 is not scheduled, 
because the result partition of v2_2 is not tagged as consumable. Whether a 
partition is consumable is determined by its IntermediateDataSet.

An IntermediateDataSet, however, is consumable if and only if all the producers 
of its IntermediateResultPartitions are finished. This IntermediateDataSet will 
never be consumable, since v2_3 is never scheduled. This forms a deadlock: the 
region is never scheduled because scheduling it depends on the region itself 
having been scheduled.
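
The deadlock can be reproduced with a small toy model. This is only a sketch 
and does not use any Flink classes: the class and method names are made up, 
and the region membership (v2_2 in region 1; v2_3 and v3_2 in region 2) is an 
assumption read from the description, since the figure is not reproduced here. 
A region is scheduled once every producer of the data set it consumes is 
finished, and scheduling a region is simplified to mean that all of its 
vertices then finish.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the reported deadlock. None of these names are Flink APIs. */
final class RegionDeadlockSketch {

    /**
     * Schedules regions until no further progress is possible.
     *
     * @param regionVertices    region name -> vertices in that region (assumed layout)
     * @param blockingProducers region name -> all producers of the data set
     *                          the region consumes over a blocking edge
     * @return the regions that were scheduled
     */
    static Set<String> schedule(Map<String, List<String>> regionVertices,
                                Map<String, List<String>> blockingProducers) {
        Set<String> scheduled = new LinkedHashSet<>();
        Set<String> finished = new HashSet<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (String region : regionVertices.keySet()) {
                if (scheduled.contains(region)) {
                    continue;
                }
                List<String> producers =
                        blockingProducers.getOrDefault(region, List.of());
                // Data-set-level rule: every producer must be finished.
                if (finished.containsAll(producers)) {
                    scheduled.add(region);
                    // Simplification: a scheduled region runs to completion.
                    finished.addAll(regionVertices.get(region));
                    progress = true;
                }
            }
        }
        return scheduled;
    }

    /** The topology from the report, under the assumptions stated above. */
    static Set<String> demo() {
        Map<String, List<String>> regionVertices = Map.of(
                "region1", List.of("v2_2"),
                "region2", List.of("v2_3", "v3_2"));
        // The data set consumed by region 2 is produced by v2_2 and v2_3,
        // and v2_3 itself lives in region 2.
        Map<String, List<String>> blockingProducers = Map.of(
                "region2", List.of("v2_2", "v2_3"));
        return schedule(regionVertices, blockingProducers);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [region1]
    }
}
```

Region 2 never appears in the result because one of the producers it waits on, 
v2_3, can only finish after region 2 has been scheduled.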



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
