[jira] [Closed] (FLINK-32484) AdaptiveScheduler combined restart during scaling out

Gyula Fora (Jira) Wed, 05 Jul 2023 07:15:05 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-32484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gyula Fora closed FLINK-32484.
------------------------------
    Resolution: Duplicate

> AdaptiveScheduler combined restart during scaling out
> -----------------------------------------------------
>
>                 Key: FLINK-32484
>                 URL: https://issues.apache.org/jira/browse/FLINK-32484
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Core
>    Affects Versions: 1.17.0
>            Reporter: Prabhu Joseph
>            Priority: Major
>
> During a scale-out, when nodes are added at different times,
> AdaptiveScheduler restarts the job several times within a short period. On
> one of our Flink jobs, AdaptiveScheduler restarted the ExecutionGraph every
> time it was notified of new resources: five restarts in a little over three
> minutes (see the log below).
> AdaptiveScheduler could offer a user-configurable restart window during
> which it accumulates resource notifications and restarts only once, either
> when the available resources are sufficient for the desired parallelism or
> when the window times out. The window opens on the first resource
> notification received. This applies only while the execution graph is in
> the executing state, not in the waiting-for-resources state.
>  
> {code:java}
> [root@ip-1-2-3-4 container_1688034805200_0002_01_000001]# grep -i scale *
> jobmanager.log:2023-06-29 10:46:58,061 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
> jobmanager.log:2023-06-29 10:47:57,317 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
> jobmanager.log:2023-06-29 10:48:53,314 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
> jobmanager.log:2023-06-29 10:49:27,821 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
> jobmanager.log:2023-06-29 10:50:15,672 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
> [root@ip-1-2-3-4 container_1688034805200_0002_01_000001]#
> {code}
>  
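
The restart window proposed in the description could look roughly like the sketch below. It is only an illustration of the idea, not code from Flink: the class RestartWindow, its method names, and the slot-count model are all hypothetical, and the clock is passed in as a plain millisecond value so the behaviour is easy to follow.

```java
import java.time.Duration;

/** Hypothetical helper illustrating the proposed restart window; not a Flink class. */
final class RestartWindow {
    private static final long NO_WINDOW = -1L;

    private final long windowMillis;
    private final int desiredSlots;
    private int availableSlots;
    private long windowStartMillis = NO_WINDOW;

    RestartWindow(Duration window, int desiredSlots, int currentSlots) {
        this.windowMillis = window.toMillis();
        this.desiredSlots = desiredSlots;
        this.availableSlots = currentSlots;
    }

    /** Records newly offered slots; returns true if the job should restart now. */
    boolean onNewResources(int addedSlots, long nowMillis) {
        availableSlots += addedSlots;
        if (windowStartMillis == NO_WINDOW) {
            windowStartMillis = nowMillis; // the first notification opens the window
        }
        return restartIfDue(nowMillis);
    }

    /** Periodic check so that a timed-out window still triggers a restart. */
    boolean onTimerTick(long nowMillis) {
        return windowStartMillis != NO_WINDOW && restartIfDue(nowMillis);
    }

    private boolean restartIfDue(long nowMillis) {
        boolean enough = availableSlots >= desiredSlots;
        boolean timedOut = nowMillis - windowStartMillis >= windowMillis;
        if (enough || timedOut) {
            windowStartMillis = NO_WINDOW; // close the window; the caller restarts once
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed numbers: the job runs on 4 slots and wants 10; nodes arrive a minute apart.
        RestartWindow window = new RestartWindow(Duration.ofMinutes(3), 10, 4);
        System.out.println(window.onNewResources(2, 0L));
        System.out.println(window.onNewResources(2, 60_000L));
        System.out.println(window.onNewResources(2, 120_000L));
    }
}
```

With a window of zero length, every notification times out immediately, which matches the restart-per-notification behaviour visible in the log above.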



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
