Hello Leon, As described by Chen below Adaptive Scheduler doesn't perform auto scale a Flink Job other than allocating the requested slots based on availability. Recently we implemented this with EMR managed scaling by combining adaptive scheduler since there's no direct support of auto scaling on yarn at Flink. If you are running an application on infrastructure similar to AWS EMR, you can use scaling policies to scale up the cluster and scale down the cluster based on requested slots but it wont really work with incoming traffic since there's no way of adjusting flink parallelism based on incoming traffic.
Regards,Madan On Wednesday, 28 June 2023 at 08:43:22 am GMT-7, Chen Zhanghao <zhanghao.c...@outlook.com> wrote: Hi Leon, Adaptive scheduler alone cannot autoscale a Flink job. It simply adjusts the parallelism of a job based on available slots [1]. To autoscale a job, we further need a policy to suggest the recommended resources for the job and a mechanism to adjust the allocated resources of the job (aka. available slots).For K8s standalone application mode, we can use reactive mode coupled with K8s HPA, where HPA collects pod metrics and autoscales the number of TMs, and adaptive scheduler rescales job according to the available slots. For YARN application mode, reactive mode is not available. However, in the coming 1.18 release, we can declare the desired resources through REST API to adjust the allocated resources of the job via FLIP-291 [2], but you still need a policy to suggest the recommended resources for the job and call the API, which you can refer to the autoscaler implemention in Flink K8s operator. [1] Elastic Scaling | Apache Flink[2] FLIP-291: Externalized Declarative Resource Management - Apache Flink - Apache Software Foundation[3] Autoscaler | Apache Flink Kubernetes Operator Best,Zhanghao Chen发件人: Leon Xu <l...@attentivemobile.com> 发送时间: 2023年6月27日 13:41 收件人: user <user@flink.apache.org> 主题: Questions regarding adaptive scheduler with YARN and application mode Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs are running on YARN with application mode. There isn't much doc around how adaptive scheduler works. So I have some questions: - How does Adaptive Scheduler work with YARN/Application mode? If the scheduler decides to request more tasks will it trigger the request to YARN while the job is already running - What's the evaluation criteria to trigger a scale-up ? Is it possible to manually trigger a scale-up for testing purposes? Thanks