Hi Leon,

The adaptive scheduler alone cannot autoscale a Flink job. It only adjusts the parallelism of a job based on the available slots [1]. To autoscale a job, we additionally need a policy that recommends resources for the job and a mechanism that adjusts the resources allocated to it (i.e., the available slots).

For K8s standalone application mode, we can use reactive mode coupled with the K8s HPA: the HPA collects pod metrics and autoscales the number of TMs, and the adaptive scheduler rescales the job according to the available slots.

For YARN application mode, reactive mode is not available. However, in the upcoming 1.18 release, we can declare the desired resources through the REST API to adjust the resources allocated to the job via FLIP-291 [2]. You still need a policy that recommends resources for the job and calls the API; for that, you can refer to the autoscaler implementation in the Flink K8s operator [3].
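To make the "policy plus API call" idea for YARN more concrete, here is a minimal sketch in Python. It rests on several assumptions. The utilization-based policy is a hypothetical simplification, not the algorithm used by the Flink K8s operator autoscaler. The endpoint path `/jobs/<jobId>/resource-requirements` and the request body shape (a per-vertex `parallelism` object with `lowerBound` and `upperBound`) reflect my understanding of FLIP-291, so please verify them against the 1.18 REST API docs. The function names, the REST URL, and the vertex IDs are placeholders.

```python
import json
import math
import urllib.request


def recommend_parallelism(current, busy_ratio, target_utilization, max_parallelism):
    """Hypothetical policy: pick a parallelism that moves the busy ratio toward the target."""
    if current < 1 or not 0.0 < target_utilization <= 1.0:
        raise ValueError("current must be >= 1 and target_utilization in (0, 1]")
    wanted = math.ceil(current * busy_ratio / target_utilization)
    # Clamp to [1, max_parallelism] so we never request zero or unbounded slots.
    return max(1, min(wanted, max_parallelism))


def build_requirements_payload(vertex_parallelism):
    """Build the request body (assumed FLIP-291 shape) from {vertex_id: upper_bound}."""
    return {
        vertex_id: {"parallelism": {"lowerBound": 1, "upperBound": upper}}
        for vertex_id, upper in vertex_parallelism.items()
    }


def put_requirements(rest_url, job_id, payload):
    """Send the desired resources to the JobManager REST endpoint (assumed path)."""
    request = urllib.request.Request(
        url=f"{rest_url}/jobs/{job_id}/resource-requirements",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `recommend_parallelism(4, 0.75, 0.5, 16)` suggests an upper bound of 6 for a vertex that runs with parallelism 4 and is busy 75% of the time. An external loop would collect the busy ratio from the job metrics, build the payload for each vertex, and call `put_requirements`. The adaptive scheduler then rescales the job once the slots become available.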

[1] Elastic Scaling | Apache Flink <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#adaptive-scheduler>
[2] FLIP-291: Externalized Declarative Resource Management - Apache Flink - Apache Software Foundation <https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management>
[3] Autoscaler | Apache Flink Kubernetes Operator <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/custom-resource/autoscaler/>

Best,
Zhanghao Chen
________________________________
From: Leon Xu <l...@attentivemobile.com>
Sent: June 27, 2023 13:41
To: user <user@flink.apache.org>
Subject: Questions regarding adaptive scheduler with YARN and application mode

Hi Flink users,

I am trying to use the Adaptive Scheduler to autoscale our Flink streaming jobs (NOT batch jobs). Our jobs are running on YARN in application mode. There isn't much documentation on how the adaptive scheduler works, so I have some questions:

1. How does the Adaptive Scheduler work with YARN/application mode? If the scheduler decides to request more tasks, will it trigger the request to YARN while the job is already running?
2. What are the evaluation criteria to trigger a scale-up? Is it possible to manually trigger a scale-up for testing purposes?

Thanks