Thank you both. It looks like, with the upcoming Flink 1.18 release, we should build an autoscaler service that monitors the job and adjusts the resources allocated from YARN accordingly.
Leon

On Wed, Jun 28, 2023 at 9:08 AM Madan D <madan_de...@yahoo.com.au> wrote:

> Hello Leon,
>
> As Chen describes below, the Adaptive Scheduler does not autoscale a
> Flink job; it only allocates the requested slots based on availability.
> We recently implemented autoscaling by combining EMR managed scaling
> with the adaptive scheduler, since Flink has no direct support for
> autoscaling on YARN.
>
> If you are running an application on infrastructure similar to AWS EMR,
> you can use scaling policies to scale the cluster up and down based on
> requested slots, but this won't really track incoming traffic, since
> there is no way to adjust Flink parallelism based on incoming traffic.
>
> Regards,
> Madan
>
> On Wednesday, 28 June 2023 at 08:43:22 am GMT-7, Chen Zhanghao <
> zhanghao.c...@outlook.com> wrote:
>
> Hi Leon,
>
> The adaptive scheduler alone cannot autoscale a Flink job. It simply
> adjusts the parallelism of a job based on the available slots [1]. To
> autoscale a job, we additionally need a policy that suggests the
> recommended resources for the job and a mechanism that adjusts the
> resources allocated to the job (i.e., the available slots).
>
> For K8s standalone application mode, we can use reactive mode coupled
> with K8s HPA: HPA collects pod metrics and autoscales the number of TMs,
> and the adaptive scheduler rescales the job according to the available
> slots.
>
> For YARN application mode, reactive mode is not available. However, in
> the coming 1.18 release, we can declare the desired resources through
> the REST API to adjust the allocated resources of the job via FLIP-291
> [2]. You still need a policy that suggests the recommended resources for
> the job and calls the API; for that, you can refer to the autoscaler
> implementation in the Flink K8s operator [3].
>
> [1] Elastic Scaling | Apache Flink
> <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#adaptive-scheduler>
> [2] FLIP-291: Externalized Declarative Resource Management - Apache Flink
> - Apache Software Foundation
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management>
> [3] Autoscaler | Apache Flink Kubernetes Operator
> <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/custom-resource/autoscaler/>
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Leon Xu <l...@attentivemobile.com>
> *Sent:* June 27, 2023 13:41
> *To:* user <user@flink.apache.org>
> *Subject:* Questions regarding adaptive scheduler with YARN and application
> mode
>
> Hi Flink users,
>
> I am trying to use the Adaptive Scheduler to autoscale our Flink streaming
> jobs (NOT batch jobs). Our jobs are running on YARN in application mode.
> There isn't much documentation on how the adaptive scheduler works, so I
> have some questions:
>
> 1. How does the Adaptive Scheduler work with YARN/application mode? If the
> scheduler decides to request more tasks, will it trigger the request to
> YARN while the job is already running?
>
> 2. What are the evaluation criteria that trigger a scale-up? Is it
> possible to manually trigger a scale-up for testing purposes?
>
> Thanks
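
The split discussed in this thread (a policy that recommends resources, plus
a call to the FLIP-291 REST API that declares them) could be sketched roughly
as below. This is only an illustration under several assumptions, not the
autoscaler from the Flink K8s operator: the function names, the busy-ratio
input and the target/min/max parameters are all hypothetical, and the
endpoint path and payload shape (a parallelism lowerBound/upperBound per job
vertex, sent via PUT to /jobs/<jobid>/resource-requirements) reflect my
reading of FLIP-291 and should be checked against the 1.18 REST docs.

```python
import json
import math
import urllib.request


def recommend_parallelism(current, busy_ratio, target_busy=0.5, min_p=1, max_p=32):
    """Hypothetical policy: pick a parallelism that would bring the observed
    busy ratio (0.0-1.0) close to target_busy, clamped to [min_p, max_p]."""
    wanted = math.ceil(current * busy_ratio / target_busy)
    return max(min_p, min(max_p, wanted))


def build_requirements(parallelism_by_vertex, lower_bound=1):
    """Build the payload assumed from FLIP-291: one parallelism range per
    job vertex ID, with the recommended value as the upper bound."""
    return {
        vertex_id: {"parallelism": {"lowerBound": lower_bound, "upperBound": upper}}
        for vertex_id, upper in parallelism_by_vertex.items()
    }


def build_request(rest_url, job_id, requirements):
    """Prepare (but do not send) the PUT request to the JobManager REST API.
    The path is an assumption based on FLIP-291."""
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jobs/{job_id}/resource-requirements",
        data=json.dumps(requirements).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

A monitoring loop would read a load metric per vertex, call
recommend_parallelism, and pass the resulting request to
urllib.request.urlopen. On YARN, something still has to make the extra slots
available, so treat this as covering only the "declare desired resources"
half of the problem.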