Thank you both. Looks like with the upcoming Flink 1.18 release, we should
build an auto-scaler service to monitor the job and properly adjust the
allocated resource from YARN.


Leon

On Wed, Jun 28, 2023 at 9:08 AM Madan D <madan_de...@yahoo.com.au> wrote:

> Hello Leon,
>
> As described by Chen below Adaptive Scheduler doesn't perform auto scale a
> Flink Job other than allocating the requested slots based on availability.
> Recently we implemented this with EMR managed scaling by combining adaptive
> scheduler since there's no direct support of auto scaling on yarn at Flink.
>
> If you are running an application on infrastructure similar to AWS EMR,
> you can use scaling policies to scale up the cluster and scale down the
> cluster based on requested slots but it wont really work with incoming
> traffic since there's no way of adjusting flink parallelism based on
> incoming traffic.
>
>
> Regards,
> Madan
>
>
> On Wednesday, 28 June 2023 at 08:43:22 am GMT-7, Chen Zhanghao <
> zhanghao.c...@outlook.com> wrote:
>
>
> Hi Leon,
>
> Adaptive scheduler alone cannot autoscale a Flink job. It simply adjusts
> the parallelism of a job based on available slots [1]. To autoscale a job,
> we further need a policy to suggest the recommended resources for the job
> and a mechanism to adjust the allocated resources of the job (aka. available
> slots). For K8s standalone application mode, we can use reactive mode
> coupled with K8s HPA, where HPA collects pod metrics and autoscales the
> number of TMs, and adaptive scheduler rescales job according to the available
> slots. For YARN application mode, reactive mode is not available. However,
> in the coming 1.18 release, we can declare the desired resources through
> REST API to adjust the allocated resources of the job via FLIP-291 [2],
> but you still need a policy to suggest the recommended resources for the
> job and call the API, which you can refer to the autoscaler implemention in
> Flink K8s operator.
>
> [1] Elastic Scaling | Apache Flink
> <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#adaptive-scheduler>
> [2] FLIP-291: Externalized Declarative Resource Management - Apache Flink
> - Apache Software Foundation
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management>
> [3] Autoscaler | Apache Flink Kubernetes Operator
> <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/custom-resource/autoscaler/>
>
> Best,
> Zhanghao Chen
> ------------------------------
> *发件人:* Leon Xu <l...@attentivemobile.com>
> *发送时间:* 2023年6月27日 13:41
> *收件人:* user <user@flink.apache.org>
> *主题:* Questions regarding adaptive scheduler with YARN and application
> mode
>
> Hi Flink users,
>
> I am trying to use Adaptive Scheduler to auto scale our Flink streaming
> jobs (NOT batch job). Our jobs are running on YARN with application mode.
> There isn't much doc around how adaptive scheduler works. So I have some
> questions:
>
>
>    1. How does Adaptive Scheduler work with YARN/Application mode? If the
>    scheduler decides to request more tasks will it trigger the request to YARN
>    while the job is already running
>
>    2. What's the evaluation criteria to trigger a scale-up ? Is it
>    possible to manually trigger a scale-up for testing purposes?
>
>
> Thanks
>

Reply via email to