Not quite possible with the current version. We run an internal version of the Autoscaler in our production environment. One major difference is that we make the whole pipeline (except the source/sink) use the same parallelism to avoid uneven task distribution. The change is relatively simple: run the algorithm per vertex and take the max of the results.
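
To illustrate the "take the max" step, here is a minimal sketch in Python. It is not the actual Autoscaler code (which is Java) and not our internal patch. The function name `uniform_parallelism`, the vertex names, and the recommended values are hypothetical; the bounds of 1 and 8 simply mirror the `vertex.min-parallelism` / `vertex.max-parallelism` settings quoted in this thread.

```python
from typing import Dict, Set


def uniform_parallelism(
    per_vertex: Dict[str, int],
    excluded: Set[str],
    min_p: int = 1,
    max_p: int = 8,
) -> Dict[str, int]:
    """Give every non-excluded vertex the max of the per-vertex recommendations.

    per_vertex: parallelism the per-vertex algorithm recommended (hypothetical input).
    excluded:   vertices that keep their own value, e.g. sources and sinks.
    """
    inner = [p for v, p in per_vertex.items() if v not in excluded]
    if not inner:
        # Nothing to align; return the recommendations unchanged.
        return dict(per_vertex)
    # Take the max across the inner vertices, then clamp to the configured bounds.
    target = max(min_p, min(max_p, max(inner)))
    return {v: (p if v in excluded else target) for v, p in per_vertex.items()}


# Hypothetical per-vertex recommendations for a four-vertex pipeline.
recommended = {"source": 2, "map": 3, "window": 6, "sink": 1}
result = uniform_parallelism(recommended, excluded={"source", "sink"})
print(result)
```

With these made-up inputs, `map` and `window` both end up at 6 while `source` and `sink` keep their own values, so the inner pipeline scales as one unit.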
Best,
Zhanghao Chen
________________________________
From: Salva Alcántara <[email protected]>
Sent: Thursday, August 14, 2025 12:24
To: user <[email protected]>
Subject: Re: Autoscaling Global Scaling Factor (???)

That was on my agenda already. Will try and let you know how it goes.

Regarding my questions, do you think it's possible to achieve any of those points to make the autoscaler work as when you simply add/remove replicas by hand?

Thanks Chen!

Salva

On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen <[email protected]> wrote:

Hi, you may upgrade Flink to 1.19.3 or 1.20.2 or 2.0.1+. There's a known issue that Autoscaler may not minimize the number of TMs during downscaling with adaptive scheduler [1].

[1] https://issues.apache.org/jira/browse/FLINK-33977

Best,
Zhanghao Chen
________________________________
From: Salva Alcántara <[email protected]>
Sent: Wednesday, August 13, 2025 20:56
To: user <[email protected]>
Subject: RE: Autoscaling Global Scaling Factor (???)

BTW, I'm running on Flink 1.18.1 on top of operator 1.12.1 and the following autoscaler settings:

```
job.autoscaler.enabled: "true"
job.autoscaler.scaling.enabled: "true"
job.autoscaler.scale-down.enabled: "true"
job.autoscaler.vertex.max-parallelism: "8"
job.autoscaler.vertex.min-parallelism: "1"
jobmanager.scheduler: adaptive
job.autoscaler.metrics.window: 15m
job.autoscaler.metrics.busy-time.aggregator: MAX
job.autoscaler.backlog-processing.lag-threshold: 2m
job.autoscaler.scaling.effectiveness.detection.enabled: "true"
job.autoscaler.scaling.effectiveness.threshold: "0.3"
job.autoscaler.scaling.event.interval: 10m
job.autoscaler.stabilization.interval: 5m
job.autoscaler.scale-up.max-factor: "100000.0"
job.autoscaler.scaling.key-group.partitions.adjust.mode: "EVENLY_SPREAD"
job.autoscaler.scale-down.interval: 30m
job.autoscaler.scale-down.max-factor: "0.5"
job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
job.autoscaler.catch-up.duration: 5m
job.autoscaler.restart.time: 15m
job.autoscaler.restart.time-tracking.enabled: "true"
job.autoscaler.utilization.target: "0.8"
```

Regards,
Salva
