Re: Understanding flink-autoscaler behavior

Chetas Joshi Wed, 12 Jun 2024 15:34:15 -0700

Hi Zhanghao,

If I am using the autoscaler (flink-k8s-operator) without enabling
auto-tuning, the documentation says it triggers in-place scaling without
going through a full job upgrade lifecycle. Does this mean it won't trigger
a checkpoint before scaling up or scaling down?


I am observing that entries are being read from Kafka (the source in my
flink app) multiple times if the flink app goes through scaling. Do I need
the autotuning to be turned on for exactly once processing?

Thanks
Chetas

On Fri, Jun 7, 2024 at 8:28 PM Zhanghao Chen <zhanghao.c...@outlook.com>
wrote:

> Hi,
>
> Reactive mode and the Autoscaler in Kubernetes operator are two different
> approaches towards elastic scaling of Flink.
>
> Reactive mode [1] has to be used together with the passive resource
> management approach of Flink (only Standalone mode takes this approach),
> where the TM number is controlled by external systems and the job
> parallelism is derived from the available resources in a reactive manner.
> The workflow is as follows:
>
> External service (like K8s HPA controller) monitors job load and adjust TM
> number to match the load -> Flink adjust job parallelism *reactively*.
>
> The Autoscaler [2] is to be used with the active resource management
> approach of Flink (YARN/K8s Application/Session mode), where The TM number
> is derived from the job resource need, and Flink actively requests/releases
> TMs to match job need. The workflow for the Autoscaler is:
>
> Autoscaler monitors job load and adjusts job parallelism *actively* to
> match the load -> Flink adjusts TM number to match the job need.
>
> I would recommend using the Autoscaler instead of the Reactive mode for:
>
>    1. The Autoscaler provides a all-in-one solution while Reactive mode
>    needs to be paired with an external TM number adjusting system.
>    2. The Autoscaler has a fine-tuned scaling algorithm tailored for
>    streaming workloads. It will fine-tune each operator's parallelism and take
>    task load balancing in mind while reactive mode just adjust all operator's
>    parallelism uniformly.
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#reactive-mode
> [2]
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/custom-resource/autoscaler/
>
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Sachin Sharma <sachinapr...@gmail.com>
> *Sent:* Saturday, June 8, 2024 3:02
> *To:* Gyula Fóra <gyula.f...@gmail.com>
> *Cc:* Chetas Joshi <chetas.jo...@gmail.com>; Oscar Perez via user <
> user@flink.apache.org>
> *Subject:* Re: Understanding flink-autoscaler behavior
>
> Hi,
>
> I have a question related to this.
>
> I am doing a POC with Kubernetes operator 1.8 and flink 1.18 version with
> Reactive mode enabled, I added some dummy slow and fast operator to the
> flink job and i can see there is a back pressure accumulated. but i am not
> sure why my Flink task managers are not scaled by the operator. Also, can
> someone explain if autoscalers job is just to add more task manager and
> then Reactive mode will adjust the parallelism based on the configurations?
> As per the Operator documentation - Users configure the target
> utilization percentage of the operators in the pipeline, e.g. keep the all
> operators between 60% - 80% busy. The autoscaler then finds a parallelism
> configuration such that the output rates of all operators match the input
> rates of all their downstream operators at the targeted utilization. So
> does the autoscaler(Part of kubernetes operator) controls the parallelism
> or the Reactive mode in the Flink job controls it.
>
> Thanks & Regards,
> Sachin Sharma
>
>
>
>
> On Fri, Jun 7, 2024 at 4:55 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi!
>
> To simplify things you can generally look at TRUE_PROCESSING_RATE,
> SCALUE_UP_RATE_THRESHOLD and SCALE_DOWN_RATE_THRESHOLD.
> If TPR is below the scale up threshold then we should scale up and if its
> above the scale down threshold then we scale down.
>
> In your case what we see for your source
> (cbc357ccb763df2852fee8c4fc7d55f2) as logged is that:
>
> TPR: 17498
> SCALE_UP_THRESHOLD: 83995
>
> So it should definitely be scaled up in theory, however you can also see:
> `Updating source cbc357ccb763df2852fee8c4fc7d55f2 max parallelism based
> on available partitions to 1`
>
> This means that the source max parallelism was determined by the available
> kafka partitions to be 1. It would not make sense to increase the
> parallelism even though we are clearly falling behind as the source cannot
> consume a single partition in parallel.
>
> Hope this helps
> Gyula
>
> On Fri, Jun 7, 2024 at 3:41 AM Chetas Joshi <chetas.jo...@gmail.com>
> wrote:
>
> Hi Community,
>
> I want to understand the following logs from the flink-k8s-operator
> autoscaler. My flink pipeline running on 1.18.0 and using
> flink-k8s-operator (1.8.0) is not scaling up even though the source vertex
> is back-pressured.
>
>
> 2024-06-06 21:33:35,270 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Updating source cbc357ccb763df2852fee8c4fc7d55f2 max parallelism based on
> available partitions to 1
>
> 2024-06-06 21:33:35,276 o.a.f.a.RestApiMetricsCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Querying metrics {busyTimeMsPerSecond=BUSY_TIME_PER_SEC,
> Source__dd-log-source.numRecordsOut=SOURCE_TASK_NUM_RECORDS_OUT,
> Source__dd-log-source.numRecordsIn=SOURCE_TASK_NUM_RECORDS_IN,
> Source__dd-log-source.pendingRecords=PENDING_RECORDS,
> Source__dd-log-source.numRecordsInPerSecond=SOURCE_TASK_NUM_RECORDS_IN_PER_SEC,
> backPressuredTimeMsPerSecond=BACKPRESSURE_TIME_PER_SEC} for
> cbc357ccb763df2852fee8c4fc7d55f2
>
> 2024-06-06 21:33:35,282 o.a.f.a.RestApiMetricsCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Querying metrics {busyTimeMsPerSecond=BUSY_TIME_PER_SEC} for
> 61214243927da46230dfd349fba7b8e6
>
> 2024-06-06 21:33:35,286 o.a.f.a.RestApiMetricsCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Querying metrics {busyTimeMsPerSecond=BUSY_TIME_PER_SEC} for
> 7758b2b5ada48872db09a5c48176e34e
>
> 2024-06-06 21:33:35,291 o.a.f.a.RestApiMetricsCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Querying metrics {busyTimeMsPerSecond=BUSY_TIME_PER_SEC} for
> eab9c0013081b8479e60463931f3a593
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Calculating vertex scaling metrics for cbc357ccb763df2852fee8c4fc7d55f2
> from
> {BACKPRESSURE_TIME_PER_SEC=AggregatedMetric{id='backPressuredTimeMsPerSecond',
> mim='0.0', max='0.0', avg='0.0', sum='0.0'},
> BUSY_TIME_PER_SEC=AggregatedMetric{id='busyTimeMsPerSecond', mim='192.0',
> max='192.0', avg='192.0', sum='192.0'},
> SOURCE_TASK_NUM_RECORDS_OUT=AggregatedMetric{id='Source__dd-log-source.numRecordsOut',
> mim='613279.0', max='613279.0', avg='613279.0', sum='613279.0'},
> PENDING_RECORDS=AggregatedMetric{id='Source__dd-log-source.pendingRecords',
> mim='0.0', max='0.0', avg='0.0', sum='0.0'},
> SOURCE_TASK_NUM_RECORDS_IN=AggregatedMetric{id='Source__dd-log-source.numRecordsIn',
> mim='613279.0', max='613279.0', avg='613279.0', sum='613279.0'},
> SOURCE_TASK_NUM_RECORDS_IN_PER_SEC=AggregatedMetric{id='Source__dd-log-source.numRecordsInPerSecond',
> mim='1682.7333333333333', max='1682.7333333333333',
> avg='1682.7333333333333', sum='1682.7333333333333'}}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Vertex scaling metrics for cbc357ccb763df2852fee8c4fc7d55f2:
> {ACCUMULATED_BUSY_TIME=32301.0, NUM_RECORDS_OUT=125.0, LOAD=0.192,
> NUM_RECORDS_IN=613279.0, OBSERVED_TPR=Infinity, LAG=0.0}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Calculating vertex scaling metrics for eab9c0013081b8479e60463931f3a593
> from {BUSY_TIME_PER_SEC=AggregatedMetric{id='busyTimeMsPerSecond',
> mim='0.0', max='0.0', avg='0.0', sum='0.0'}}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Vertex scaling metrics for eab9c0013081b8479e60463931f3a593:
> {ACCUMULATED_BUSY_TIME=0.0, NUM_RECORDS_OUT=0.0, LOAD=0.0,
> NUM_RECORDS_IN=8.0}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Calculating vertex scaling metrics for 61214243927da46230dfd349fba7b8e6
> from {BUSY_TIME_PER_SEC=AggregatedMetric{id='busyTimeMsPerSecond',
> mim='0.0', max='0.0', avg='0.0', sum='0.0'}}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Vertex scaling metrics for 61214243927da46230dfd349fba7b8e6:
> {ACCUMULATED_BUSY_TIME=0.0, NUM_RECORDS_OUT=0.0, LOAD=0.0,
> NUM_RECORDS_IN=8.0}
>
> 2024-06-06 21:33:35,304 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Calculating vertex scaling metrics for 7758b2b5ada48872db09a5c48176e34e
> from {BUSY_TIME_PER_SEC=AggregatedMetric{id='busyTimeMsPerSecond',
> mim='0.0', max='0.0', avg='0.0', sum='0.0'}}
>
> 2024-06-06 21:33:35,305 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Vertex scaling metrics for 7758b2b5ada48872db09a5c48176e34e:
> {ACCUMULATED_BUSY_TIME=0.0, NUM_RECORDS_OUT=8.0, LOAD=0.0,
> NUM_RECORDS_IN=117.0}
>
> 2024-06-06 21:33:35,305 o.a.f.a.ScalingMetricCollector 
> [DEBUG][flink/pipeline-pipelinelocal]
> Global metrics: {NUM_TASK_SLOTS_USED=1.0,
> HEAP_MAX_USAGE_RATIO=0.6800108099126959, HEAP_MEMORY_USED=4.74886648E8,
> METASPACE_MEMORY_USED=1.40677456E8, MANAGED_MEMORY_USED=0.0}
>
> 2024-06-06 21:33:35,306 o.a.f.a.ScalingTracking        
> [DEBUG][flink/pipeline-pipelinelocal]
> Cannot record restart duration because already set in the latest record:
> PT0.114185S
>
> 2024-06-06 21:33:35,307 o.a.f.a.JobAutoScalerImpl      
> [DEBUG][flink/pipeline-pipelinelocal]
> Collected metrics:
> CollectedMetricHistory(jobTopology=JobTopology(vertexInfos={cbc357ccb763df2852fee8c4fc7d55f2=VertexInfo(id=cbc357ccb763df2852fee8c4fc7d55f2,
> inputs={}, outputs={61214243927da46230dfd349fba7b8e6=REBALANCE,
> 7758b2b5ada48872db09a5c48176e34e=HASH}, parallelism=1, maxParallelism=1,
> originalMaxParallelism=20, finished=false,
> ioMetrics=IOMetrics(numRecordsIn=0, numRecordsOut=125,
> accumulatedBusyTime=32301.0)),
> eab9c0013081b8479e60463931f3a593=VertexInfo(id=eab9c0013081b8479e60463931f3a593,
> inputs={7758b2b5ada48872db09a5c48176e34e=REBALANCE}, outputs={},
> parallelism=1, maxParallelism=1, originalMaxParallelism=1, finished=false,
> ioMetrics=IOMetrics(numRecordsIn=8, numRecordsOut=0,
> accumulatedBusyTime=0.0)),
> 61214243927da46230dfd349fba7b8e6=VertexInfo(id=61214243927da46230dfd349fba7b8e6,
> inputs={cbc357ccb763df2852fee8c4fc7d55f2=REBALANCE}, outputs={},
> parallelism=1, maxParallelism=1, originalMaxParallelism=1, finished=false,
> ioMetrics=IOMetrics(numRecordsIn=8, numRecordsOut=0,
> accumulatedBusyTime=0.0)),
> 7758b2b5ada48872db09a5c48176e34e=VertexInfo(id=7758b2b5ada48872db09a5c48176e34e,
> inputs={cbc357ccb763df2852fee8c4fc7d55f2=HASH},
> outputs={eab9c0013081b8479e60463931f3a593=REBALANCE}, parallelism=1,
> maxParallelism=20, originalMaxParallelism=20, finished=false,
> ioMetrics=IOMetrics(numRecordsIn=117, numRecordsOut=8,
> accumulatedBusyTime=0.0))}, finishedVertices=[],
> verticesInTopologicalOrder=[cbc357ccb763df2852fee8c4fc7d55f2,
> 61214243927da46230dfd349fba7b8e6, 7758b2b5ada48872db09a5c48176e34e,
> eab9c0013081b8479e60463931f3a593]),
> metricHistory={2024-06-06T21:32:35.170678Z=CollectedMetrics(vertexMetrics={cbc357ccb763df2852fee8c4fc7d55f2={ACCUMULATED_BUSY_TIME=20821.0,
> NUM_RECORDS_OUT=109.0, LOAD=0.0, NUM_RECORDS_IN=512339.0,
> OBSERVED_TPR=Infinity, LAG=0.0},
> eab9c0013081b8479e60463931f3a593={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=0.0, LOAD=0.0, NUM_RECORDS_IN=7.0},
> 61214243927da46230dfd349fba7b8e6={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=0.0, LOAD=0.0, NUM_RECORDS_IN=7.0},
> 7758b2b5ada48872db09a5c48176e34e={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=7.0, LOAD=0.0, NUM_RECORDS_IN=102.0}},
> globalMetrics={NUM_TASK_SLOTS_USED=1.0,
> HEAP_MAX_USAGE_RATIO=0.5849425971687019, HEAP_MEMORY_USED=4.08495608E8,
> METASPACE_MEMORY_USED=1.43093792E8, MANAGED_MEMORY_USED=0.0}),
> 2024-06-06T21:33:35.258489Z=CollectedMetrics(vertexMetrics={cbc357ccb763df2852fee8c4fc7d55f2={ACCUMULATED_BUSY_TIME=32301.0,
> NUM_RECORDS_OUT=125.0, LOAD=0.192, NUM_RECORDS_IN=613279.0,
> OBSERVED_TPR=Infinity, LAG=0.0},
> eab9c0013081b8479e60463931f3a593={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=0.0, LOAD=0.0, NUM_RECORDS_IN=8.0},
> 61214243927da46230dfd349fba7b8e6={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=0.0, LOAD=0.0, NUM_RECORDS_IN=8.0},
> 7758b2b5ada48872db09a5c48176e34e={ACCUMULATED_BUSY_TIME=0.0,
> NUM_RECORDS_OUT=8.0, LOAD=0.0, NUM_RECORDS_IN=117.0}},
> globalMetrics={NUM_TASK_SLOTS_USED=1.0,
> HEAP_MAX_USAGE_RATIO=0.6800108099126959, HEAP_MEMORY_USED=4.74886648E8,
> METASPACE_MEMORY_USED=1.40677456E8, MANAGED_MEMORY_USED=0.0})},
> jobRunningTs=2024-06-06T21:17:35.712Z, fullyCollected=true)
>
> 2024-06-06 21:33:35,307 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Restart time used in metrics evaluation: PT5M
>
> 2024-06-06 21:33:35,307 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Using busy time based tpr 17498.932104004747 for
> cbc357ccb763df2852fee8c4fc7d55f2.
>
> 2024-06-06 21:33:35,307 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computing edge (cbc357ccb763df2852fee8c4fc7d55f2,
> 61214243927da46230dfd349fba7b8e6) data rate for single input downstream task
>
> 2024-06-06 21:33:35,307 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed output ratio for edge (cbc357ccb763df2852fee8c4fc7d55f2 ->
> 61214243927da46230dfd349fba7b8e6) : 9.906875371507826E-6
>
> 2024-06-06 21:33:35,308 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computing edge (cbc357ccb763df2852fee8c4fc7d55f2,
> 7758b2b5ada48872db09a5c48176e34e) data rate for single input downstream task
>
> 2024-06-06 21:33:35,308 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed output ratio for edge (cbc357ccb763df2852fee8c4fc7d55f2 ->
> 7758b2b5ada48872db09a5c48176e34e) : 1.486031305726174E-4
>
> 2024-06-06 21:33:35,308 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computing edge (7758b2b5ada48872db09a5c48176e34e,
> eab9c0013081b8479e60463931f3a593) data rate for single input downstream task
>
> 2024-06-06 21:33:35,308 o.a.f.a.ScalingMetricEvaluator 
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed output ratio for edge (7758b2b5ada48872db09a5c48176e34e ->
> eab9c0013081b8479e60463931f3a593) : 0.06666666666666667
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobAutoScalerImpl      
> [DEBUG][flink/pipeline-pipelinelocal]
> Evaluated metrics:
> EvaluatedMetrics(vertexMetrics={cbc357ccb763df2852fee8c4fc7d55f2={TARGET_DATA_RATE=EvaluatedScalingMetric(current=NaN,
> average=1679.897), PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN),
> SCALE_UP_RATE_THRESHOLD=EvaluatedScalingMetric(current=83995.0,
> average=NaN), MAX_PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), TRUE_PROCESSING_RATE=EvaluatedScalingMetric(current=NaN,
> average=17498.932), LOAD=EvaluatedScalingMetric(current=NaN,
> average=0.096),
> SCALE_DOWN_RATE_THRESHOLD=EvaluatedScalingMetric(current=Infinity,
> average=NaN), CATCH_UP_DATA_RATE=EvaluatedScalingMetric(current=0.0,
> average=NaN), LAG=EvaluatedScalingMetric(current=0.0, average=NaN)},
> eab9c0013081b8479e60463931f3a593={TARGET_DATA_RATE=EvaluatedScalingMetric(current=NaN,
> average=0.017), PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), SCALE_UP_RATE_THRESHOLD=EvaluatedScalingMetric(current=1.0,
> average=NaN), MAX_PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), TRUE_PROCESSING_RATE=EvaluatedScalingMetric(current=NaN,
> average=Infinity), LOAD=EvaluatedScalingMetric(current=NaN, average=0.0),
> SCALE_DOWN_RATE_THRESHOLD=EvaluatedScalingMetric(current=Infinity,
> average=NaN), CATCH_UP_DATA_RATE=EvaluatedScalingMetric(current=0.0,
> average=NaN)},
> 61214243927da46230dfd349fba7b8e6={TARGET_DATA_RATE=EvaluatedScalingMetric(current=NaN,
> average=0.017), PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), SCALE_UP_RATE_THRESHOLD=EvaluatedScalingMetric(current=1.0,
> average=NaN), MAX_PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), TRUE_PROCESSING_RATE=EvaluatedScalingMetric(current=NaN,
> average=Infinity), LOAD=EvaluatedScalingMetric(current=NaN, average=0.0),
> SCALE_DOWN_RATE_THRESHOLD=EvaluatedScalingMetric(current=Infinity,
> average=NaN), CATCH_UP_DATA_RATE=EvaluatedScalingMetric(current=0.0,
> average=NaN)},
> 7758b2b5ada48872db09a5c48176e34e={TARGET_DATA_RATE=EvaluatedScalingMetric(current=NaN,
> average=0.25), PARALLELISM=EvaluatedScalingMetric(current=1.0,
> average=NaN), SCALE_UP_RATE_THRESHOLD=EvaluatedScalingMetric(current=13.0,
> average=NaN), MAX_PARALLELISM=EvaluatedScalingMetric(current=20.0,
> average=NaN), TRUE_PROCESSING_RATE=EvaluatedScalingMetric(current=NaN,
> average=Infinity), LOAD=EvaluatedScalingMetric(current=NaN, average=0.0),
> SCALE_DOWN_RATE_THRESHOLD=EvaluatedScalingMetric(current=Infinity,
> average=NaN), CATCH_UP_DATA_RATE=EvaluatedScalingMetric(current=0.0,
> average=NaN)}},
> globalMetrics={HEAP_MAX_USAGE_RATIO=EvaluatedScalingMetric(current=0.68,
> average=0.632), NUM_TASK_SLOTS_USED=EvaluatedScalingMetric(current=1.0,
> average=NaN), GC_PRESSURE=EvaluatedScalingMetric(current=NaN, average=NaN),
> HEAP_MEMORY_USED=EvaluatedScalingMetric(current=4.74886648E8,
> average=4.41691128E8),
> METASPACE_MEMORY_USED=EvaluatedScalingMetric(current=1.40677456E8,
> average=1.41885624E8),
> MANAGED_MEMORY_USED=EvaluatedScalingMetric(current=0.0, average=0.0)})
>
> 2024-06-06 21:33:35,308 o.a.f.a.ScalingExecutor        
> [DEBUG][flink/pipeline-pipelinelocal]
> Restart time used in scaling summary computation: PT5M
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Target processing capacity for cbc357ccb763df2852fee8c4fc7d55f2 is 168270.0
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Capped target processing capacity for cbc357ccb763df2852fee8c4fc7d55f2 is
> 168270.0
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Specified autoscaler maximum parallelism 200 is greater than the operator
> max parallelism 1. This means the operator max parallelism can never be
> reached.
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Target processing capacity for eab9c0013081b8479e60463931f3a593 is 2.0
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed scale factor of 0.0 for eab9c0013081b8479e60463931f3a593 is capped
> by maximum scale down factor to 0.4
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Capped target processing capacity for eab9c0013081b8479e60463931f3a593 is
> Infinity
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Specified autoscaler maximum parallelism 200 is greater than the operator
> max parallelism 1. This means the operator max parallelism can never be
> reached.
>
> 2024-06-06 21:33:35,308 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Target processing capacity for 61214243927da46230dfd349fba7b8e6 is 2.0
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed scale factor of 0.0 for 61214243927da46230dfd349fba7b8e6 is capped
> by maximum scale down factor to 0.4
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Capped target processing capacity for 61214243927da46230dfd349fba7b8e6 is
> Infinity
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Specified autoscaler maximum parallelism 200 is greater than the operator
> max parallelism 1. This means the operator max parallelism can never be
> reached.
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Target processing capacity for 7758b2b5ada48872db09a5c48176e34e is 25.0
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Computed scale factor of 0.0 for 7758b2b5ada48872db09a5c48176e34e is capped
> by maximum scale down factor to 0.4
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Capped target processing capacity for 7758b2b5ada48872db09a5c48176e34e is
> Infinity
>
> 2024-06-06 21:33:35,309 o.a.f.a.JobVertexScaler        
> [DEBUG][flink/pipeline-pipelinelocal]
> Specified autoscaler maximum parallelism 200 is greater than the operator
> max parallelism 20. This means the operator max parallelism can never be
> reached.
>
> 2024-06-06 21:33:35,309 o.a.f.a.ScalingExecutor        [INFO ][flink/
> pipeline-pipelinelocal] All job vertices are currently running at their
> target parallelism.
>
>
> Some Questions
>
> 1. Does the autoscaler decide to scale a job vertex when the target
> processing capacity is higher than the current processing capacity? If yes,
> how to check the current processing capacity?
>
> 2. Which metrics in the above logs say the target utilization threshold is
> not reached and hence all the vertices are running at target parallelism?
>
>
> Thank you
> Chetas
>
>
>

Re: Understanding flink-autoscaler behavior

Reply via email to