You could also check out the Autoscaler logic in the Flink Kubernetes
Operator:
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/

On the current main and in the upcoming 1.5.0 release the mechanism is
pretty nice and solid :)

It works with the native integration so you can also set standby TMs with a
simple config.

Cheers,
Gyula

On Fri, Apr 28, 2023 at 7:31 AM Wei Hou <wei....@airbnb.com> wrote:

> Thank you for all your responses! I think Gyula is right: simply using MAX -
> some_offset is not ideal, as it can make the standby TM useless.
> It is difficult for the scheduler to determine whether a pod has been lost
> or scaled down when we enable autoscaling, which affects its decision to
> utilize standby TMs. We probably need to monitor the HPA events in order to
> get this information.
> I will wait to see if there is a solution for this problem in the future.
>
>
> On Wed, Apr 26, 2023 at 7:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> I think the behaviour is going to get a little weird because this would
>> actually defeat the purpose of the standby TM.
>> MAX - some_offset will decrease once you lose a TM, so in this case we
>> would scale down to have a spare again (which we never actually use).
>>
>> Gyula
>>
>> On Wed, Apr 26, 2023 at 4:02 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> Reactive mode doesn't support standby taskmanagers. As you said it
>>> always uses all available resources in the cluster.
>>>
>>> I can see it being useful though to not always scale to MAX but (MAX -
>>> some_offset).
>>>
>>> I'd suggest filing a ticket.
>>>
>>> On 26/04/2023 00:17, Wei Hou via user wrote:
>>> > Hi Flink community,
>>> >
>>> > We are trying to use Flink’s reactive mode with Kubernetes HPA for
>>> autoscaling. However, since reactive mode always uses all available
>>> resources, it causes a problem when we need standby task managers for fast
>>> failure recovery: the job will always use these extra standby task managers
>>> as active task managers to process data.
>>> >
>>> > Do you have any suggestions on this? Should we avoid using
>>> Flink reactive mode together with standby task managers?
>>> >
>>> > Best,
>>> > Wei
>>> >
>>> >
>>>
>>>
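To make the concern about "MAX - some_offset" from the thread concrete, here
is a minimal sketch in Python. It is not Flink code: the function names, the
starting count of 5 TMs, and the offset of 1 are all hypothetical, chosen only
to illustrate the argument that the spare TM is never used for recovery,
because the target parallelism shrinks every time a TM is lost.

```python
def reactive_target(available_tms: int, offset: int) -> int:
    """Hypothetical rule: run on all registered TMs except `offset` spares."""
    return max(available_tms - offset, 0)


def simulate(initial_tms: int, offset: int, failures: int) -> list[tuple[int, int, int]]:
    """Return (available, active, spare) at the start and after each TM loss."""
    history = []
    available = initial_tms
    for _ in range(failures + 1):
        active = reactive_target(available, offset)
        history.append((available, active, available - active))
        available -= 1  # assume one TM is lost before the next step
    return history


if __name__ == "__main__":
    for available, active, spare in simulate(initial_tms=5, offset=1, failures=2):
        print(f"available={available} active={active} spare={spare}")
```

In this sketch the number of active TMs drops with every loss while the spare
count stays constant, so the job rescales downward instead of falling back
onto the standby TM.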
