Thank you for all your responses! I think Gyula is right: simply doing MAX -
some_offset is not ideal, as it can make the standby TM useless.
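
To make the concern concrete, here is a rough sketch of the arithmetic
(plain Python; the numbers and the slots_per_tm / standby_offset parameters
are made up for illustration, not anything Flink exposes today):

    # Hypothetical "MAX - some_offset" policy for reactive mode.
    def target_parallelism(available_taskmanagers, slots_per_tm, standby_offset=1):
        # Hold standby_offset TaskManagers back instead of using every slot.
        usable = max(available_taskmanagers - standby_offset, 0)
        return usable * slots_per_tm

    print(target_parallelism(10, 4))  # 36: one TM held back as a standby
    # Gyula's point: lose a TM and the target drops again, so the spare
    # is still never actually used.
    print(target_parallelism(9, 4))   # 32: rescales down after the loss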
When autoscaling is enabled, it is difficult for the scheduler to determine
whether a pod has been lost or has been scaled down, which affects its
decision to utilize standby TMs. We would probably need to monitor HPA
events to get this information.
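
For reference, a minimal sketch of what watching HPA events could look like,
assuming the official Kubernetes Python client and an HPA running in a
"flink" namespace (both are assumptions, not part of any Flink tooling):

    from kubernetes import client, config, watch

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Stream cluster events and pick out the ones emitted by the HPA, so a
    # scheduler could tell an intentional scale-down apart from a lost pod.
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_event, namespace="flink"):
        obj = event["object"]
        if obj.involved_object.kind == "HorizontalPodAutoscaler":
            # e.g. reason "SuccessfulRescale" with a message such as
            # "New size: 4; reason: cpu resource utilization ..."
            print(obj.reason, obj.message)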
I will wait to see if there is a solution for this problem in the future.


On Wed, Apr 26, 2023 at 7:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

> I think the behaviour is going to get a little weird, because this would
> actually defeat the purpose of the standby TM.
> MAX - some_offset will decrease once you lose a TM, so in this case we
> would scale down to again have a spare (which we never actually use).
>
> Gyula
>
> On Wed, Apr 26, 2023 at 4:02 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Reactive mode doesn't support standby taskmanagers. As you said, it
>> always uses all available resources in the cluster.
>>
>> I can see it being useful, though, to not always scale to MAX but to
>> (MAX - some_offset).
>>
>> I'd suggest filing a ticket.
>>
>> On 26/04/2023 00:17, Wei Hou via user wrote:
>> > Hi Flink community,
>> >
>> > We are trying to use Flink’s reactive mode with Kubernetes HPA for
>> autoscaling. However, since reactive mode always uses all available
>> resources, this causes a problem when we need standby task managers for
>> fast failure recovery: the job will always use these extra standby task
>> managers as active task managers to process data.
>> >
>> > I wonder if you have any suggestions on this: should we avoid using
>> Flink reactive mode together with standby task managers?
>> >
>> > Best,
>> > Wei
>> >
>> >
>>
>>
