Without HA, your job can restore from the latest successful checkpoint
only as long as the JobManager process / pod itself has not failed. If the
JobManager fails, the new JobManager that Kubernetes brings up has no way
to find the latest successful checkpoint. A JobManager can fail not only
because of pod evictions, but also for other reasons (JVM out-of-memory,
remote storage connection downtime, etc.).
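
For illustration, a minimal sketch of what enabling Kubernetes HA could
look like in flink-conf.yaml. The cluster ID and the storage path below are
made-up placeholders, and the option names are the ones documented for
Flink 1.12 and later, so please check them against the version you run:

```yaml
# Hypothetical values: replace the cluster ID and storage path with your own.
kubernetes.cluster-id: my-application-cluster
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# Durable storage for JobManager metadata (e.g. checkpoint pointers).
high-availability.storageDir: s3://my-bucket/flink/ha
```

With this, the pointer to the latest completed checkpoint is kept outside
the JobManager process, so a replacement pod can pick it up. The
JobManager's service account also needs permission to create and edit
ConfigMaps.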

Thank you~

Xintong Song



On Tue, Oct 26, 2021 at 7:39 AM Deshpande, Omkar <omkar_deshpa...@intuit.com>
wrote:

> Hello,
>
> We are running Flink on Kubernetes (Standalone) in application cluster
> mode. The job manager is deployed as a Deployment.
> We only deploy one instance/replica of the job manager, so the leader
> election service is not required.
> We have also set Flink task execution retries to infinite.
>
> Do we still need an HA setup? We have tested our application without
> configuring HA, and it seems to restore from checkpoints after failures.
> Does the Flink job manager keep the information that it would otherwise
> store in the HA system in memory?
> If it does, is the only reason to configure HA to achieve resiliency
> in case of pod evictions (caused by node failures, scheduling, etc.)?
>
> Thanks,
> Omkar
>
