Hi Suchithra,

Roman is right. You still need ZooKeeper HA configured so that the job
can recover successfully when the JobManager fails over.
Although the job jar is bundled in the image, the checkpoint counter and path
still need to be stored in ZooKeeper. When the JobManager
terminates exceptionally and is relaunched by K8s, we need to recover from
the latest checkpoint automatically.
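For reference, a minimal sketch of the ZooKeeper HA settings in
flink-conf.yaml; the quorum addresses, storage directory, and cluster id
below are placeholders you would replace with your own:

```yaml
# Enable ZooKeeper-based HA (placeholder hosts and paths)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
# Durable storage for JobManager metadata (checkpoint pointers, job graph)
high-availability.storageDir: s3://flink-ha/recovery/
# Root znode and a unique id per job cluster
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-job-cluster
```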

Another reason is leader election and retrieval. In some corner cases,
for example when a kubelet crashes, two JobManagers may be
running even though the replica count of the Deployment is 1. We need
ZooKeeper for leader election and leader retrieval so that the TaskManagers
can find the active JobManager.

A native K8s HA service is requested in FLINK-12884 [1]. I will try to push
to get it implemented in the next major release (1.12). After that, the HA
configuration on K8s will be more convenient.


[1]. https://issues.apache.org/jira/browse/FLINK-12884


Best,
Yang

Khachatryan Roman <khachatryan.ro...@gmail.com> wrote on Mon, Aug 3, 2020, 10:03 PM:

> Hi Suchithra,
>
> Yes, you need to pass these parameters to standalone-job.sh in Kubernetes
> job definition.
>
> I'm pulling in Patrick as he might know this subject better.
>
> Regards,
> Roman
>
>
> On Mon, Aug 3, 2020 at 12:24 PM V N, Suchithra (Nokia - IN/Bangalore) <
> suchithra....@nokia.com> wrote:
>
>> Hello,
>>
>>
>>
>> I am using Flink version 1.10.1 in a Kubernetes environment. In per-job
>> mode of Flink, do we need ZooKeeper and HA parameters to restart the job
>> and achieve HA? I am unsure because the job jar is part of the Docker
>> image itself.
>>
>>
>>
>> Thanks,
>>
>> Suchithra
>>
>
