Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
er. I think this is not about the operator issue, kubernetes deployment just restarts the fallen pod, restarted jobmanager without HA metadata starts the job itself from an empty state. I'm looking for a way to prevent it from exiting in case of an

Re: JobManager restarts on job failure

2022-09-26 Thread Gyula Fóra
metadata starts the job itself from an empty state. I'm looking for a way to prevent it from exiting in case of a job error (we use application mode cluster). -- *From:* Gyula Fóra

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
*From:* Gyula Fóra *Sent:* 20 September 2022, 19:49:37 *To:* Evgeniy Lyutikov *Cc:* user@flink.apache.org *Subject:* Re: JobManager restarts on job failure The best thing for you to do would be to upgrade to Flink 1.15 and

Re: JobManager restarts on job failure

2022-09-20 Thread Gyula Fóra
lication mode cluster). -- *From:* Gyula Fóra *Sent:* 20 September 2022, 19:49:37 *To:* Evgeniy Lyutikov *Cc:* user@flink.apache.org *Subject:* Re: JobManager restarts on job failure The best thing for you to do would

Re: JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov
application mode cluster). From: Gyula Fóra Sent: 20 September 2022, 19:49:37 To: Evgeniy Lyutikov Cc: user@flink.apache.org Subject: Re: JobManager restarts on job failure The best thing for you to do would be to upgrade to Flink 1.15 and the latest operator

Re: JobManager restarts on job failure

2022-09-20 Thread Gyula Fóra
The best thing for you to do would be to upgrade to Flink 1.15 and the latest operator version. In Flink 1.15 we have the option to interact with the Flink jobmanager even after the job FAILED and the operator leverages this for a much more robust behaviour. In any case the operator should not

JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov
Hi, We are using Flink 1.14.4 with the Flink Kubernetes operator. Sometimes when updating a job, it fails on startup, and Flink removes all HA metadata and exits the jobmanager. 2022-09-14 14:54:44,534 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Restoring job