Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Zhanghao Chen
job monitoring system to manually recover it. Best, Zhanghao Chen From: Jean-Marc Paulin Sent: Tuesday, June 11, 2024 16:04 To: Zhanghao Chen ; user@flink.apache.org Subject: Re: Failed to resume from HA when the checkpoint has been deleted. Thanks for your reply

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Jean-Marc Paulin
in that scenario. But maybe there isn't any. Best regards JM From: Zhanghao Chen Sent: Tuesday, June 11, 2024 03:56 To: Jean-Marc Paulin ; user@flink.apache.org Subject: [EXTERNAL] Re: Failed to resume from HA when the checkpoint has been deleted. Hi, In this case, you

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Zhanghao Chen
to resume from HA when the checkpoint has been deleted. Hi, We have a Flink 1.19 streaming job, with HA enabled (ZooKeeper) and checkpoints/savepoints in S3. We had an outage and now the JobManager keeps restarting. We think this is because it reads the job ID to be restarted from ZooKeeper, but because we lost

Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Jean-Marc Paulin
Hi, We have a Flink 1.19 streaming job, with HA enabled (ZooKeeper) and checkpoints/savepoints in S3. We had an outage and now the JobManager keeps restarting. We think this is because it reads the job ID to be restarted from ZooKeeper, but because we lost our S3 storage as part of the outage it cannot