Re: How to setup HA properly with Kubernetes Standalone Application Cluster

2021-05-17 Thread Yang Wang
Hi ChangZhuo, IIRC, even if you have specified a savepoint when starting, the JobManager could recover from the latest checkpoint when the JobManager failed, because when recovering, DefaultCompletedCheckpointStore will sort all the checkpoints (including the savepoint) and pick the latest one. So,
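The recovery rule described above (sort all retained snapshots, savepoint included, and take the latest) can be modeled in a few lines. This is a minimal sketch, not Flink's code: the `Snapshot` record and the `pickLatest` helper are hypothetical stand-ins for DefaultCompletedCheckpointStore. It assumes that "latest" means "highest checkpoint ID", and that checkpoints taken after restoring from a savepoint receive higher IDs than the savepoint itself. The second call models the case raised later in this thread: without HA services nothing is retained across a JobManager failure, so only the startup savepoint is available.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestCheckpointSketch {

    // Hypothetical stand-in for a completed checkpoint or savepoint (not a Flink class).
    record Snapshot(long checkpointId, boolean isSavepoint) {}

    // Assumption: snapshots are ordered by checkpoint ID and the highest ID wins.
    // The savepoint passed at startup is treated like any other retained snapshot.
    static Optional<Snapshot> pickLatest(List<Snapshot> retained, Snapshot startupSavepoint) {
        List<Snapshot> all = new ArrayList<>(retained);
        if (startupSavepoint != null) {
            all.add(startupSavepoint);
        }
        all.sort(Comparator.comparingLong(Snapshot::checkpointId));
        return all.isEmpty() ? Optional.empty() : Optional.of(all.get(all.size() - 1));
    }

    public static void main(String[] args) {
        Snapshot savepoint = new Snapshot(10, true);

        // With HA services: checkpoints taken after the savepoint survive the failure.
        List<Snapshot> haStore = List.of(new Snapshot(11, false), new Snapshot(12, false));
        System.out.println(pickLatest(haStore, savepoint).orElseThrow());

        // Without HA services: nothing is retained, so recovery falls back to the savepoint.
        System.out.println(pickLatest(List.of(), savepoint).orElseThrow());
    }
}
```

In this model the savepoint only "wins" when no newer checkpoint is retained, which is why the HA setup matters for not losing progress made after the savepoint.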

Re: How to setup HA properly with Kubernetes Standalone Application Cluster

2021-05-14 Thread 陳昌倬
On Fri, May 14, 2021 at 02:00:41PM +0200, Fabian Paul wrote:
> Hi Chen,
>
> Can you tell us a bit more about the job you are using?
> The intended behaviour you are seeking can only be achieved
> If the Kubernetes HA Services are enabled [1][2].
> Otherwise the job cannot recall past

Re: How to setup HA properly with Kubernetes Standalone Application Cluster

2021-05-14 Thread Fabian Paul
Hi Chen, can you tell us a bit more about the job you are using? The intended behaviour you are seeking can only be achieved if the Kubernetes HA Services are enabled [1][2]. Otherwise the job cannot recall past checkpoints. Best, Fabian [1]

How to setup HA properly with Kubernetes Standalone Application Cluster

2021-05-14 Thread 陳昌倬
Hi, recently we changed our deployment to a Kubernetes Standalone Application Cluster for reactive mode. According to [0], we use a Kubernetes Job with --fromSavepoint to upgrade our application without losing state. The Job config is identical to the one in the document. However, we found that in this