Hi Jonas,

If you restart the Flink cluster by deleting and re-creating the deployment
directly, the job is automatically restored from the latest checkpoint [1]
(given that HA is enabled, as in your setup), so simply enabling
checkpointing may be enough.
If you want to use savepoints instead, you need to verify that the latest
savepoint completed successfully. Checking for a _metadata file in the
savepoint directory works in most scenarios, but in some cases the
_metadata file may be incomplete.
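
A minimal sketch of that _metadata check, assuming the savepoint directory
is reachable as a local path (for example a mounted or synced copy of the
bucket). The function names are hypothetical and not part of Flink, and the
check is only a heuristic: as noted above, a present _metadata file does not
guarantee that the savepoint is complete.

```python
from pathlib import Path
from typing import Optional


def savepoint_looks_complete(savepoint_dir: Path) -> bool:
    """Heuristic only: the directory holds a non-empty _metadata file."""
    metadata = savepoint_dir / "_metadata"
    return metadata.is_file() and metadata.stat().st_size > 0


def latest_complete_savepoint(root: Path) -> Optional[Path]:
    """Return the savepoint directory under root whose _metadata file was
    written most recently, or None if no directory passes the check."""
    candidates = [
        d for d in root.iterdir()
        if d.is_dir() and savepoint_looks_complete(d)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (d / "_metadata").stat().st_mtime)
```

A restore script could call latest_complete_savepoint() and fall back to
the checkpoint-based recovery when it returns None. For a directory that
lives only in S3, the same logic would need an object-store listing instead
of pathlib.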

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/

Best,
Weihua


On Tue, Jul 5, 2022 at 10:54 PM jonas eyob <jonas.e...@gmail.com> wrote:

> Hi!
>
> We are running a Standalone job on Kubernetes using application deployment
> mode, with HA enabled.
>
> We have attempted to automate how we create and restore savepoints by
> running one script that generates a savepoint (using a Kubernetes preStop
> hook) and another that restores from a savepoint (located in an S3 bucket).
>
> Restoring from a savepoint is typically not a problem once we have a
> savepoint generated and accessible in our S3 bucket. The problem is
> generating the savepoint, which hasn't been very reliable thus far. The
> logs are not particularly helpful either, so we wanted to rethink how we
> go about taking savepoints.
>
> Are there any best practices for doing this in a CI/CD manner given our
> setup?
