Hi,

If you start a Flink job from a savepoint and the job later needs to recover,
it will only reuse that savepoint if no later checkpoint has completed in the
meantime. Flink always restores from the latest checkpoint/savepoint taken.
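
As a rough illustration of that rule, here is a minimal sketch in Python. It is
a simplified model and not Flink's actual code: the Snapshot class, the
choose_restore_point function, and the example paths are all hypothetical
names, and it assumes checkpoint ids increase over time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """A completed checkpoint or savepoint (hypothetical model, not a Flink class)."""
    path: str
    checkpoint_id: int  # assumption: a higher id means the snapshot was taken later


def choose_restore_point(
    configured_savepoint: Optional[Snapshot],
    completed_in_ha_store: List[Snapshot],
) -> Optional[Snapshot]:
    """Pick the snapshot a recovering job restores from in this simplified model.

    configured_savepoint: the savepoint named in the deployment manifest, if any.
    completed_in_ha_store: snapshots completed since the job started, as
    remembered by the HA service (ZooKeeper in the setup described below).
    """
    if completed_in_ha_store:
        # Anything completed after startup is newer than the configured savepoint.
        return max(completed_in_ha_store, key=lambda s: s.checkpoint_id)
    # Nothing newer exists yet, so the configured savepoint is used (may be None).
    return configured_savepoint


if __name__ == "__main__":
    savepoint = Snapshot("s3://example-bucket/savepoints/sp-1", 10)
    later = [Snapshot("s3://example-bucket/checkpoints/chk-11", 11),
             Snapshot("s3://example-bucket/checkpoints/chk-12", 12)]
    print(choose_restore_point(savepoint, []).path)
    print(choose_restore_point(savepoint, later).path)
```

In this model, leaving the savepoint path in the manifest does no harm once a
later checkpoint has completed, because the configured savepoint is only a
fallback.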

Cheers,
Till

On Wed, Dec 16, 2020 at 9:47 PM vishalovercome <vis...@moengage.com> wrote:

> My Flink job runs in Kubernetes. This is the setup:
>
> 1. One job running as a job cluster with one job manager
> 2. HA powered by ZooKeeper (works fine)
> 3. Job/Deployment manifests stored in GitHub and deployed to Kubernetes by
> Argo
> 4. State persisted to S3
>
> If I were to stop (drain and take a savepoint) and resume, I'd have to
> update the job manager manifest with the savepoint location, save it in
> GitHub, and redeploy. After deployment, I'd presumably have to modify the
> manifest again to remove the savepoint location so as to avoid starting the
> application from the same savepoint. This raises some questions:
>
> 1. If the job manager were to crash before the manifest is updated again,
> won't Kubernetes restart the job manager from the savepoint rather than
> the latest checkpoint?
> 2. Is there a way to ensure that restoration from a savepoint doesn't
> happen more than once, or at least not after the first successful
> checkpoint?
> 3. If even one checkpoint has been finalized, the job should prefer the
> checkpoint over the savepoint. Will that happen automatically given
> ZooKeeper?
> 4. Is it possible to leave the savepoint path in the Kubernetes manifest
> and simply rely on newer checkpoints/savepoints? It feels rather clumsy
> to have to add it and then remove it manually. We could use a cron job
> to remove it, but it's still clumsy.
> 5. Is there a way of asking Flink to use the latest savepoint rather than
> specifying the location of the savepoint? If I were to manually rename the
> S3 savepoint location to something fixed (s3://fixed_savepoint_path_always),
> would there be any problem restoring the job?
> 6. Is there any open source tool that solves this problem?