Hi,

if you start a Flink job from a savepoint and the job later needs to recover, it will only reuse the savepoint if no later checkpoint has been created. Flink always restores from the latest checkpoint/savepoint taken.
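To make the rule above concrete, here is a toy model of the decision in Python. It is only a sketch of the behaviour described in this reply, not Flink's actual recovery code: the `Snapshot` type, the `choose_restore_point` function, the numeric `checkpoint_id` ordering and the S3 paths are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a checkpoint or savepoint."""
    path: str
    checkpoint_id: int  # assumed to increase with every snapshot taken


def choose_restore_point(
    savepoint: Optional[Snapshot],
    completed_checkpoints: Sequence[Snapshot],
) -> Optional[Snapshot]:
    """Pick the snapshot a recovering job would restore from.

    Mirrors the rule from the reply: the configured savepoint is only
    reused if no later checkpoint exists; otherwise the latest one wins.
    """
    if completed_checkpoints:
        return max(completed_checkpoints, key=lambda s: s.checkpoint_id)
    return savepoint


# Hypothetical paths, for illustration only.
sp = Snapshot("s3://example-bucket/savepoints/sp-1", checkpoint_id=10)
cp11 = Snapshot("s3://example-bucket/checkpoints/chk-11", checkpoint_id=11)
cp12 = Snapshot("s3://example-bucket/checkpoints/chk-12", checkpoint_id=12)

# Crash before the first checkpoint: the savepoint is reused.
first_recovery = choose_restore_point(sp, [])
# Crash after checkpoints 11 and 12: the savepoint path is ignored.
later_recovery = choose_restore_point(sp, [cp11, cp12])
```

In this model, leaving the savepoint path in the manifest is harmless once at least one later checkpoint exists, because the savepoint is only a fallback.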
Cheers,
Till

On Wed, Dec 16, 2020 at 9:47 PM vishalovercome <vis...@moengage.com> wrote:

> My flink job runs in kubernetes. This is the setup:
>
> 1. One job running as a job cluster with one job manager
> 2. HA powered by zookeeper (works fine)
> 3. Job/Deployment manifests stored in Github and deployed to kubernetes by
>    Argo
> 4. State persisted to S3
>
> If I were to stop (drain and take a savepoint) and resume, I'll have to
> update the job manager manifest with the savepoint location and save it in
> Github and redeploy. After deployment, I'll presumably have to modify the
> manifest again to remove the savepoint location so as to avoid starting the
> application from the same savepoint. This raises some questions:
>
> 1. If the job manager were to crash before the manifest is updated again
>    then won't kubernetes restart the job manager from the savepoint rather
>    than the latest checkpoint?
> 2. Is there a way to ensure that restoration from a savepoint doesn't
>    happen more than once? Or not after first successful checkpoint?
> 3. If even one checkpoint has been finalized, then the job should prefer
>    the checkpoint rather than the savepoint. Will that happen automatically
>    given zookeeper?
> 4. Is it possible to not have to remove the savepoint path from the
>    kubernetes manifest and simply rely on newer checkpoints/savepoints? It
>    feels rather clumsy to have to add and remove back manually. We could
>    use a cron job to remove it but its still clumsy.
> 5. Is there a way of asking flink to use the latest savepoint rather than
>    specifying the location of the savepoint? If I were to manually rename
>    the s3 savepoint location to something fixed
>    (s3://fixed_savepoint_path_always) then would there be any problem
>    restoring the job?
> 6. Any open source tool that solves this problem?