Re: Restore from checkpoint

Jinzhong Li Sun, 19 May 2024 21:11:27 -0700

Hi Phil,

I think you can use the "-s :checkpointMetaDataPath" arg to resume the job
from a retained checkpoint[1].
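
Note that the leading ":" in ":checkpointMetaDataPath" is the docs' placeholder
notation, not part of the value, which matches the error quoted below. As a
rough illustration only, here is a small Python sketch that strips such a
colon, turns an absolute local path into a file:// URI, and assembles the
"flink run" arguments. The helper names (normalize_checkpoint_path,
build_restore_command) are hypothetical and not part of Flink or PyFlink; the
paths are the ones from this thread, and the sketch only builds the command,
it does not submit anything.

```python
import shlex


def normalize_checkpoint_path(path: str) -> str:
    """Return a checkpoint pointer suitable for `flink run -s`.

    Hypothetical helper: drops a leading ':' copied from the docs'
    placeholder notation and prefixes absolute local paths with file://.
    """
    cleaned = path.strip()
    if cleaned.startswith(":"):
        cleaned = cleaned[1:]
    if "://" in cleaned:
        # Already a URI (file://, hdfs://, s3://, ...): leave untouched.
        return cleaned
    if not cleaned.startswith("/"):
        raise ValueError(f"expected an absolute path or a URI, got {path!r}")
    return "file://" + cleaned


def build_restore_command(checkpoint_path: str, py_file: str) -> list[str]:
    """Assemble the argument list for resuming a PyFlink job from a checkpoint."""
    return [
        "flink", "run",
        "-s", normalize_checkpoint_path(checkpoint_path),
        "-py", py_file,
    ]


if __name__ == "__main__":
    cmd = build_restore_command(
        ":/opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc",  # path as typed in the thread
        "/opt/app/flink_job.py",
    )
    # Print a shell-safe command line that can be pasted after
    # `docker-compose exec jobmanager`.
    print(shlex.join(cmd))
```

Building the arguments as a list keeps quoting out of the picture until the
final shlex.join, so the same list could also be handed to subprocess.run
without going through a shell.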


[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/#resuming-from-a-retained-checkpoint

Best,
Jinzhong Li


On Mon, May 20, 2024 at 2:29 AM Phil Stavridis <[email protected]> wrote:

> Hi Lu,
>
> Thanks for your reply. How should the path be passed to the job
> that needs to use the checkpoint? Is the standard way to use -s :/<path>,
> or to pass the path in the module as a Python arg?
>
> Kind regards
> Phil
>
> > On 18 May 2024, at 03:19, jiadong.lu <[email protected]> wrote:
> >
> > Hi Phil,
> >
> > AFAIK, the error indicates that your path is incorrect.
> > You should use '/opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc' or
> > 'file:///opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc' instead.
> >
> > Best.
> > Jiadong.Lu
> >
> > On 5/18/24 2:37 AM, Phil Stavridis wrote:
> >> Hi,
> >> I am trying to test how checkpoints work for restoring state, but I am
> >> not sure how to run a new instance of a Flink job, after I have cancelled
> >> it, using the checkpoints that I store in the filesystem of the job
> >> manager, e.g. /opt/flink/checkpoints.
> >> I have tried passing the checkpoint as an argument to the function and
> >> using it while setting up checkpointing, but it looks like the way it is
> >> done is something like the following:
> >> docker-compose exec jobmanager flink run -s :/opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc -py /opt/app/flink_job.py
> >> But I am getting this error:
> >> Caused by: java.io.IOException: Checkpoint/savepoint path
> >> ':/opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc' is not a valid file
> >> URI. Either the pointer path is invalid, or the checkpoint was created by a
> >> different state backend.
> >> What is wrong with the way the job is re-submitted to the cluster?
> >> Kind regards
> >> Phil
>
>
