Hi, we are running a few jobs on YARN, and when a job fails in a way it cannot recover from on its own, we want to restore it manually from its last successful externalized checkpoint. The problem is that ${state.checkpoints.dir} contains the checkpoint directories for all the jobs we are running. How can we find the last successful externalized checkpoint for one particular job? I would be grateful for any pointers.
Regards, Dawid