Hi Ethan! Thanks for raising the issue. This is indeed a bug: the previous code path fell back to the "execution graph store" for completed jobs. I've raised a JIRA here: https://issues.apache.org/jira/browse/FLINK-33872. I've also managed to root-cause and fix it in the associated PR: https://github.com/apache/flink/pull/23949.
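For anyone who wants to check whether a cluster is affected, here is a minimal sketch of querying the checkpoint history of a job over the REST API. It calls the `/jobs/:jobid/checkpoints` endpoint, which is the one served by the `CheckpointingStatisticsHandler` in the log quoted below. The class name, the method names, and the `http://localhost:8081` base URL in the usage note are assumptions for illustration, not part of Flink.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; the name is illustrative only.
final class CheckpointHistoryQuery {

    // Builds the URI of the checkpoint statistics endpoint for one job.
    // restBaseUrl is the JobManager REST address (an assumption of this sketch).
    static URI checkpointsUri(String restBaseUrl, String jobId) {
        // Drop a single trailing slash so the path is not doubled up.
        String base = restBaseUrl.endsWith("/")
                ? restBaseUrl.substring(0, restBaseUrl.length() - 1)
                : restBaseUrl;
        return URI.create(base + "/jobs/" + jobId + "/checkpoints");
    }

    // Sends a GET request and returns the raw response (status code plus JSON body).
    static HttpResponse<String> fetchCheckpointHistory(String restBaseUrl, String jobId)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(checkpointsUri(restBaseUrl, jobId))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Calling `CheckpointHistoryQuery.fetchCheckpointHistory("http://localhost:8081", "<job id>")` for a cancelled job and inspecting `statusCode()` and `body()` should show whether the history is returned or whether the handler answers with the "Job ... not found" error.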
Regards,
Hong

On Thu, Dec 14, 2023 at 10:07 AM Ethan T Yang <ivanygy...@gmail.com> wrote:

> Hi Hong Liang Teoh,
>
> I think you are the owner of the ticket below. Can you take a look and see
> whether a bug in the code breaks retrieving the checkpoint history of a
> cancelled job?
>
> Thanks,
> Ivan
>
> On Dec 10, 2023, at 8:46 AM, Surendra Singh Lilhore <surendralilh...@gmail.com> wrote:
>
> Hi Ethan,
>
> Looks like this got changed after
> https://issues.apache.org/jira/browse/FLINK-32469.
>
> Now the checkpoint history call throws the exception below for a canceled job.
>
> 2023-12-10 21:50:12,990 ERROR org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler [] - Exception occurred in REST handler: Job 7504e7a6106093a3a9c7ef35f52ce6cf not found
>
> Thanks
> Surendra
>
> On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang <ivanygy...@gmail.com> wrote:
>
>> Hello Surendra,
>>
>> Thank you for replying to my question. I already have this code
>>
>> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>>
>> I also tried using the REST API to retrieve a cancelled job, and no
>> checkpoint was found from the REST API either. We use this conf
>>
>> # s3 checkpointing
>> state.backend: filesystem
>> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
>> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>>
>> The actual checkpoint is there in s3 after cancellation. Can someone
>> point me to the code where the checkpoint history is maintained?
>>
>> Thanks,
>> Ethan
>>
>> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore <surendralilh...@gmail.com> wrote:
>>
>> Hi Ethan,
>>
>> Can you try:
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
>>
>> Thanks
>> Surendra
>>
>> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang <ivanygy...@gmail.com> wrote:
>>
>>> Hi Flink Users,
>>>
>>> After migrating from Flink 1.13.1 -> 1.18.0, I am no longer seeing the
>>> checkpoint history after cancelling a job. I am wondering which setting to
>>> enable so that I can see the checkpoint history on a cancelled job in Flink
>>> 1.18.0. Below is the screenshot of what I can see in Flink 1.13.1. I hope to
>>> get back the same view in the new version.
>>>
>>> Thanks,
>>> Ethan