Hi Martijn,

We're currently using flink-s3-fs-presto. After reviewing the
flink-s3-fs-hadoop source code, I believe we would run into similar issues
with it.

When we say, 'The purpose of a checkpoint, in principle, is that Flink
manages its lifecycle,' I think it implies that the automatic cleanup of
old checkpoints is an integral part of Flink's lifecycle management.
However, is there a configuration option available that allows us to
disable this automatic cleanup? We're considering leveraging AWS S3's
lifecycle management capabilities to handle this aspect instead of relying
on Flink.

Best,
Yang

On Tue, 7 Nov 2023 at 18:44, Martijn Visser <martijnvis...@apache.org>
wrote:

> Ah, I actually mixed up checkpoints and savepoints, sorry. The purpose
> of a checkpoint in principle is that Flink manages its lifecycle.
> Which S3 interface are you using for the checkpoint storage?
>
> On Tue, Nov 7, 2023 at 6:39 PM Martijn Visser <martijnvis...@apache.org>
> wrote:
> >
> > Hi Yang,
> >
> > If you use the NO_CLAIM mode, Flink will not assume ownership of a
> > snapshot and will leave it up to the user to delete it. See the blog [1]
> > for more details.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1] https://flink.apache.org/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/#no_claim-default-mode
> >
> > On Tue, Nov 7, 2023 at 5:29 PM Junrui Lee <jrlee....@gmail.com> wrote:
> > >
> > > Hi Yang,
> > >
> > >
> > > You can try configuring
> > > "execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION"[1]
> > > and increasing the value of "state.checkpoints.num-retained"[2] to
> > > retain more checkpoints.
> > >
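> > > As a minimal sketch, the two settings above could look like this in
> > > flink-conf.yaml. The value 10 is only an illustrative assumption, not
> > > a recommendation; pick a count that suits your recovery needs.
> > >
> > > ```yaml
> > > # Keep externalized checkpoints when the job is cancelled
> > > execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
> > > # Maximum number of completed checkpoints to retain (default is 1);
> > > # 10 is an assumed example value
> > > state.checkpoints.num-retained: 10
> > > ```
> > >
> > > Note that this only raises the number of retained checkpoints; Flink
> > > still removes completed checkpoints beyond that count.
> > >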
> > >
> > > Here are the official documentation links for more details:
> > >
> > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-checkpointing-externalized-checkpoint-retention
> > >
> > > [2] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#state-checkpoints-num-retained
> > >
> > >
> > > Best,
> > >
> > > Junrui
> > >
> > >
> > > On Tue, Nov 7, 2023 at 22:02, Yang LI <yang.hunter...@gmail.com> wrote:
> > >>
> > >> Dear Flink Community,
> > >>
> > >> In our Flink application, we persist checkpoints to AWS S3. Recently,
> > >> during periods of high job parallelism and traffic, we've experienced
> > >> checkpoint failures. Upon investigating, it appears these may be
> > >> related to S3 delete object requests interrupting checkpoint
> > >> re-uploads, as evidenced by numerous InterruptedExceptions.
> > >>
> > >> We would like to explore options for disabling the deletion of stale
> > >> checkpoints. Despite consulting the Flink configuration documentation
> > >> and running various tests, we have not found a setting that prevents
> > >> old checkpoints from being cleaned up.
> > >>
> > >> Could you advise if there's a method to disable the automatic cleanup
> > >> of old Flink checkpoints?
> > >>
> > >> Best,
> > >> Yang
>
