Hello,

This can have several causes, such as back-pressure, large snapshots, or bugs.
Could you please share:
- the stats of the previous (successful) checkpoints
- back-pressure metrics for the sources
- which Flink version you are using

Regards,
Roman

On Thu, Mar 11, 2021 at 7:03 AM Alexey Trenikhun <yen...@msn.com> wrote:
>
> Hello,
> We are experiencing a problem with checkpoints failing due to timeout
> (already set to 30 minutes, still failing). The checkpoints were not too big
> before they started to fail, around 1.2Gb. It looks like one of the sources
> (Kafka) never acknowledged (see attached screenshot). What could be the reason?
>
> Thanks,
> Alexey