Hi, Jai.
Could you share your checkpoint configuration (interval, min-pause, and
so on) and the checkpoint details shown in the Flink UI?
I suspect the checkpoint delay is related to the completion time of the
previous checkpoint, as you can see in
CheckpointRequestDecider#chooseRequestToExecute.
Perhaps every 3rd or 4th checkpoint takes longer because of the RocksDB
flush mechanism?
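
To illustrate the guess above, here is a minimal sketch of how a min-pause
setting could postpone a periodic trigger when the previous checkpoint
finished late. It is a simplified model, not Flink's actual code: the class
name, the method signature, and all the numbers are assumptions chosen for
illustration only.

```java
public class MinPauseSketch {
    /**
     * Simplified model: how long a periodic trigger must still wait so that
     * at least minPauseMs has elapsed since the last checkpoint completed.
     * Returns 0 when the min-pause has already been satisfied.
     */
    static long nextTriggerDelayMillis(long nowMs, long lastCompletionMs, long minPauseMs) {
        return Math.max(0L, lastCompletionMs + minPauseMs - nowMs);
    }

    public static void main(String[] args) {
        long minPauseMs = 60_000L;        // hypothetical min-pause: 60 s
        long lastCompletionMs = 280_000L; // hypothetical: previous checkpoint finished at t = 280 s
        long triggerAtMs = 300_000L;      // hypothetical: next periodic trigger fires at t = 300 s

        long delayMs = nextTriggerDelayMillis(triggerAtMs, lastCompletionMs, minPauseMs);
        System.out.println("postponed by " + delayMs + " ms"); // prints "postponed by 40000 ms"
    }
}
```

In this model, one slow checkpoint (for example, one that coincides with a
RocksDB flush) pushes the next trigger back, while the checkpoints that
complete quickly are not delayed at all.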

Best,
Hangxiang.

On Wed, Jun 15, 2022 at 6:27 AM Jai Patel <jai.pa...@cloudkitchens.com>
wrote:

> We've noticed a spike in the start delays in our incremental checkpoints
> every 15 minutes.  The Flink job seems to start out smooth, with
> checkpoints in the 15s range and negligible start delays.  Then every
> 3rd or 4th checkpoint has a long start delay (~2-3 minutes).  The
> checkpoints in between have negligible start delays and are fast.  So:
>
> 2-3 fast with negligible start delay, total time 15-30s
> 1-2 slow with 2-3 minute start delay, total time 15-30s longer than the
> start delay.
>
> What could cause this?  We have a couple of output topics that are
> EXACTLY_ONCE, but I switched them to AT_LEAST_ONCE and continued to see the
> behavior.
>
> Thanks.
> Jai
>
