Hi!
I think this warning from the documentation is a bit over the top. Yes,
unaligned checkpoints do add an extra source of non-determinism in that
regard. However, please note that Flink doesn't give any guarantees
that the results will be the same after a recovery, as the order of the
records c
Hi Piotr,
I also agree with Zhanghao's assessment of the limitations of unaligned
checkpoints. Some of them are already handled properly by Flink, but in the
case of the "Interplay with watermarks" limitation, it is quite confusing
for a new user to find that their code doesn't generate consistent
Hi, thanks for the responses,
And thanks for pointing out the job upgrade issue. Indeed, that had
slipped my mind. I was mistakenly thinking that we support upgrades
only via savepoints. Anyway, maybe in that case we should guide users
towards that? Using savepoints for upgrades? That wou
Hi Piotr,
Thanks for driving this! Generally I support enabling the alignment timeout
for aligned checkpoints. And I second Rui's opinion: 30s seems a reasonable
value.
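
For reference, the setup under discussion might look roughly like the
fragment below. This is only a sketch: the key names assume the Flink 1.x
option names (`execution.checkpointing.unaligned` and
`execution.checkpointing.aligned-checkpoint-timeout`), which have changed
between releases, and the checkpoint interval is an arbitrary illustrative
value, not something proposed in this thread.

```yaml
# Illustrative only: the interval is an assumed value, not from the thread.
execution.checkpointing.interval: 1min
# Allow checkpoints to fall back to unaligned mode.
execution.checkpointing.unaligned: true
# Start each checkpoint as aligned; switch to unaligned only if alignment
# takes longer than this timeout (30s is the value suggested above).
execution.checkpointing.aligned-checkpoint-timeout: 30s
```

With a non-zero timeout, jobs without backpressure would keep taking
aligned checkpoints, so the unaligned path only applies when alignment is
slow.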
However, I'm worried that there may be some operators that do not support
unaligned CP, which may cause data accuracy problems (as
Thanks to Piotr for driving this proposal!
Enabling unaligned checkpoints with an aligned checkpoint timeout
is fine with me. I'm not sure whether an aligned checkpoint timeout of 5s
is too aggressive. If unaligned checkpoints are enabled by default
for all jobs, I recommend that the aligned checkpoint timeout
Hi Piotr,
As a platform administrator who runs thousands of Flink jobs, I'd be against
the idea of enabling unaligned CP by default for our jobs. It may help a significant
portion of the users, but the subtle issues around unaligned CP for a few jobs
will probably raise a lot more on-calls and incidents