A lower checkpoint interval (i.e., more checkpoints per unit of time) will consume
more resources and can therefore affect job performance.
It ultimately comes down to how much latency you are willing to accept
when a failure occurs and data has to be re-processed (more frequent checkpoints
=> less data to replay).
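The trade-off above can be sketched with a toy model (the numbers, function names, and the 2-second per-checkpoint cost are illustrative assumptions, not figures from this thread): on average, a failure forces replay of about half an interval's worth of data, while checkpointing overhead scales inversely with the interval.

```python
# Toy model of the checkpoint-interval trade-off (all values are assumptions).

def expected_replay_seconds(interval_s: float) -> float:
    """On failure, roughly half a checkpoint interval of data must be re-processed
    on average (the failure lands at a uniformly random point in the interval)."""
    return interval_s / 2


def checkpoint_overhead_fraction(interval_s: float,
                                 checkpoint_cost_s: float = 2.0) -> float:
    """Fraction of wall-clock time spent checkpointing: cost per checkpoint
    divided by the interval between checkpoints."""
    return checkpoint_cost_s / interval_s


if __name__ == "__main__":
    for interval in (10.0, 60.0, 300.0):
        print(f"interval={interval:>5.0f}s  "
              f"avg replay={expected_replay_seconds(interval):>5.1f}s  "
              f"overhead={checkpoint_overhead_fraction(interval):.1%}")
```

Shorter intervals shrink the expected replay window but raise the steady-state overhead; the right point depends on your recovery-latency budget and checkpoint cost.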
Hi. I'm playing around with optimizing our checkpoint intervals and sizes.
Are there any best practices around this? I have ~7 sequential joins and
a few sinks. I'm curious what would give the better throughput and
latency trade-offs. I'd assume less frequent checkpointing would