A lower checkpoint interval (i.e., more checkpoints per unit of time) will consume
more resources and can therefore affect job performance.
It ultimately comes down to how much latency you are willing to accept
when a failure occurs and data has to be re-processed (more frequent checkpoints
=> less data to replay).
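The trade-off above can be sketched with a toy model (the numbers, function names, and the 2-second per-checkpoint cost are illustrative assumptions, not figures from this thread): on average, a failure forces replay of about half an interval's worth of data, while checkpointing overhead scales inversely with the interval.

```python
# Toy model of the checkpoint-interval trade-off (all values are assumptions).

def expected_replay_seconds(interval_s: float) -> float:
    """On failure, roughly half a checkpoint interval of data must be re-processed
    on average (the failure lands at a uniformly random point in the interval)."""
    return interval_s / 2


def checkpoint_overhead_fraction(interval_s: float,
                                 checkpoint_cost_s: float = 2.0) -> float:
    """Fraction of wall-clock time spent checkpointing: cost per checkpoint
    divided by the interval between checkpoints."""
    return checkpoint_cost_s / interval_s


if __name__ == "__main__":
    for interval in (10.0, 60.0, 300.0):
        print(f"interval={interval:>5.0f}s  "
              f"avg replay={expected_replay_seconds(interval):>5.1f}s  "
              f"overhead={checkpoint_overhead_fraction(interval):.1%}")
```

Shorter intervals shrink the expected replay window but raise the steady-state overhead; the right point depends on your recovery-latency budget and checkpoint cost.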
Hi. I'm playing around with optimizing our checkpoint intervals and sizes.
Are there any best practices around this? I have ~7 sequential joins and
a few sinks. I'm curious what would give the better throughput and
latency trade-offs. I'd assume less frequent checkpointing would