Hello,

I was reading about Flink's checkpointing and wanted to check whether I have correctly understood how barriers are used for exactly-once processing:

1) An operator performs alignment by buffering the records that arrive after a barrier on an input channel until it has received that barrier from all upstream operator instances.
2) A barrier is always preceded by a watermark, which triggers processing of all windows that are complete.
3) Records in windows that have not yet been triggered are also saved as part of the checkpoint. These windows are repopulated when restoring from a checkpoint.
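
To make point 1 concrete, here is a toy simulation of barrier alignment in plain Python. It is only a sketch of my understanding, not Flink's implementation: the class `AligningOperator`, the `BARRIER` marker and the event list are all hypothetical names made up for illustration, and the operator "state" is simply the list of records processed so far.

```python
from collections import deque

BARRIER = object()  # hypothetical marker standing in for a checkpoint barrier


class AligningOperator:
    """Toy operator with several input channels (not Flink code)."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.blocked = set()      # channels whose barrier has already arrived
        self.buffered = deque()   # events held back during alignment
        self.processed = []       # stand-in for operator state
        self.snapshots = []       # one state copy per completed checkpoint

    def on_event(self, channel, item):
        if channel in self.blocked:
            # Anything arriving behind a barrier waits until alignment ends.
            self.buffered.append((channel, item))
        elif item is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_channels:
                # Barrier seen on every channel: snapshot, then unblock.
                self.snapshots.append(list(self.processed))
                self.blocked.clear()
                pending, self.buffered = self.buffered, deque()
                for ch, ev in pending:
                    self.on_event(ch, ev)
        else:
            self.processed.append(item)


# Two upstream channels; channel 0 delivers its barrier first.
events = [
    (0, "a1"), (1, "b1"),
    (0, BARRIER),
    (0, "a2"),            # behind the barrier on channel 0: buffered
    (1, "b2"),            # channel 1 has not sent its barrier yet: processed
    (1, BARRIER),         # alignment completes here
    (1, "b3"),
]

op = AligningOperator(num_channels=2)
for channel, item in events:
    op.on_event(channel, item)

print(op.snapshots)   # state captured at the checkpoint
print(op.processed)   # all records, in the order they were processed
```

In this sketch the snapshot contains `a1`, `b1` and `b2` but not `a2`, because `a2` arrived after the barrier on its channel and was held back until the barrier from the other channel arrived. The time `a2` spends in the buffer is the alignment latency I am asking about below.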

In production setups, have there been cases where alignment during checkpointing caused unacceptable latency? If so, is there a way to cap the alignment wait, say at a maximum of 100 ms? That way we would get exactly-once in most situations but fall back to at-least-once rather than accept higher latency in corner cases.

Srikanth