The acknowledgement has to be synchronous since Flink assumes that after
notifyCheckpointComplete() all data has been persisted to external
systems. For example, if records 1 to 100 were passed to the sink and a
checkpoint occurred and completed, on restart Flink would continue with
record 101. But if the sink does not synchronously wait for all updates
to be persisted, the checkpoint may finish before an asynchronous update
(say, for record 99) has been sent or confirmed, and Flink will _still_
resume from record 101, even if that update never reaches the external
system.
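
To make the record 1 to 100 example concrete, here is a rough sketch of the
two behaviours as a toy simulation. It does not use any Flink or RabbitMQ
API; the class CheckpointAckSketch, the Outcome type and the run() method are
hypothetical names made up for illustration. It also assumes, as a
simplification, that in the synchronous case a failed update prevents the
checkpoint from being counted as complete.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the scenario above; it does not use any Flink or RabbitMQ API. */
final class CheckpointAckSketch {

    /** Result of one simulated run (hypothetical type, for illustration only). */
    static final class Outcome {
        final int resumeFrom;      // first record replayed after a restart
        final List<Integer> lost;  // records skipped on restart that were never persisted

        Outcome(int resumeFrom, List<Integer> lost) {
            this.resumeFrom = resumeFrom;
            this.lost = lost;
        }
    }

    /**
     * Simulates records 1..100 reaching a sink, followed by one checkpoint.
     *
     * @param waitForUpdates true models a sink that waits for every update to be
     *                       confirmed before the checkpoint counts as complete
     * @param failingRecord  the one record whose update to the external system fails
     */
    static Outcome run(boolean waitForUpdates, int failingRecord) {
        final int lastRecord = 100;
        Set<Integer> persisted = new HashSet<>();
        int checkpointed = 0; // highest record covered by a completed checkpoint

        if (waitForUpdates) {
            // Synchronous: confirm updates first. Assumption of this sketch:
            // a failed update means the checkpoint is not counted as complete.
            boolean allConfirmed = true;
            for (int r = 1; r <= lastRecord; r++) {
                if (r == failingRecord) {
                    allConfirmed = false;
                    break;
                }
                persisted.add(r);
            }
            if (allConfirmed) {
                checkpointed = lastRecord;
            }
        } else {
            // Asynchronous: the checkpoint completes first, and the updates
            // land (or fail) afterwards without affecting it.
            checkpointed = lastRecord;
            for (int r = 1; r <= lastRecord; r++) {
                if (r != failingRecord) {
                    persisted.add(r);
                }
            }
        }

        // Records at or below the checkpoint are never replayed, so any of
        // them missing from the external system is lost for good.
        List<Integer> lost = new ArrayList<>();
        for (int r = 1; r <= checkpointed; r++) {
            if (!persisted.contains(r)) {
                lost.add(r);
            }
        }
        return new Outcome(checkpointed + 1, lost);
    }
}
```

Calling CheckpointAckSketch.run(false, 99) models the asynchronous case: the
job resumes from record 101 and record 99 is reported as lost. Calling
CheckpointAckSketch.run(true, 99) models the synchronous case: the checkpoint
is not counted, so the job replays from record 1 and nothing is skipped.
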
On 05.03.2019 15:07, Gabriel Candal wrote:
Hi,
Recently I've opened a Stack Overflow question
<https://stackoverflow.com/questions/54909315/why-does-checkpointing-impact-latency-so-much> about
latency spikes (~500ms) after a checkpoint operation, even though the
operation itself was relatively fast (~50ms).
I've come to realize that the cause for the latency was that the job
was waiting for the RMQSource to acknowledgeSessionIDs during
notifyCheckpointComplete.
I've noticed that the Kafka connectors do the equivalent operation
(committing offsets) asynchronously, at least from 09 onwards. My
question to you is: can you see any reason why this
acknowledgement has to be synchronous for RabbitMQ?
I believe it should be ok, given that those messages are already
reflected in the checkpointed state, but I'm not sure if there are any
negative consequences correctness-wise.
Thanks,