Hi,

Do you also observe such long checkpoint times when the job does not perform external calls? If not, I guess the communication with the external system is slow or flaky.

You may need to rethink how you perform such calls in order to make the pipeline more robust against these latencies. Flink also offers an async I/O operator [1] for exactly such cases, which could be worth a look.
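
To illustrate why the end-to-end duration can be minutes while the sync part is only a few milliseconds: as far as I understand, a checkpoint barrier travels in the stream behind the records, so an operator that is still busy with blocking calls only gets to the barrier once those calls have returned. Below is a toy sketch in plain Java, not Flink code. BarrierDelayDemo, callExternalService and the latency values are made up for illustration, and Thread.sleep stands in for the HTTPS call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: no Flink classes are used and all names are hypothetical.
class BarrierDelayDemo {

    // Stand-in for the HTTPS call made per record (assumption: fixed latency).
    static String callExternalService(int record, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + record;
    }

    // Blocking style: the processing thread waits for every call, so a barrier
    // queued behind the records is reached only after all calls have returned.
    static long millisUntilBarrierBlocking(int records, long latencyMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            callExternalService(i, latencyMillis);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Async style: the processing thread only hands the requests to a pool,
    // so it reaches the barrier while the responses are still outstanding.
    static long millisUntilBarrierAsync(int records, long latencyMillis,
                                        List<String> results) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, records));
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            final int record = i;
            inFlight.add(CompletableFuture.supplyAsync(
                    () -> callExternalService(record, latencyMillis), pool));
        }
        // This is the point at which the barrier would be seen.
        long barrierReached = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Collect the responses afterwards, in request order.
        for (CompletableFuture<String> future : inFlight) {
            results.add(future.join());
        }
        pool.shutdown();
        return barrierReached;
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        System.out.println("blocking: barrier reached after "
                + millisUntilBarrierBlocking(10, 20) + " ms");
        System.out.println("async:    barrier reached after "
                + millisUntilBarrierAsync(10, 20, results) + " ms");
    }
}
```

In the blocking variant the wait grows with the number of records times the call latency, which would match a checkpoint that is triggered right after the window fires. With the real async operator from [1], the documentation states that in-flight requests are stored as part of the checkpoint, so the barrier should not have to wait for the responses.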

Regards,
Timo

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html


On 05.11.18 at 18:52, PranjalChauhan wrote:
Hi,

I am a new Flink user, currently using Flink 1.2.1. I am trying to
understand how checkpoints actually work when a window operator is
processing events.

My pipeline has the following flow where each operator's parallelism is 1.
source -> flatmap -> tumbling window -> sink
In this pipeline, I had configured the window to be evaluated every 1 hour
(3600 seconds) and the checkpoint interval was 5 mins. The checkpoint
timeout was set to 1 hour as I wanted the checkpoints to complete.

In my window function, the job makes an HTTPS call to another service, so the
window function may take some time to evaluate/process all events.

Please refer to the following image. In this case, the window was triggered
at 23:00:00. Checkpoint 12 was triggered soon after that, and I noticed that
checkpoint 12 takes a long time to complete (compared to other checkpoints
taken when the window function is not processing events).
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/overall_checkpoint_duration_summary_when_waiting_for_window_operator.png>

The following images show the checkpoint 12 details for the window and sink operators.
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/window_operator_checkpoint_duration_after_window_interval.png>
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/sink_operator_checkpoint_duration_after_window_interval.png>

I see that the time spent on the checkpoint was actually just 5 ms and 8 ms
(checkpoint duration sync) for the window and sink operators, respectively.
However, the End to End Duration of the checkpoint was 11m 12s for both the
window and sink operators.

Is this expected behavior? If yes, do you have any suggestion to reduce the
end to end checkpoint duration?

Please let me know if any more information is needed.

Thanks.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

