Hey Rex,

What do you mean by "Start Delay" when recovering from a checkpoint? Did
you mean when taking a checkpoint? If so:

1. https://www.google.com/search?q=flink+checkpoint+start+delay
2. top 3 result (at least for me)
https://ci.apache.org/projects/flink/flink-docs-stable/ops/monitoring/checkpoint_monitoring.html
> Start Delay: The time it took for the first checkpoint barrier to reach
> this subtask since the checkpoint barrier has been created.

3. https://www.google.com/search?q=flink+checkpoint+barrier
4. top 2 result (at least for me)
https://ci.apache.org/projects/flink/flink-docs-stable/concepts/stateful-stream-processing.html#barriers
> A core element in Flink’s distributed snapshotting are the stream
> barriers. These barriers are injected into the data stream and flow with
> the records as part of the data stream.

A long start delay or alignment time means checkpoint barriers are
propagating slowly through the job graph, which is usually a symptom of
back-pressure. It's best to solve the back-pressure problem by optimising
your job or scaling it up.
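
To make the link between back-pressure and start delay concrete, here is a
rough back-of-the-envelope model in plain Java. It is not Flink code, and
the class and method names (StartDelayModel, drainMillis, startDelayMillis)
are made up for illustration. The assumption is that with aligned
checkpoints a barrier cannot overtake records, so at every hop it waits
until the records buffered ahead of it have been processed.

```java
/** Toy model (not Flink code): a barrier waits behind records buffered ahead of it. */
final class StartDelayModel {

    /** Milliseconds one subtask needs to drain the records queued ahead of the barrier. */
    static long drainMillis(long queuedRecords, double recordsPerSecond) {
        if (recordsPerSecond <= 0) {
            throw new IllegalArgumentException("recordsPerSecond must be positive");
        }
        return Math.round(queuedRecords / recordsPerSecond * 1000.0);
    }

    /** Approximate start delay at the last hop: the sum of the drain times of every hop. */
    static long startDelayMillis(long[] queuedPerHop, double[] ratePerHop) {
        if (queuedPerHop.length != ratePerHop.length) {
            throw new IllegalArgumentException("one rate is needed per hop");
        }
        long total = 0;
        for (int i = 0; i < queuedPerHop.length; i++) {
            total += drainMillis(queuedPerHop[i], ratePerHop[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Healthy job: nearly empty buffers, fast operators.
        long healthy = startDelayMillis(
                new long[] {100, 100}, new double[] {10_000.0, 10_000.0});
        // Back-pressured job: full buffers and one slow operator (500 records/s).
        long backPressured = startDelayMillis(
                new long[] {50_000, 50_000}, new double[] {10_000.0, 500.0});
        System.out.println("healthy start delay (ms): " + healthy);
        System.out.println("back-pressured start delay (ms): " + backPressured);
    }
}
```

With these made-up numbers the healthy job sees a start delay of 20 ms and
the back-pressured one 105 seconds, almost all of it spent behind the
single slow operator. That is why speeding up or scaling out the
bottleneck is the first thing to try.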

Alternatively, you could use unaligned checkpoints [1], at the cost of a
larger checkpoint size and higher IO usage. Note that if you are using
Flink 1.12.x, I would refrain from using unaligned checkpoints in
production because of some bugs [2] that we are fixing right now. On Flink
1.11.x it should be fine.
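
For reference, a minimal sketch of switching on unaligned checkpoints
through the DataStream API. It assumes the Flink streaming dependency
(1.11 or later) is on the classpath. The class name, the 60 second
interval and the trivial placeholder pipeline are illustrative only, not
recommendations for your job.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Unaligned checkpoints only apply to exactly-once checkpointing.
        // The 60 s interval is an arbitrary example value.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        // Lets barriers overtake buffered records; the overtaken in-flight
        // data is stored as part of the checkpoint instead.
        env.getCheckpointConfig().enableUnalignedCheckpoints();

        // Placeholder pipeline so the sketch runs; replace with your job.
        env.fromElements(1, 2, 3).print();

        env.execute("unaligned-checkpoints-example");
    }
}
```

Because the in-flight buffers become part of the snapshot, this is where
the larger checkpoint size and extra IO mentioned above come from.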

Cheers,
Piotrek

[1]
https://flink.apache.org/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1.html
[2] https://issues.apache.org/jira/browse/FLINK-20654



On Mon, 18 Jan 2021 at 21:32, Rex Fenley <r...@remind101.com> wrote:

> Hello,
>
> When we are recovering from a checkpoint, it takes multiple minutes. Most
> of the time is taken by "Start Delay". What is Start Delay and how can we
> optimize for it?
>
> Thanks!
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend