By default, a checkpoint times out after 10 minutes. This means that if not
all operators confirm the checkpoint within that time, it is cancelled.

If you have an operator that is blocking for more than 10 minutes on a
single record (because this record contains millions of elements that are
written to an external system), then yes, this operator can cause your
checkpoints to time out.
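
To make the reasoning concrete, here is a small toy model in plain Java. It is
not Flink code, and the class and method names are made up for illustration.
It only assumes that an operator handles its input strictly in order, so a
checkpoint barrier queued behind a record is reached only after that record
has been fully processed. It ignores the time needed to take the snapshot
itself.

```java
import java.time.Duration;
import java.util.List;

/**
 * Toy model (not Flink code): one operator handles its input strictly in
 * order, so a barrier queued behind a record is reached only after that
 * record has been fully processed.
 */
public class CheckpointTimeoutSketch {

    /** The default checkpoint timeout mentioned above: 10 minutes. */
    public static final Duration DEFAULT_TIMEOUT = Duration.ofMinutes(10);

    /**
     * @param workAheadOfBarrier processing time of each record that sits in
     *                           front of the barrier in the operator's input
     * @param timeout            the checkpoint timeout
     * @return true if the operator reaches the barrier within the timeout
     */
    public static boolean barrierReachedInTime(
            List<Duration> workAheadOfBarrier, Duration timeout) {
        Duration elapsed = Duration.ZERO;
        for (Duration work : workAheadOfBarrier) {
            elapsed = elapsed.plus(work); // records are processed one by one
        }
        return elapsed.compareTo(timeout) <= 0;
    }

    public static void main(String[] args) {
        // A single record that blocks for 12 minutes (for example, a blob
        // expanded into a million synchronous writes) delays the barrier
        // past the timeout.
        System.out.println(barrierReachedInTime(
                List.of(Duration.ofMinutes(12)), DEFAULT_TIMEOUT)); // false

        // A few cheap records ahead of the barrier are not a problem.
        System.out.println(barrierReachedInTime(
                List.of(Duration.ofSeconds(5), Duration.ofSeconds(5)),
                DEFAULT_TIMEOUT)); // true
    }
}
```

In a real job, the timeout itself can be raised with
env.getCheckpointConfig().setCheckpointTimeout(...), but that only hides the
problem if a single record keeps the operator busy for that long.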

On Mon, Feb 1, 2021 at 5:26 PM Marco Villalobos <mvillalo...@kineteque.com>
wrote:

> Actually, perhaps I misworded it.  This particular checkpoint seems to
> occur in an operator that is flat mapping (it is actually a keyed
> processing function) a single blob data structure into several hundred
> thousand elements (sometimes a million) that immediately flow into a sink.
> I am speculating that the sink writes to the database were taking too long
> and causing a checkpoint to fail, but I changed that sink into a print, and
> the checkpoint still failed, so it must be something else.
>
> I don't know the deep details of Flink's internals, but I am speculating
> that the data between this operator and sink has to be checkpointed before
> the sink actually does something.
>
> On Mon, Feb 1, 2021 at 2:37 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> 1) An operator that just blocks for a long time (for example, because it
>> does a synchronous call to some external service) can indeed cause a
>> checkpoint timeout.
>>
>> 2) What kind of effects are you worried about?
>>
>> On 1/28/2021 8:05 PM, Marco Villalobos wrote:
>> > Is it possible that checkpointing times out due to an operator taking
>> > too long?
>> >
>> > Also, does windowing affect the checkpoint barriers?
