Thank you! I am using similar values, but my problem was that my FILE_LOADS jobs were sometimes failing, and that led to this behavior. The pipeline didn't fail, though (which I had assumed it would); it simply retried the loading forever. (Retries were set to the default of -1.)
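In case it helps anyone who runs into the same thing, below is a rough sketch (Beam Java SDK) of how I would now bound the retries instead of relying on the defaults. Treat the option names as assumptions rather than a recipe: setNumberOfExecutionRetries() is the Flink runner knob whose default of -1 means "retry forever", and withMaxRetryJobs() is, as far as I know, only available in newer Beam versions for FILE_LOADS; the table name and values are placeholders.

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import com.google.api.services.bigquery.model.TableRow;
    import org.joda.time.Duration;

    // Cap runner-level retries (Flink runner default is -1, i.e. retry forever).
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setNumberOfExecutionRetries(3);

    // Configure the BigQuery sink so failed load jobs are not retried indefinitely.
    BigQueryIO.Write<TableRow> write =
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")                  // placeholder table
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withTriggeringFrequency(Duration.standardMinutes(10)) // start a load every 10 min
            .withNumFileShards(32)                                 // required when a triggering frequency is set
            .withMaxRetryJobs(3);                                  // assumption: bounds load-job retries

With something like this in place, a persistently failing load job should surface as a pipeline failure instead of being retried silently forever.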
On Tue, Feb 12, 2019 at 6:11 PM Juan Carlos Garcia <jcgarc...@gmail.com> wrote:

> I forgot to mention that we use HDFS as storage for checkpoints /
> savepoints.
>
> Juan Carlos Garcia <jcgarc...@gmail.com> wrote on Tue, Feb 12, 2019,
> at 18:03:
>
>> Hi Tobias,
>>
>> I think this can happen when there is a lot of backpressure on the
>> pipeline.
>>
>> I don't know if it's normal, but I have a pipeline reading from KafkaIO
>> and pushing to BigQuery in streaming mode, and I have seen checkpoints of
>> almost 1 GB; whenever I do a savepoint to update the pipeline, it can
>> go up to 8 GB of data for a single savepoint.
>>
>> I am on Flink 1.5.x, on premises, also using RocksDB and incremental
>> checkpoints.
>>
>> So far my only solution to avoid errors while checkpointing or
>> savepointing is to make sure the checkpoint timeout is high enough, e.g.
>> 20 or 30 minutes.
>>
>>
>> Kaymak, Tobias <tobias.kay...@ricardo.ch> wrote on Tue, Feb 12, 2019,
>> at 17:33:
>>
>>> Hi,
>>>
>>> my Beam 2.10-SNAPSHOT pipeline has a KafkaIO as input and a BigQueryIO
>>> configured with FILE_LOADS as output. What bothers me is that even if I
>>> configure in my Flink 1.6 configuration
>>>
>>> state.backend: rocksdb
>>> state.backend.incremental: true
>>>
>>> I see states that are as big as 230 MiB and checkpoint timeouts, or
>>> checkpoints that take longer than 10 minutes to complete (I just saw one
>>> that took longer than 30 minutes).
>>>
>>> Am I missing something? Is there room for improvement? Should I use
>>> a different storage backend for the checkpoints? (Currently they are
>>> stored on GCS.)
>>>
>>> Best,
>>> Tobi
>>>
>>
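For anyone finding this thread later: the checkpoint timeout Juan Carlos mentions can also be raised through the Beam Flink runner options instead of flink-conf.yaml. This is only a sketch; setCheckpointingInterval() and setCheckpointTimeoutMillis() are the FlinkPipelineOptions setters as I understand them and may differ between Beam versions, and the values simply mirror the 30-minute suggestion above.

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    // Trigger a checkpoint every 5 minutes.
    options.setCheckpointingInterval(5 * 60 * 1000L);
    // Allow slow checkpoints (large state, backpressure) up to 30 minutes before they time out.
    options.setCheckpointTimeoutMillis(30 * 60 * 1000L);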