Hi Ravi,

Consider moving to RocksDB state backend, where you can enable incremental
checkpointing. This will make you checkpoints size stay pretty much
constant even when your state becomes larger.

https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/state/state_backends.html#the-rocksdbstatebackend


Thanks,
Rafi

On Sat, Sep 7, 2019, 17:47 Ravi Bhushan Ratnakar <
ravibhushanratna...@gmail.com> wrote:

> Hi All,
>
> I am writing a streaming application using Flink 1.9. This application
> consumes data from kinesis stream which is basically avro payload.
> Application is using KeyedProcessFunction to execute business logic on the
> basis of correlation id using event time characteristics with below
> configuration --
> StateBackend - filesystem with S3 storage
> registerTimeTimer duration for each key is  -  currentWatermark  + 15
> seconds
> checkpoint interval - 1min
> minPauseBetweenCheckpointInterval - 1 min
> checkpoint timeout - 10mins
>
> incoming data rate from kinesis -  ~10 to 21GB/min
>
> Number of Task manager - 200 (r4.2xlarge -> 8cpu,61GB)
>
> First 2-4 checkpoints get completed within 1mins where the state size is
> usually 50GB. As the state size grows beyond 50GB, then checkpointing time
> starts taking more than 1mins and it increased till 10 mins and then
> checkpoint fails. The moment the checkpoint starts taking more than 1 mins
> to complete then application starts processing slow and start lagging in
> output.
>
> Any suggestion to fine tune checkpoint performance would be highly
> appreciated.
>
> Regards,
> Ravi
>

Reply via email to