Flink checkpoints timeout when Paimon table has millions of records

Ramesh Motaparthy via user Mon, 18 Mar 2024 08:14:56 -0700

Hi,
We are planning to use Paimon for our high-throughput (millions of records
per second) and low-latency (10 to 20 seconds) streaming use case. We've seen
promising results so far, except for an issue with Flink checkpointing.


When we start a Flink job that reads from S3 via Paimon with an AppendOnly
table, checkpoints take too long and eventually time out if there are
millions of records waiting to be consumed. These failures trigger Flink
job restarts, which cause us to re-read data from the beginning, creating an
endless restart loop.

Interestingly, checkpoints complete within 10-20 milliseconds for smaller
datasets in the Paimon table. Are there any specific Paimon or Flink
settings we can adjust to address the checkpointing issue we are seeing? We
would like the checkpoints to complete within a reasonable timeframe for
large datasets.

We appreciate any insights or recommendations you can offer.

Thanks,
Ramesh
