Thank you both for the information!

Rob

On Thu, Mar 28, 2024 at 7:08 PM Asimansu Bera <asimansu.b...@gmail.com> wrote:

> To add more detail, so it is clear why access to the persistent object
> store from all JVM processes is required for consistent recovery of a
> Flink job graph:
>
> *JobManager:*
>
> Flink's JobManager writes critical metadata during checkpoints for fault
> tolerance:
>
>    - Job Configuration: Preserves job settings (parallelism, state
>    backend) for consistent restarts.
>    - Progress Information: Stores offsets (source/sink positions) to
>    resume processing from the correct point after failures.
>    - Checkpoint Counters: Provides metrics (ID, timestamp, duration) for
>    monitoring checkpointing behavior.
>
> *TaskManagers:*
>
> While the JobManager handles checkpoint metadata, TaskManagers are the
> workhorses during Flink checkpoints. Here's what they do:
>
>    - State Snapshots: Upon receiving checkpoint instructions,
>    TaskManagers capture snapshots of their current state. This state
>    includes in-memory data and operator variables crucial for resuming
>    processing.
>    - State Serialization: The captured state is transformed into a
>    format suitable for storage, often byte arrays. This serialized data
>    represents the actual application state.
>
> Good network bandwidth is crucial so that TaskManagers can write large
> operator state to the HDFS/S3 object store quickly. Some customers use
> NFS as the persistent store, which is not recommended: NFS is slow and
> slows down checkpointing.
>
> -A
>
>
> On Wed, Mar 27, 2024 at 7:52 PM Feifan Wang <zoltar9...@163.com> wrote:
>
>> Hi Robert,
>>
>> Your understanding is right!
>> To add some more information: the JobManager is not only responsible
>> for cleaning up old checkpoints, it also needs to write the metadata
>> file to checkpoint storage after all TaskManagers have taken their
>> snapshots.
>>
>> -------------------------------
>> Best,
>> Feifan Wang
>>
>> At 2024-03-28 06:30:54, "Robert Young" <robertyoun...@gmail.com> wrote:
>>
>> Hi all, I have some questions about checkpoint and savepoint storage.
>>
>> From what I understand, a distributed, production-quality job with a
>> lot of state should use durable shared storage for checkpoints and
>> savepoints. All JobManagers and TaskManagers should access the same
>> volume, so typically you'd use Hadoop, S3, Azure, etc.
>>
>> The docs [1] state for state.checkpoints.dir: "The storage path must
>> be accessible from all participating processes/nodes(i.e. all
>> TaskManagers and JobManagers)."
>>
>> I want to understand why that is exactly. Here's my understanding:
>>
>> 1. The JobManager is responsible for cleaning up old checkpoints, so
>> it needs access to all the files written out by all the TaskManagers
>> so it can remove them.
>> 2. For recovery/rescaling, if all nodes share the same volume then
>> TaskManagers can read/redistribute the checkpoint data easily, since
>> the volume is shared.
>>
>> Is that correct? Are there more aspects to why the directory must be
>> shared across the processes?
>>
>> Thank you,
>> Rob Young
>>
>> 1.
>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/checkpointing/#related-config-options
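
To make the shared-path requirement concrete, here is a sketch of the
layout Flink creates under state.checkpoints.dir (the bucket name and
job ID are placeholders). The JobManager writes the _metadata file and
deletes expired chk-* directories, while the TaskManagers write the
state files into the same tree, which is why every process needs access
to the same path:

    s3://my-bucket/flink/checkpoints/
        <job-id>/
            chk-42/
                _metadata      <- written by the JobManager after all
                                  TaskManagers acknowledge their snapshots
                <state files>  <- serialized operator state written by
                                  the TaskManagers
            shared/            <- state shared across checkpoints
                                  (incremental checkpoints)
            taskowned/         <- state owned by the TaskManagers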
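
And a minimal sketch (Java DataStream API; the S3 path is a placeholder)
of pointing a job at shared, durable checkpoint storage, equivalent to
setting state.checkpoints.dir in the Flink configuration:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SharedCheckpointStorage {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Trigger a checkpoint every 60 seconds.
            env.enableCheckpointing(60_000);

            // Durable storage reachable from all JobManagers and
            // TaskManagers. As noted above, prefer an object store or
            // HDFS over NFS: slow writes stall checkpointing.
            env.getCheckpointConfig()
                    .setCheckpointStorage("s3://my-bucket/flink/checkpoints");

            // ... define sources, operators, and sinks here, then:
            // env.execute("checkpointed-job");
        }
    }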