Thanks, Matthias. This is very helpful.

Regarding the checkpoint documentation, I was mostly looking for
information on how states from various tasks get serialized into one (or
more?) files on persistent storage. I'll check out the code pointers!

On Wed, Mar 31, 2021 at 7:07 AM Matthias Pohl <matth...@ververica.com>
wrote:

> Hi Deepthi,
> 1. Have you had a look at flink-benchmarks [1]? I haven't used it, but it
> might be helpful.
> 2. Unfortunately, Flink doesn't provide metrics like that. But you might
> want to follow FLINK-21736 [2] for future developments.
> 3. Is there anything specific you are looking for? Unfortunately, I don't
> know of any blogs with a more detailed summary. If you plan to look into the
> code, CheckpointCoordinator [3] might be a starting point. Alternatively,
> something like MetadataV2V3SerializerBase [4] offers insights into how the
> checkpoints' metadata is serialized.
>
> Best,
> Matthias
>
> [1] https://github.com/apache/flink-benchmarks
> [2] https://issues.apache.org/jira/browse/FLINK-21736
> [3]
> https://github.com/apache/flink/blob/11550edbd4e1874634ec441bde4fe4952fc1ec4e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1493
> [4]
> https://github.com/apache/flink/blob/adaaed426c2e637b8e5ffa3f0d051326038d30aa/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV2V3SerializerBase.java#L83
>
> On Tue, Mar 30, 2021 at 8:37 PM deepthi Sridharan <
> deepthi.sridha...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to set up some benchmarking with a couple of IO options for
>> saving checkpoints and have a couple of questions:
>>
>> 1. Does Flink come with any IO benchmarking tools? I couldn't find any. I
>> was hoping to use those to derive some insights about the storage
>> performance and extrapolate them to the checkpoint use case.
>>
>> 2. Are there any metrics pertaining to restore from checkpoints? The only
>> metric I can find is the last restore time, but neither the time it took to
>> read the checkpoints nor the time it took to restore the operator/task
>> states seems to be covered. I am using RocksDB, but couldn't find any
>> metrics relating to how much time it took to restore the state backend from
>> RocksDB either.
>>
>> 3. I am trying to find documentation on how the states from multiple
>> operators and tasks are serialized into the checkpoint files, to tailor the
>> testing use case, but can't seem to find any. Are there any blogs that go
>> into this detail, or would reading the code be the only option?
>>
>> --
>> Thanks,
>> Deepthi
>>
>

-- 
Regards,
Deepthi
