From: Jinzhong Li
Sent: Thursday, March 28, 2024 12:45
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-428: Fault Tolerance/Rescale Integration for
Disaggregated State
Hi Feifan,
Sorry for the misunderstanding. As Hangxiang explained, the basic cleanup
mechanism for remote working directory is
On Wed, Mar 27, 2024 at 11:49 PM Yun Tang wrote:
> Hi Jinzhong,
>
> The overall design looks good.
>
> I have two minor questions:
>
> 1. Why must we have another 'subTask-checkpoint-sub-dir' under the shared
> directory? If we don't consider making TM ownership in this FLIP, this
> design seems unnecessary.
> 2. This FLIP forgets to mention the cleanup of the remote working
> directory in case the taskmanager crashes. Even though this is an open
> problem, we can still leave some space for [...] optimization.
>
> Best,
> Yun Tang
From: Jinzhong Li
Sent: Monday, March 25, 2024 10:41
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-428: Fault Tolerance/Rescale Integration for
Disaggregated State
Hi Yue,
Thanks for your comments.
The CURRENT is a special file that points to the latest manifest log
file. As Zakelly explained above, we could record the latest manifest
filename during sync phase, and write the filename into CURRENT snapshot
file during async phase.
Best,
Jinzhong
On Fri,
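The sync/async split described in the reply above can be sketched as follows. This is a minimal illustration under stated assumptions, not ForSt or RocksDB code: the function names (`sync_phase`, `async_phase`, `roll_manifest`), the directory names, and the `MANIFEST-00000N` filenames are all hypothetical, and the sketch only handles the CURRENT pointer, not the copy of the manifest file itself.

```python
import os
import tempfile


def read_current(directory):
    """Return the manifest filename that the CURRENT file in `directory` points to."""
    with open(os.path.join(directory, "CURRENT")) as f:
        return f.read().strip()


def roll_manifest(db_dir, new_name):
    """Simulate the store switching to a new manifest and updating CURRENT."""
    open(os.path.join(db_dir, new_name), "w").close()
    with open(os.path.join(db_dir, "CURRENT"), "w") as f:
        f.write(new_name + "\n")


def sync_phase(db_dir):
    """Sync phase: capture the latest manifest filename in memory."""
    return read_current(db_dir)


def async_phase(snapshot_dir, captured_manifest):
    """Async phase: write the CURRENT snapshot file from the captured name,
    not from the live CURRENT file, which may have moved on in the meantime."""
    os.makedirs(snapshot_dir, exist_ok=True)
    with open(os.path.join(snapshot_dir, "CURRENT"), "w") as f:
        f.write(captured_manifest + "\n")


def demo():
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "working")    # hypothetical working directory
        snap_dir = os.path.join(root, "chk-1")    # hypothetical checkpoint directory
        os.makedirs(db_dir)
        roll_manifest(db_dir, "MANIFEST-000008")
        captured = sync_phase(db_dir)             # records MANIFEST-000008
        roll_manifest(db_dir, "MANIFEST-000009")  # rollover before the async phase
        async_phase(snap_dir, captured)
        return read_current(snap_dir), read_current(db_dir)
```

Calling `demo()` returns `("MANIFEST-000008", "MANIFEST-000009")`: the snapshot's CURRENT still names the manifest seen during the sync phase, even though the live CURRENT has since been repointed.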
Hi Yue,
Thanks for bringing this up!
The CURRENT file is a special one: it should be snapshotted during the
sync phase (temporarily loaded into memory). Thus we can solve this.
Best,
Zakelly
On Fri, Mar 22, 2024 at 4:55 PM yue ma wrote:
Hi Jinzhong,
Thanks for your reply. I still have some doubts about the first question. Is
there such a case:
when you make a snapshot during the synchronization phase, you record the
CURRENT file and manifest 8, but before the asynchronous phase, the manifest
reaches the size threshold and then the CURRENT
Hi Jeyhun,
Thanks for your thoughtful feedback!
> Why don't we consider an option where checkpoint directory just contains
> metadata. So, we do not need to copy the data all the time from working
> directory to the checkpointing directory.
> Basically, when checkpointing, 1) we mark files in
Hi Jinzhong,
Thanks for the FLIP. +1 for it.
I have a few questions:
- Why don't we consider an option where checkpoint directory just contains
metadata. So, we do not need to copy the data all the time from working
directory to the checkpointing directory.
Basically, when checkpointing, 1) we
Hi Yue,
Thanks for your feedback!
> 1. If we choose Option-3 for ForSt, how would we handle the Manifest file?
> Should we take a snapshot of the Manifest during the synchronization
> phase?
IIUC, the GetLiveFiles() API in Option-3 can also catch the fileInfo of
Manifest files, and this API also
Hi Jinzhong
Thank you for initiating this FLIP.
I have just some minor questions:
1. If we choose Option-3 for ForSt, how would we handle the Manifest file?
Should we take a snapshot of the Manifest during the synchronization phase?
Otherwise, may the Manifest and MetaInfo information be
Hi everyone,
This discussion has been open for a while and there have been no new comments
for several days. As a sub-FLIP of FLIP-423 which is nearing a consensus, I
would like to start a vote after 72 hours.
Please let me know if you have any concerns, thanks!
Best,
Jinzhong
On Thu, Mar 7, 2024
Hi devs,
I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
State Storage and Management[1], which is a joint work of Yuan Mei, Zakelly
Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
- FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State