Hi Vidya,
Thanks for sharing your setups.

> *What do you think about deleting the older files that are pulled from
the hostPath into the mount path first, and only then creating the new
instanceBasePath?*
I think deleting the old instance path after a restart is hard to achieve
with the current implementation: the random UUID of the old instancePath
isn't recorded anywhere, so we don't know which path to delete.
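
For illustration (the IDs here are hypothetical; the directory name follows
a pattern like job_<jobId>_op_<operator>_uuid_<UUID>), the local dir after a
restart may contain something like:

  /data/rocksdb/job_ab54ae7f_op_KeyedProcessOperator_uuid_3f9c0d2a/   <- old instance
  /data/rocksdb/job_ab54ae7f_op_KeyedProcessOperator_uuid_7d41e2b8/   <- new instanceBasePath

The old UUID only ever existed in the memory of the previous TaskManager
process, so nothing on disk records which directories still belong to a
live backend.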

> *What is the general design recommendation in such cases, where RocksDB
has a mount path to a Volume on the host node?*
For me, I usually use emptyDir[1] to sidestep the deletion problem and let
k8s be responsible for deleting the old rocksdb instancePath: when a Pod is
removed from a node for any reason, the rocksdb instancePath inside the
emptyDir is deleted along with the Pod.
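
As a rough sketch (the volume name, mount path, and image tag below are
made up; state.backend.rocksdb.localdir is the Flink option that must point
at the mount path), the TaskManager pod spec could look like this:

  apiVersion: v1
  kind: Pod
  metadata:
    name: flink-taskmanager
  spec:
    containers:
      - name: taskmanager
        image: flink:1.15
        # flink-conf.yaml should then set:
        #   state.backend.rocksdb.localdir: /rocksdb-local
        volumeMounts:
          - name: rocksdb-local
            mountPath: /rocksdb-local
    volumes:
      # An emptyDir lives and dies with the Pod: when the Pod is removed,
      # k8s deletes the whole directory, including stale instance paths.
      - name: rocksdb-local
        emptyDir: {}

This works because the emptyDir's lifetime is tied to the Pod, whereas a
hostPath volume outlives the Pod, which is exactly why the old instance
paths pile up in your setup.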

Hope this is useful; maybe there are some alternatives as well.

[1] https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

--
Best,
Yanfei

Martijn Visser <martijnvis...@apache.org> wrote on Tue, Nov 15, 2022, at 05:16:

> Hi Vidya,
>
> It is, until Flink 1.17 is released at the beginning of next year.
> While that code might not have changed, there can be other changes
> that have an impact. See, for example, this blog post:
> https://flink.apache.org/2022/05/06/restore-modes.html
>
> Best regards,
>
> Martijn
>
> On Mon, Nov 14, 2022 at 17:45, Vidya Sagar Mula <mulasa...@gmail.com> wrote:
>
>> Hi Martijn,
>>
>> Thanks for the info. We are in the process of moving to 1.15. Is this
>> version actively supported by the community?
>>
>> And coming to my original and follow-up questions: I checked the
>> RocksDBStateBackend code in 1.11 and 1.15, and it is the same.
>>
>> Given a K8s configuration with a Volume and mount path, I would like to
>> know the design recommendation for the RocksDB local storage path.
>>
>> Thanks,
>> Vidya
>>
>> On Mon, Nov 14, 2022 at 6:57 AM Martijn Visser <martijnvis...@apache.org>
>> wrote:
>>
>>> Hi Vidya,
>>>
>>> Given that you are still on Flink 1.11, which was released in July 2020
>>> and is no longer supported by the community, I would recommend first
>>> upgrading to a later, supported version like Flink 1.16.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Sat, Nov 12, 2022 at 8:07 PM Vidya Sagar Mula <mulasa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Yanfei,
>>>>
>>>> Thank you for the response. I have a follow-up answer and questions.
>>>>
>>>> I have two setups: one is the local environment, and the other is a
>>>> deployment scenario on K8s.
>>>>
>>>> - In the K8s setup, I have a Volume on the cluster node and a mount
>>>> path specified for the RocksDB checkpoints location. So, when the
>>>> application TM Pod is restarted, the older checkpoints are read back
>>>> from the host path when the TM is up again.
>>>> In this case, the RocksDB local directory is filled with all the older
>>>> data, which is not useful for the Job ID because the "instanceBasePath"
>>>> is calculated with a new random UUID. The volume setup looks roughly
>>>> like the sketch below.
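>>>>
>>>> (The names and paths in this sketch are placeholders, not an exact
>>>> spec; state.backend.rocksdb.localdir is assumed to point at the mount
>>>> path.)
>>>>
>>>>   volumeMounts:
>>>>     - name: rocksdb-local
>>>>       mountPath: /data/rocksdb     # state.backend.rocksdb.localdir points here
>>>>   volumes:
>>>>     - name: rocksdb-local
>>>>       hostPath:
>>>>         path: /mnt/flink/rocksdb   # survives Pod restarts, so old data is read back
>>>>         type: DirectoryOrCreate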
>>>>
>>>> Questions:
>>>> - What do you think about deleting the older files that are pulled
>>>> from the hostPath into the mount path first, and only then creating
>>>> the new instanceBasePath?
>>>> Otherwise, we are going to end up with GBs of unwanted data.
>>>>
>>>> What is the general design recommendation in such cases, where RocksDB
>>>> has a mount path to a Volume on the host node?
>>>> Please clarify.
>>>>
>>>> Thanks,
>>>> Vidya Sagar.
>>>>
>>>>
>>>> On Thu, Nov 10, 2022 at 7:52 PM Yanfei Lei <fredia...@gmail.com> wrote:
>>>>
>>>>> Hi Vidya Sagar,
>>>>>
>>>>> Could you please share the reason for the TaskManager restart? If the
>>>>> machine or the JVM process of the TaskManager crashes, the
>>>>> `RocksDBKeyedStateBackend` can't be disposed/closed normally, so the
>>>>> existing rocksdb instance directory would remain.
>>>>>
>>>>> BTW, if you use Application Mode on k8s and a TaskManager (pod)
>>>>> crashes, the rocksdb directory would be deleted as the pod is released.
>>>>>
>>>>> Best,
>>>>> Yanfei
>>>>>
>>>>> Vidya Sagar Mula <mulasa...@gmail.com> wrote on Fri, Nov 11, 2022, at 01:39:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using the RocksDB state backend for incremental checkpointing
>>>>>> with Flink version 1.11.
>>>>>>
>>>>>> Question:
>>>>>> ----------
>>>>>> For a given Job ID, intermediate RocksDB checkpoints are stored under
>>>>>> the path defined with ""
>>>>>>
>>>>>> The files are stored with "_jobID + random UUID" prefixed to the
>>>>>> location.
>>>>>>
>>>>>> Case 1:
>>>>>> ---------
>>>>>> - When I cancel the job, all the RocksDB checkpoints are deleted
>>>>>> properly from the location corresponding to that Job ID
>>>>>> (based on the "instanceBasePath" variable stored in the
>>>>>> RocksDBKeyedStateBackend object).
>>>>>> No issue here; working as expected.
>>>>>>
>>>>>> Case 2:
>>>>>> ---------
>>>>>> - When my TaskManager is restarted, the existing RocksDB checkpoints
>>>>>> are not deleted.
>>>>>> A new "instanceBasePath" is constructed, with a new random UUID
>>>>>> appended to the directory.
>>>>>> And the old checkpoint directories are still there.
>>>>>>
>>>>>> Questions:
>>>>>> - Is this the expected behaviour, i.e., that the existing checkpoint
>>>>>> dirs under the RocksDB local directory are not deleted?
>>>>>> - I see "StreamTaskStateInitializerImpl.java", where new
>>>>>> StateBackend objects are created. In this case, a new directory is
>>>>>> created for this Job ID, appended with a new random UUID.
>>>>>> What happens to the old directories? Are they going to be purged
>>>>>> later on?
>>>>>> If not, the disk is going to fill up with the older checkpoints.
>>>>>> Please clarify this.
>>>>>>
>>>>>> Thanks,
>>>>>> Vidya Sagar.
>>>>>>
>>>>>
>>>>>
>>>>> --
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
