Hi,

I think Robert is right: the state handles are deleted first, and then the
directory is deleted non-recursively.
If any exception occurs while removing the files, it is attached to the
directory-removal exception as a suppressed exception.
So Flink probably failed to delete some files, and the directory removal
then failed because the directory was not empty.
Can you share the full exception so we can check this?
And please also check which files exist there, as Robert suggested.
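To illustrate what I mean, here is a minimal stand-in (not Flink's actual
code) using plain java.nio: a non-recursive delete of a non-empty directory
fails, and a failure from the earlier file-deletion phase can be attached
to the directory-removal exception as suppressed. The class and directory
names are made up for the demo.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NonRecursiveDeleteDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for an exclusive checkpoint directory.
        Path checkpointDir = Files.createTempDirectory("chk-42");
        Path leftover = Files.createFile(checkpointDir.resolve("leftover-state-file"));

        // Simulate the file-deletion phase failing for one state file.
        IOException stateDeletionFailure =
                new IOException("simulated failure deleting " + leftover);

        try {
            // Non-recursive delete of a non-empty directory fails.
            Files.delete(checkpointDir);
        } catch (DirectoryNotEmptyException e) {
            // Combine the earlier failure with this one, as suppressed.
            e.addSuppressed(stateDeletionFailure);
            System.out.println("directory delete failed: "
                    + e.getClass().getSimpleName());
            System.out.println("suppressed exceptions: " + e.getSuppressed().length);
        } finally {
            // Clean up after the demo.
            Files.deleteIfExists(leftover);
            Files.deleteIfExists(checkpointDir);
        }
    }
}
```

So in the full stack trace you should see both the directory-removal
failure and, as suppressed, the original reason some file could not be
deleted.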

Regards,
Roman


On Tue, Nov 17, 2020 at 10:38 AM Joshua Fan <joshuafat...@gmail.com> wrote:

> Hi Robert,
>
> When the `delete(Path f, boolean recursive)` recursive is false, hdfs
> will throw exception like below:
> [image: checkpoint-exception.png]
>
> Yours sincerely
> Josh
>
> On Thu, Nov 12, 2020 at 4:29 PM Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hey Josh,
>>
>> As far as I understand the code CompletedCheckpoint.discard(), Flink is
>> removing all the files in StateUtil.bestEffortDiscardAllStateObjects, then
>> deleting the directory.
>>
>> Which files are left over in your case?
>> Do you see any exceptions on the TaskManagers?
>>
>> Best,
>> Robert
>>
>> On Wed, Nov 11, 2020 at 12:08 PM Joshua Fan <joshuafat...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> When a checkpoint should be deleted,
>>> FsCompletedCheckpointStorageLocation.disposeStorageLocation will be
>>> called.
>>> Inside it, fs.delete(exclusiveCheckpointDir, false) performs the delete.
>>> I wonder why the recursive parameter is set to false, since
>>> exclusiveCheckpointDir is truly a directory. On our Hadoop cluster, this
>>> causes the checkpoint to not be removed.
>>> It is easy to change the recursive parameter to true, but is there any
>>> potential harm?
>>>
>>> Yours sincerely
>>> Josh
>>>
>>>