Re: why not flink delete the checkpoint directory recursively?

2020-11-25 Thread Joshua Fan
Hi Roman and Robert, Thank you. I have checked the code and the case where deleting the checkpoint failed. Yes, Flink deletes the meta file and the operator state file first, then deletes the checkpoint dir, which by then is truly an empty dir. The root cause of the failure to delete the checkpoint is the hadoop

Re: why not flink delete the checkpoint directory recursively?

2020-11-17 Thread Khachatryan Roman
Hi, I think Robert is right: state handles are deleted first, and then the directory is deleted non-recursively. If any exception occurs while removing the files, it will be combined with the other exception (as suppressed). So probably Flink failed to delete some files and then directory removal
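
The order described above (files first, then a non-recursive directory delete, with failures combined as suppressed exceptions) can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's java.nio.file API on a local temp directory rather than Flink's or Hadoop's FileSystem, and the class name CheckpointDiscardSketch and its discard and combine helpers are hypothetical, not Flink code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Flink's or Hadoop's implementation.
class CheckpointDiscardSketch {

    /**
     * Deletes each known state file best-effort, then removes the directory
     * non-recursively. Returns null on success, otherwise the first failure,
     * with any later failures attached to it as suppressed exceptions.
     */
    static Exception discard(Path checkpointDir, List<Path> stateFiles) {
        Exception first = null;
        for (Path file : stateFiles) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                first = combine(first, e);
            }
        }
        try {
            // Files.delete does not remove a directory that still has entries,
            // which stands in here for delete(path, recursive = false).
            Files.delete(checkpointDir);
        } catch (IOException e) {
            first = combine(first, e);
        }
        return first;
    }

    // Keeps the first exception and records later ones as suppressed.
    private static Exception combine(Exception first, Exception next) {
        if (first == null) {
            return next;
        }
        first.addSuppressed(next);
        return first;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chk-");
        Path meta = Files.createFile(dir.resolve("_metadata"));
        Path leftover = Files.createFile(dir.resolve("leftover"));

        // Only the known file is listed, so "leftover" stays behind and the
        // non-recursive directory delete fails.
        System.out.println("first attempt: " + discard(dir, Arrays.asList(meta)));

        // Once the remaining file is listed too, the directory is empty
        // and the delete goes through.
        System.out.println("second attempt: " + discard(dir, Arrays.asList(leftover)));
    }
}
```

In this sketch a file left in the directory shows up as a failed directory delete instead of being removed silently, which is the trade-off a non-recursive delete makes.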

Re: why not flink delete the checkpoint directory recursively?

2020-11-17 Thread Joshua Fan
Hi Robert, When the `recursive` argument of `delete(Path f, boolean recursive)` is false, HDFS throws an exception like the one below: [image: checkpoint-exception.png] Yours sincerely, Josh

Re: why not flink delete the checkpoint directory recursively?

2020-11-12 Thread Robert Metzger
Hey Josh, As far as I understand the code in CompletedCheckpoint.discard(), Flink removes all the files in StateUtil.bestEffortDiscardAllStateObjects and then deletes the directory. Which files are left over in your case? Do you see any exceptions on the TaskManagers? Best, Robert

why not flink delete the checkpoint directory recursively?

2020-11-11 Thread Joshua Fan
Hi, When a checkpoint should be deleted, FsCompletedCheckpointStorageLocation.disposeStorageLocation will be called. Inside it, fs.delete(exclusiveCheckpointDir, false) performs the delete. I wonder why the recursive parameter is set to false, as the exclusiveCheckpointDir is truly a