Re: checkpoint interval and hdfs file capacity

2020-11-11 Thread Congxian Qiu
Hi, Currently the checkpoint discard logic is executed in an Executor [1], so old checkpoints may not be deleted immediately. [1] https://github.com/apache/flink/blob/91404f435f20c5cd6714ee18bf4ccf95c81fb73e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointsCleaner.java#L45 Best,
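
To illustrate the point about discards running in an Executor, here is a minimal, self-contained sketch. It is not Flink's CheckpointsCleaner; the class AsyncDiscardSketch, its methods, the single-threaded executor, and the latch that stands in for slow storage are all assumptions made for illustration. It only shows that when deletion is handed to an executor, completing a new checkpoint does not wait for the old one to be removed, so old files can pile up temporarily.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT Flink code: models discards handed off to an executor.
class AsyncDiscardSketch {

    // Stand-in for the checkpoint directory: one entry per checkpoint still on storage.
    private final Set<Long> filesOnStorage = ConcurrentHashMap.newKeySet();
    // Stand-in for the executor that runs discards (a single thread is an assumption).
    private final ExecutorService cleaner = Executors.newSingleThreadExecutor();
    // While this gate is closed, "storage is slow" and no deletion can finish.
    private final CountDownLatch storageGate = new CountDownLatch(1);

    /** Records checkpoint {@code id} and schedules deletion of the previous one without waiting. */
    void completeCheckpoint(long id) {
        filesOnStorage.add(id);
        if (id > 1) {
            long subsumed = id - 1;
            cleaner.execute(() -> {
                awaitQuietly(storageGate);       // simulated slow delete
                filesOnStorage.remove(subsumed); // the actual "discard"
            });
        }
    }

    int checkpointsOnStorage() {
        return filesOnStorage.size();
    }

    /** Opens the gate and waits until every queued deletion has run. */
    void drain() {
        storageGate.countDown();
        cleaner.shutdown();
        try {
            if (!cleaner.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("cleaner did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncDiscardSketch sketch = new AsyncDiscardSketch();
        for (long id = 1; id <= 5; id++) {
            sketch.completeCheckpoint(id);
        }
        System.out.println("before drain: " + sketch.checkpointsOnStorage()); // 5
        sketch.drain();
        System.out.println("after drain: " + sketch.checkpointsOnStorage());  // 1
    }
}
```

In this sketch all five checkpoints are still present right after the fifth one completes, and only the latest remains once the cleaner has caught up. How far real deletion lags behind depends on the storage system and the executor, which the sketch does not model.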

Re: checkpoint interval and hdfs file capacity

2020-11-09 Thread lec ssmi
Thanks. I have some jobs with a checkpoint interval of 1000 ms, and the HDFS files grow too large for things to work normally. What I am curious about is: are writing and deleting performed synchronously? Is it possible that new files are added faster than old files can be deleted? Congxian Qiu wrote on Tue, Nov 10, 2020 at 2:16 PM: > Hi >

Re: checkpoint interval and hdfs file capacity

2020-11-09 Thread Congxian Qiu
Hi, No matter what interval you set, Flink takes care of the checkpoints (it removes obsolete checkpoints when it can). With a very small checkpoint interval, however, there may be high pressure on the storage system (here, RPC pressure on the HDFS NameNode). Best, Congxian lec ssmi
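
As a rough back-of-the-envelope sketch of why a small interval increases pressure on the storage system: the number of checkpoints per hour grows inversely with the interval. The class CheckpointRateSketch and the opsPerCheckpoint parameter below are hypothetical; the real number of NameNode requests per checkpoint depends on the job (parallelism, state backend, number of state files) and is not given in this thread.

```java
// Hypothetical helper, not part of Flink: simple arithmetic on checkpoint frequency.
class CheckpointRateSketch {

    /** Checkpoints triggered per hour, assuming none are skipped or delayed. */
    static long checkpointsPerHour(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        return 3_600_000L / intervalMs;
    }

    /** Rough storage requests per hour; opsPerCheckpoint is an assumed, job-specific figure. */
    static long storageOpsPerHour(long intervalMs, long opsPerCheckpoint) {
        return checkpointsPerHour(intervalMs) * opsPerCheckpoint;
    }

    public static void main(String[] args) {
        System.out.println(checkpointsPerHour(1_000));    // 3600
        System.out.println(checkpointsPerHour(5_000));    // 720
        System.out.println(storageOpsPerHour(1_000, 20)); // 72000, with an assumed 20 ops per checkpoint
    }
}
```

Going from a 5-second to a 1-second interval multiplies the number of checkpoints, and with it the create and delete requests, by five, even though the number of retained checkpoints stays the same.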

checkpoint interval and hdfs file capacity

2020-11-09 Thread lec ssmi
Hi, if I set the checkpoint interval to be very small, such as 5 seconds, will there be a lot of state files on HDFS? In theory, no matter what the interval is, every time a checkpoint is taken the old files are deleted and new files are written, right?