Hi Banu,

> Not all old sst files are present. Few are removed (i think it is because of
> compaction).

You are right. RocksDB implements deleting a key by inserting an entry with a null value (a tombstone); the space is released only after compaction.

> Now how can I maintain check points size under control??.

Because RocksDB incremental checkpoints upload the sst files directly (in native format), this generally leads to space amplification. Another factor that affects the checkpoint size with RocksDB incremental checkpoints is RocksDB's data compression. In general, it is difficult to estimate the checkpoint size accurately.

A savepoint in canonical format truly reflects the size of the state data. However, taking such a savepoint is more expensive, because all state data must be reorganized into the canonical format.

I think RocksDB incremental checkpointing is the better choice. Although the checkpoint size cannot be calculated accurately, it is generally positively correlated with the actual state size. Once the job runs stably, the checkpoint size fluctuates within a certain range (due to compaction), which is easy to observe during the launch preparation phase.

Maybe someone with more experience can give more valuable advice.

> How can I find idle check point size of my project, I found below link but it
> is not talking about parallelism.

What do you mean by "idle checkpoint size"?

--------------
Best regards,
Feifan Wang

At 2024-06-19 17:35:13, "banu priya" <banuke...@gmail.com> wrote:

Hi Wang,

Thanks a lot for your reply.

Currently I have a 2s window and a checkpoint interval of 10s. The minimum pause between checkpoints is 5s.

What happens is that my checkpoint size is growing gradually. I checked the contents of my RocksDB local dir and also the shared checkpoints directory.

Inside chk-x, I have the _metadata file, which lists the .sst files referenced by that checkpoint. In it I can see that my very old .sst file is still present. I was expecting it to be cleaned up.
Not all old sst files are present. A few have been removed (I think that is because of compaction).

Now how can I keep the checkpoint size under control?

How can I find the idle checkpoint size of my project? I found the link below, but it does not talk about parallelism.

https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines

Any help would really be appreciated :).

Thanks
Banu

On Wed, 19 Jun, 2024, 9:38 am banu priya, <banuke...@gmail.com> wrote:

Hi All,

I have a Flink job with a keyBy, a tumbling window (2 sec window time, using processing time) and an aggregator.

How often should I run the checkpoint? I don't need the data to be retained after 2s.

I want to use incremental checkpoints with RocksDB.

Thanks
Banupriya
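
Editor's illustration of the behaviour discussed in this thread: a very old .sst file can remain in the shared checkpoint directory for as long as RocksDB still treats it as live, and it only becomes removable once compaction has replaced it and no retained checkpoint references it any more. The sketch below is a toy reference-counting model, not Flink's actual implementation. The class `SharedRegistry`, its methods, the file names, and the `num_retained=1` setting are all hypothetical and chosen only for illustration.

```python
from collections import Counter


class SharedRegistry:
    """Toy reference counter for shared .sst files.

    Hypothetical model for illustration only; this is NOT Flink's API.
    """

    def __init__(self, num_retained=1):
        self.num_retained = num_retained  # assumed number of retained checkpoints
        self.retained = []                # (checkpoint_id, frozenset of file names)
        self.refcount = Counter()         # file name -> number of referencing checkpoints

    def register_checkpoint(self, checkpoint_id, live_files):
        """Record a checkpoint that references all currently live .sst files.

        Returns the set of files that became unreferenced and can be deleted.
        """
        files = frozenset(live_files)
        self.retained.append((checkpoint_id, files))
        self.refcount.update(files)

        deleted = set()
        # Drop the oldest checkpoints beyond the retention limit.
        while len(self.retained) > self.num_retained:
            _, old_files = self.retained.pop(0)
            for name in old_files:
                self.refcount[name] -= 1
                if self.refcount[name] == 0:
                    del self.refcount[name]
                    deleted.add(name)
        return deleted

    def shared_files(self):
        """Files that must still exist in the shared directory."""
        return set(self.refcount)


registry = SharedRegistry(num_retained=1)
registry.register_checkpoint(1, {"001.sst"})

# No compaction yet: 001.sst is old but still live, so chk-2 references it too.
registry.register_checkpoint(2, {"001.sst", "002.sst"})
files_before_compaction = registry.shared_files()

# Compaction merged 001.sst and 002.sst into 003.sst before chk-3.
deleted = registry.register_checkpoint(3, {"003.sst"})
files_after_compaction = registry.shared_files()
```

In this model the old file survives chk-2 because it is still part of the live set, and it is released only after a compaction has rewritten it and the checkpoint that referenced it has been dropped. That matches the observation above that the checkpoint size fluctuates with compaction rather than shrinking as soon as window data expires.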