Hi Banu,

> Not all of the old SST files are present. A few were removed (I think because
> of compaction).


You are right. RocksDB implements deleting a key by inserting a tombstone entry 
(a key with a null value); the space is only released after compaction.
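To illustrate, here is a minimal, simplified sketch in Python (not RocksDB's actual implementation): a delete just appends a tombstone marker to the log, and the shadowed data is only dropped when compaction merges the entries.

```python
def compact(entries):
    """Merge log entries (ordered oldest -> newest), keeping only the
    newest entry per key and dropping tombstoned keys entirely."""
    latest = {}
    for key, value in entries:
        latest[key] = value  # a newer entry shadows any older one
    # Tombstones (None values) are dropped here: only at this point
    # is the space occupied by the deleted key actually reclaimed.
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("a", 1), ("b", 2), ("a", None)]  # deleting "a" writes a marker
print(compact(log))                      # -> [('b', 2)]
```

Until compaction runs, both the old value of "a" and its tombstone occupy space, which is why deleted state can linger in old SST files for a while.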


> Now, how can I keep the checkpoint size under control?


Since RocksDB incremental checkpoints upload the SST files directly (in 
RocksDB's native format), they generally exhibit some space amplification. 
Another factor that affects checkpoint size with incremental checkpoints is 
RocksDB's data compression. In general, it is difficult to estimate the 
checkpoint size accurately.
A savepoint in canonical format truly reflects the size of the state data; 
however, taking such a savepoint is more expensive, because all state data must 
be reorganized into the canonical format.
I think RocksDB incremental checkpoints are the better choice: although the 
checkpoint size cannot be calculated exactly, it is generally positively 
correlated with the actual state size. When the job runs stably, the checkpoint 
size will fluctuate within a certain range (due to compaction), which can be 
easily observed during the launch preparation phase. Maybe someone with more 
experience can give more valuable advice.


> How can I find the idle checkpoint size of my project? I found the link
> below, but it does not talk about parallelism.


What do you mean by "idle checkpoint size"?



——————————————

Best regards,

Feifan Wang




On 2024-06-19 17:35:13, "banu priya" <banuke...@gmail.com> wrote:

Hi Wang,


Thanks a lot for your reply.


Currently I have a 2s window and a checkpoint interval of 10s. The minimum 
pause between checkpoints is 5s. What happens is that my checkpoint size grows 
gradually. I checked the contents of my RocksDB local dir and also the shared 
checkpoint directory. Inside chk-x, there is a _metadata file that lists the 
.sst files referenced by that checkpoint.


In that list, I can see that a very old .sst file is still present. I was 
expecting it to be cleaned up.


Not all of the old SST files are present. A few were removed (I think because 
of compaction).


Now, how can I keep the checkpoint size under control?


How can I find the idle checkpoint size of my project? I found the link below, 
but it does not talk about parallelism.


https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines


Any help would really be appreciated :).


Thanks
Banu


On Wed, 19 Jun, 2024, 9:38 am banu priya, <banuke...@gmail.com> wrote:

Hi All, 


I have a Flink job with a keyBy, a tumbling window (2-second window, processing 
time), and an aggregator.


How often should I run checkpoints? I don't need the data to be retained 
after 2s.


I want to use incremental checkpoints with RocksDB.
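For reference, the settings I described above would look roughly like this in flink-conf.yaml (key names as in recent Flink 1.x versions; please check the docs for your version, as some keys have been renamed over time):

```yaml
state.backend: rocksdb
state.backend.incremental: true          # enable incremental checkpoints
execution.checkpointing.interval: 10s    # checkpoint every 10 seconds
execution.checkpointing.min-pause: 5s    # minimum pause between checkpoints
```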




Thanks
Banupriya 
