I have tested TTL in HBase and found that it relies on compaction to remove expired data. However, if a region holds data older than the TTL and nothing triggers a compaction of that region, the data will remain there forever, wasting disk space and memory.
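To make the dependency on compaction concrete, here is a minimal Java sketch under a deliberately simplified model. A store is just a list of files, each file is a list of hypothetical `Cell` objects (row plus write timestamp), and a "compaction" copies only the cells still within the TTL into the output. Real HBase compactions merge sorted files and do much more, and none of the names below are HBase APIs. The only point illustrated is that expired cells disappear when files are rewritten, so a region that is never compacted keeps them on disk.

```java
import java.util.ArrayList;
import java.util.List;

public class TtlCompactionSketch {

    // Hypothetical, simplified cell: a row key plus its write timestamp in ms.
    static final class Cell {
        final String row;
        final long timestampMs;

        Cell(String row, long timestampMs) {
            this.row = row;
            this.timestampMs = timestampMs;
        }
    }

    // A cell is expired if it was written before (now - ttl).
    static boolean isExpired(Cell cell, long ttlMs, long nowMs) {
        return cell.timestampMs < nowMs - ttlMs;
    }

    // Models a TTL-enforcing compaction: every input file is rewritten into
    // one output, and expired cells are simply not copied.
    static List<Cell> compact(List<List<Cell>> storeFiles, long ttlMs, long nowMs) {
        List<Cell> output = new ArrayList<>();
        for (List<Cell> file : storeFiles) {
            for (Cell cell : file) {
                if (!isExpired(cell, ttlMs, nowMs)) {
                    output.add(cell);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Cell>> files = List.of(
                List.of(new Cell("a", 1_000L), new Cell("b", 9_500L)),
                List.of(new Cell("c", 2_000L)));
        // Until compact() runs, all three cells stay in the files.
        // With TTL = 1000 ms at t = 10000 ms, only "b" survives compaction.
        System.out.println(compact(files, 1_000L, 10_000L).size());
    }
}
```

Note that the cost of reclaiming space here is proportional to all the data in the store, since every surviving cell must be read and written again.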

It appears that, at present, the only way to really remove data older than the TTL is to issue delete requests from the client side. This is a pity, because it is a more expensive way to get the job done. Another side effect is that, if data are stored in regions in chronological order, the regions holding the oldest data shrink as that data is deleted, so over time we end up with many small regions. HBase does not appear to have a mechanism to merge two consecutive small regions into a bigger one at this time. So if data is saved in chronological order, sooner or later we will run out of capacity, even if the total amount of data in HBase is small, because we will have lots of regions that each hold very little data.

A much cheaper way to remove data older than the TTL would be to record the latest timestamp for each region in the .META. table. If that timestamp is older than the TTL, we could simply adjust the region's row in .META. and delete the store, without doing any compaction.
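The check at the heart of this proposal can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `StoreInfo`, `wholeStoreExpired` and `selectStoresToDrop` are hypothetical names, the newest timestamp is kept in an in-memory field rather than in .META., and updating .META. and deleting the store files are not modeled. The key observation is that if even the newest cell in a store is older than the TTL cutoff, every cell in it is expired, so the store can be dropped by looking at one number.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredStoreSweepSketch {

    // Hypothetical per-store metadata: the newest cell timestamp in the store.
    // The proposal would keep this value in .META.; here it is just a field.
    static final class StoreInfo {
        final String name;
        final long maxTimestampMs;

        StoreInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // If even the newest cell is older than (now - ttl), every cell is expired.
    static boolean wholeStoreExpired(StoreInfo store, long ttlMs, long nowMs) {
        return store.maxTimestampMs < nowMs - ttlMs;
    }

    // Picks stores that could be deleted outright, without reading or
    // rewriting any of their data (i.e. without a compaction).
    static List<String> selectStoresToDrop(List<StoreInfo> stores, long ttlMs, long nowMs) {
        List<String> toDrop = new ArrayList<>();
        for (StoreInfo store : stores) {
            if (wholeStoreExpired(store, ttlMs, nowMs)) {
                toDrop.add(store.name);
            }
        }
        return toDrop;
    }

    public static void main(String[] args) {
        List<StoreInfo> stores = List.of(
                new StoreInfo("store-old", 5_000L),
                new StoreInfo("store-new", 9_500L));
        // TTL = 1000 ms at t = 10000 ms gives a cutoff of 9000 ms,
        // so only "store-old" qualifies.
        System.out.println(selectStoresToDrop(stores, 1_000L, 10_000L));
    }
}
```

Unlike the compaction path, the cost of this check is one comparison per store, independent of how much data the store holds.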

Can this be added to the HBase requirements for a future release?

Jimmy

