I have tested TTL in HBase and found that it relies on compaction to
remove old data. However, if a region holds data that is older
than the TTL and nothing triggers a compaction, the data will remain
there forever, wasting disk space and memory.
It appears that, at this point, the only way to really remove data older
than the TTL is to issue a client-side delete request. This is a pity,
because it is a more expensive way to get the job done. Another side effect
is that, as time goes on, we will end up with many small
regions if the data is stored in chronological order. It appears
that HBase currently has no mechanism to merge two adjacent
small regions into a bigger one. So if data is saved in
chronological order, sooner or later we will run out of capacity, even if
the amount of data in HBase is small, because we will have lots of regions
that each hold very little data.
A much cheaper way to remove data older than the TTL would be to record the
latest timestamp of each region in the .META. table.
If that timestamp is older than the TTL, we just update the row in .META. and
delete the store, without doing any compaction.
Can this be added to the HBase requirements for a future release?
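
To make the idea concrete, here is a rough sketch of the check I have in
mind. It is not HBase code: RegionTtlSketch, RegionInfo, isWhollyExpired and
findDroppableRegions are all hypothetical names, and the sketch assumes the
newest cell timestamp of each region is already recorded somewhere such as
.META.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are HBase classes.
final class RegionTtlSketch {

    /** Per-region metadata the proposal would keep in the catalog table. */
    static final class RegionInfo {
        final String name;
        final long latestTimestampMs; // newest cell timestamp in the region

        RegionInfo(String name, long latestTimestampMs) {
            this.name = name;
            this.latestTimestampMs = latestTimestampMs;
        }
    }

    /** True when even the newest cell in the region is older than the TTL. */
    static boolean isWhollyExpired(RegionInfo region, long ttlMs, long nowMs) {
        return nowMs - region.latestTimestampMs > ttlMs;
    }

    /** Names of regions whose store could be dropped without any compaction. */
    static List<String> findDroppableRegions(List<RegionInfo> regions,
                                             long ttlMs, long nowMs) {
        List<String> droppable = new ArrayList<>();
        for (RegionInfo region : regions) {
            if (isWhollyExpired(region, ttlMs, nowMs)) {
                droppable.add(region.name);
            }
        }
        return droppable;
    }
}
```

A region is only reported when its newest data has expired, so no live cell
would ever be dropped. A region that still holds a mix of expired and live
data would be left for compaction to handle as it is today.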
Jimmy