On 2012-11-08, at 1:12 PM, B. Todd Burruss <bto...@gmail.com> wrote:

> we are having the problem where we have huge SSTABLEs with tombstoned data in 
> them that is not being compacted soon enough (because size tiered compaction 
> requires, by default, 4 like sized SSTABLEs).  this is using more disk space 
> than we anticipated.
> 
> we are very write heavy compared to reads, and we delete the data after N 
> number of days (depends on the column family, but N is around 7 days)
> 
> my question is would leveled compaction help to get rid of the tombstoned 
> data faster than size tiered, and therefore reduce the disk space usage

From my experience, levelled compaction makes space reclamation after deletes 
even less predictable than size-tiered.

The reason is that deletes, like all mutations, are just recorded into 
sstables.  They enter level0, and get slowly, over time, promoted upwards to 
levelN.

Depending on your *total* mutation volume versus your data set size, this may 
be quite a slow process.  It is even worse when the data you're deleting (say, 
an entire row worth several hundred kilobytes) is covered by a single small 
row-level tombstone.  If the row is sitting in level4, the tombstone won't 
reach it until enough new data has arrived to push the tombstone up through 
level0, level1, level2 and level3.
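
To put rough numbers on how slow that can be, here is a back-of-the-envelope 
sketch in Python.  It assumes each level holds roughly 10x the data of the one 
below it (the usual levelled compaction fanout) and takes the sstable size as a 
parameter.  The function names and the 5 MB default are my own illustrative 
assumptions, and the model ignores overlap and compaction scheduling entirely.

```python
def level_capacity_mb(level, sstable_mb=5, fanout=10):
    """Approximate capacity of a level: fanout**level sstables' worth of data."""
    return sstable_mb * fanout ** level


def writes_before_reaching_mb(target_level, sstable_mb=5, fanout=10):
    """Rough estimate of the new data needed to push a tombstone from level0
    into target_level: every level in between (1..target_level-1) has to
    fill up and overflow first."""
    return sum(level_capacity_mb(lvl, sstable_mb, fanout)
               for lvl in range(1, target_level))


if __name__ == "__main__":
    for lvl in range(1, 5):
        print(f"level{lvl}: ~{writes_before_reaching_mb(lvl)} MB of new data")
```

Under these assumptions, a tombstone needs on the order of 5.5 GB of new 
mutations on that node before it lands in level4, no matter how small the 
tombstone itself is.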

Finally, to guard against the tombstone missing any data, the tombstone itself 
is not a candidate for removal (I believe even after gc_grace has passed) unless 
it's reached the highest populated level in levelled compaction.  This means if 
you have 4 levels and issue a ton of deletes (even deletes that will never 
impact existing data), these tombstones are deadweight that cannot be purged 
until they hit level4.
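
The rule as I've described it can be written as a small predicate.  This is 
only a sketch of my understanding above, not Cassandra's actual purge logic; 
the function and parameter names are hypothetical, and times are plain seconds.

```python
def tombstone_purgeable(tombstone_level, highest_populated_level,
                        deleted_at, gc_grace_seconds, now):
    """Return True if, under the rule described above, a tombstone may be
    dropped: gc_grace has elapsed AND it sits in the highest populated level."""
    past_gc_grace = (now - deleted_at) >= gc_grace_seconds
    at_top_level = tombstone_level >= highest_populated_level
    return past_gc_grace and at_top_level


if __name__ == "__main__":
    day = 86400
    # 30-day-old tombstone, 10-day gc_grace, but still stuck in level1 of 4.
    print(tombstone_purgeable(1, 4, deleted_at=0,
                              gc_grace_seconds=10 * day, now=30 * day))
```

The point is the second condition: an old tombstone in a low level stays on 
disk regardless of how long ago gc_grace expired.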

For a write-heavy workload, I recommend you stick with size-tiered.  You have 
several options at your disposal (compaction min/max thresholds, gc_grace) to 
move things along.  If that doesn't help, I've heard of some fairly reputable 
people doing some fairly blasphemous things (major compactions every night).
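
As an illustration of those knobs, here is a small Python helper that builds 
the corresponding CQL statement.  It assumes a Cassandra version that accepts 
the CQL3 `compaction` map and `gc_grace_seconds` table option; the keyspace 
and table names and all the values are placeholders, not recommendations.  
Keep in mind that lowering gc_grace below your repair interval risks deleted 
data coming back.

```python
def stcs_tuning_cql(keyspace, table, min_threshold=2, max_threshold=32,
                    gc_grace_seconds=86400):
    """Build an ALTER TABLE statement that tunes size-tiered compaction.
    All default values here are illustrative only."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH gc_grace_seconds = {gc_grace_seconds} "
        "AND compaction = {"
        "'class': 'SizeTieredCompactionStrategy', "
        f"'min_threshold': {min_threshold}, "
        f"'max_threshold': {max_threshold}"
        "};"
    )


if __name__ == "__main__":
    # "myks" and "events" are hypothetical names.
    print(stcs_tuning_cql("myks", "events"))
    # The nightly major compaction mentioned above would be run per node
    # with: nodetool compact myks events
```

Dropping min_threshold from 4 to 2 lets two similarly sized sstables compact 
together, which is the cheapest way to get the big ones merged sooner.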

