Hi Stefano,

> Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting.
If there are no other normal compaction tasks to be run, LCS will attempt to compact the sstables it estimates it will be able to drop the most tombstones from. It does this by estimating the number of tombstones an sstable has that have passed the gc grace period. Whether or not a tombstone will actually be evicted is more complicated, though. Even if a tombstone has passed gc grace, it can't be dropped if the data it's deleting still exists in another sstable; otherwise the deleted data would appear to return. So, a tombstone won't be dropped if another sstable contains data for the same partition that is older than the tombstone being evaluated for eviction.

> I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how?

From what I can tell, each end of a range tombstone is counted as a single tombstone. So a range tombstone effectively contributes '2' to an sstable's tombstone count. I'm not 100% sure, but I haven't seen any sstable-writing logic that tracks open range tombstones and counts the cells they cover as tombstones. So it's likely that the effect of range tombstones covering many rows is underrepresented in the droppable tombstone estimate.

> I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job is continuously anticompacting the data and thus creating sstables, what is the likelihood of the default interval (10 days) actually being hit?

It will be hit, but probably only in the repaired data. Once data is marked repaired, it shouldn't be anticompacted again, and it should get old enough to pass the compaction interval.
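To make the bookkeeping above concrete, here's a rough sketch in Python (with made-up names and simplified inputs; Cassandra's actual accounting lives in its Java sstable metadata code) of how the droppable tombstone ratio and the single-sstable compaction check could work. Note how a range tombstone contributes exactly two markers regardless of how many rows it covers, which is the underrepresentation I mentioned.

```python
import time

# Defaults pulled from Cassandra's table/compaction options; the function
# and variable names here are illustrative, not Cassandra's actual API.
GC_GRACE_SECONDS = 864000               # default gc_grace_seconds (10 days)
TOMBSTONE_THRESHOLD = 0.2               # default tombstone_threshold
TOMBSTONE_COMPACTION_INTERVAL = 864000  # default interval, seconds (10 days)

def droppable_ratio(cell_tombstones, range_tombstones, total_columns, now=None):
    """Estimate the ratio of gc-able tombstones in one sstable.

    cell_tombstones / range_tombstones are lists of deletion times (epoch
    seconds). Each range tombstone counts as 2 markers (its two bounds),
    no matter how many rows it shadows -- which is why wide range deletes
    are underrepresented in this estimate.
    """
    now = time.time() if now is None else now
    gc_before = now - GC_GRACE_SECONDS
    droppable = sum(1 for t in cell_tombstones if t < gc_before)
    droppable += 2 * sum(1 for t in range_tombstones if t < gc_before)
    return droppable / total_columns if total_columns else 0.0

def worth_tombstone_compaction(ratio, sstable_created_at, now=None):
    """A single-sstable tombstone compaction is only considered when the
    sstable is older than tombstone_compaction_interval AND its estimated
    droppable ratio exceeds tombstone_threshold."""
    now = time.time() if now is None else now
    old_enough = now - sstable_created_at > TOMBSTONE_COMPACTION_INTERVAL
    return old_enough and ratio > TOMBSTONE_THRESHOLD
```

Even when this check passes, the tombstones may still survive the compaction if older data for the same partitions lives in other sstables, for the reasons above.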
That shouldn't be an issue though, because you should be running repair often enough that data is repaired before it can ever get past the gc grace period; otherwise you'll have other problems. Also, keep in mind that tombstone eviction is a part of all compactions; it's just that occasionally a compaction is run specifically for that purpose.

Finally, you probably shouldn't run incremental repair on data that is deleted. There is a design flaw in the incremental repair used in pre-4.0 Cassandra that can cause consistency issues. It can also cause a *lot* of overstreaming, so you might want to compare how much streaming your cluster does with full repairs versus incremental repairs. It might actually be more efficient to run full repairs.

Hope that helps,

Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

> Hi all,
>
> I am trying to wrap my head around how C* evicts tombstones when using LCS.
>
> Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting.
>
> I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how?
>
> I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating sstables, what is the likelihood of the default interval (10 days) actually being hit?
>
> Hopefully somebody will be able to shed some light here!
>
> Thanks in advance!
> Stefano