sankalp kohli created CASSANDRA-7331:
----------------------------------------

             Summary: Improve Droppable Tombstone compaction
                 Key: CASSANDRA-7331
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7331
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: sankalp kohli
            Priority: Minor


I have been thinking about this idea, so I am creating a JIRA to discuss it.
Currently we run a compaction on any sstable that has more than a configurable
number of droppable tombstones.
There is also another JIRA, CASSANDRA-7019, for compactions involving multiple
sstables from different levels, which will be triggered by the same threshold.

One area of improvement, to pick better candidates, would be to find out
whether a tombstone can actually get rid of data in other sstables.
We can add a byte to each tombstone to keep track of whether it has knocked off
the actual data (the data it was written to delete) or not.
Every tombstone will start out with a value of 0. When it is compacted with
other sstables and causes data to be deleted, the value will be incremented.
For cases where there are multiple updates and then a delete, this value can be
more than 1, depending on how many updates came in before the delete.
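
To make the counter concrete, here is a minimal sketch of the bookkeeping
described above. It is not Cassandra code: the Cell and Tombstone classes, the
shadowed_count field and the compact() helper are hypothetical names made up
for illustration. It assumes a cell is dropped when its timestamp is not newer
than the tombstone's, and that the counter saturates at 255 because it is
stored in one byte.

```python
from dataclasses import dataclass

MAX_COUNT = 255  # assumption: the counter is one byte, so it saturates here


@dataclass
class Cell:
    key: str
    timestamp: int
    value: str


@dataclass
class Tombstone:
    key: str
    timestamp: int
    shadowed_count: int = 0  # hypothetical field; every tombstone starts at 0


def compact(tombstone, cells):
    """Merge one tombstone with cells read from other sstables.

    Returns the cells that survive. Each cell the tombstone removes
    bumps the tombstone's counter by one.
    """
    survivors = []
    for cell in cells:
        shadowed = (cell.key == tombstone.key
                    and cell.timestamp <= tombstone.timestamp)
        if shadowed:
            tombstone.shadowed_count = min(tombstone.shadowed_count + 1, MAX_COUNT)
        else:
            survivors.append(cell)
    return survivors


# Two updates followed by a delete, plus unrelated and newer data.
ts = Tombstone(key="k", timestamp=10)
remaining = compact(ts, [
    Cell("k", 1, "v1"),
    Cell("k", 5, "v2"),
    Cell("k", 20, "v3"),      # written after the delete, so it survives
    Cell("other", 3, "x"),    # different key, untouched
])
```

In this example the counter ends at 2, one for each update that came in before
the delete.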

If we have this, then by looking at these numbers in the tombstones we can find
the sstable whose compaction will get rid of the most data. We can also add a
global number per sstable which sums up these numbers.
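
The per-sstable number could be used roughly as follows. This is only a sketch
under assumptions: SSTableStats, counter_sum() and pick_candidate() are
hypothetical names, and since it is not settled here exactly how the numbers
should be turned into a ranking, the scoring function is left as a parameter
instead of being hard-coded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SSTableStats:
    name: str
    # hypothetical: one counter value per tombstone stored in the sstable
    tombstone_counters: List[int] = field(default_factory=list)


def counter_sum(sstable: SSTableStats) -> int:
    """The global per-sstable number: the sum of its tombstone counters."""
    return sum(sstable.tombstone_counters)


def pick_candidate(
    sstables: List[SSTableStats],
    score: Callable[[SSTableStats], int],
) -> Optional[SSTableStats]:
    """Return the sstable with the highest score, or None if there are none."""
    return max(sstables, key=score, default=None)


tables = [
    SSTableStats("a", [0, 2, 1]),
    SSTableStats("b", [0, 0]),
    SSTableStats("c", []),
]

# Rank by the summed counters.
by_sum = pick_candidate(tables, counter_sum)

# One alternative policy (an assumption): rank by tombstones still at 0.
by_zeros = pick_candidate(
    tables, lambda s: sum(1 for c in s.tombstone_counters if c == 0)
)
```

Keeping the sum as precomputed per-sstable metadata would let candidate
selection avoid reading the individual tombstones.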

I am not sure how this will work with range tombstones, or whether it will be
useful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
