Dominic Letz created CASSANDRA-8547:
---------------------------------------

             Summary: Make RangeTombstone.Tracker.isDeleted() faster
                 Key: CASSANDRA-8547
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: 2.0.11
            Reporter: Dominic Letz
         Attachments: rangetombstone.tracker.txt

During compactions and repairs with many tombstones, an exorbitant amount of 
time is spent in RangeTombstone.Tracker.isDeleted().
The time spent there can be so large that compactions and repairs look 
"stalled", with the estimated time remaining frozen at the same value for 
days.

Using VisualVM, I sample-profiled the code during execution and found this 
hotspot both in compaction and during repairs (point-in-time backtraces 
attached).

Looking at the code, the problem is evidently the linear scan over all ranges:
{code}
        public boolean isDeleted(Column column)
        {
            for (RangeTombstone tombstone : ranges)
            {
                if (comparator.compare(column.name(), tombstone.min) >= 0
                    && comparator.compare(column.name(), tombstone.max) <= 0
                    && tombstone.maxTimestamp() >= column.timestamp())
                {
                    return true;
                }
            }
            return false;
        }
{code}

I would like to propose replacing this linear scan with a lookup in a sorted 
list (e.g. RangeTombstoneList).
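
To illustrate the idea, here is a minimal sketch of a sorted lookup. It is not 
Cassandra code and not how RangeTombstoneList is implemented: the class 
SortedTombstoneIndex and its methods are hypothetical, column names are plain 
Strings in natural order instead of going through the comparator, and the 
ranges are assumed not to overlap. Under those assumptions each check costs 
O(log n) instead of O(n):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted tombstone structure (illustration only).
final class SortedTombstoneIndex {

    // One deleted range [min, max] and the timestamp of the deletion.
    private static final class Range {
        final String max;
        final long timestamp;

        Range(String max, long timestamp) {
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Ranges keyed by their start. Assumption: ranges never overlap, so at
    // most one range can contain any given column name.
    private final TreeMap<String, Range> byStart = new TreeMap<>();

    void add(String min, String max, long timestamp) {
        byStart.put(min, new Range(max, timestamp));
    }

    // Same predicate as the linear scan, but only the single candidate range
    // (the one with the greatest start <= name) is examined.
    boolean isDeleted(String name, long timestamp) {
        Map.Entry<String, Range> candidate = byStart.floorEntry(name);
        if (candidate == null) {
            return false; // name sorts before every range
        }
        Range range = candidate.getValue();
        return name.compareTo(range.max) <= 0 && range.timestamp >= timestamp;
    }
}
```

Overlapping ranges with different timestamps would need extra handling (for 
example, splitting them into disjoint pieces on insert), which this sketch 
leaves out.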




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
