Dominic Letz created CASSANDRA-8547:
---------------------------------------

             Summary: Make RangeTombstone.Tracker.isDeleted() faster
                 Key: CASSANDRA-8547
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: 2.0.11
            Reporter: Dominic Letz
         Attachments: rangetombstone.tracker.txt

During compactions and repairs with many tombstones, an exorbitant amount of time is spent in RangeTombstone.Tracker.isDeleted().
The amount of time spent there can be so large that compactions and repairs look "stalled", with the estimated time remaining frozen at the same value for days.

Using VisualVM, I sample-profiled the code during execution and found this hotspot both during compaction and during repairs (point-in-time backtraces attached).

Looking at the code, the problem is obviously the linear scan over all tracked tombstones for every column:

{code}
public boolean isDeleted(Column column)
{
    for (RangeTombstone tombstone : ranges)
    {
        if (comparator.compare(column.name(), tombstone.min) >= 0
            && comparator.compare(column.name(), tombstone.max) <= 0
            && tombstone.maxTimestamp() >= column.timestamp())
        {
            return true;
        }
    }
    return false;
}
{code}

I would like to propose changing this to use a sorted list (e.g. RangeTombstoneList) instead.
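
To illustrate the idea of replacing the linear scan with a sorted lookup, here is a minimal, self-contained sketch. It is not Cassandra code: the class SortedTombstoneSketch and its methods are hypothetical, column names are simplified to plain ints, and the tracked ranges are assumed never to overlap (a real structure such as RangeTombstoneList would also have to handle overlapping ranges and comparator-based names). Under those assumptions, only the range with the greatest start at or below the column name can cover it, so one O(log n) lookup replaces the scan.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a tombstone tracker; not Cassandra's actual classes.
class SortedTombstoneSketch {

    // One deleted range [min, max] plus the deletion timestamp that covers it.
    static final class Range {
        final int min;
        final int max;
        final long maxTimestamp;

        Range(int min, int max, long maxTimestamp) {
            this.min = min;
            this.max = max;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Ranges keyed by their start. ASSUMPTION: ranges never overlap.
    private final TreeMap<Integer, Range> ranges = new TreeMap<>();

    void add(int min, int max, long maxTimestamp) {
        ranges.put(min, new Range(min, max, maxTimestamp));
    }

    // O(log n): with non-overlapping ranges, only the range whose start is
    // the greatest one <= name can possibly contain name.
    boolean isDeleted(int name, long timestamp) {
        Map.Entry<Integer, Range> entry = ranges.floorEntry(name);
        if (entry == null) {
            return false; // name sorts before every tracked range
        }
        Range range = entry.getValue();
        return name <= range.max && range.maxTimestamp >= timestamp;
    }

    public static void main(String[] args) {
        SortedTombstoneSketch tracker = new SortedTombstoneSketch();
        tracker.add(10, 20, 100L);
        tracker.add(30, 40, 200L);

        System.out.println(tracker.isDeleted(15, 50L));  // true: inside [10, 20], older than the tombstone
        System.out.println(tracker.isDeleted(15, 150L)); // false: column is newer than the tombstone
        System.out.println(tracker.isDeleted(25, 50L));  // false: falls in the gap between ranges
    }
}
```

The same two checks as in isDeleted() above are preserved (name within [min, max], tombstone timestamp at least the column timestamp); only the way the candidate range is found changes.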