Chris Lohfink created CASSANDRA-14654:
-----------------------------------------

             Summary: Reduce heap pressure during compactions
                 Key: CASSANDRA-14654
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Chris Lohfink
            Assignee: Chris Lohfink


Compactions of small partitions are painfully slow, with a lot of overhead per 
partition. They also tend to create an excess of objects (i.e. 200-700 MB/s) 
per compaction thread.

The EncodingStats merge walks through all the partitions, and with mergeWith it 
creates a new instance per partition as it walks the potentially millions of 
partitions. In a test scenario with partitions of about 600 bytes and a couple 
hundred MB of data, this accounted for ~16% of the heap pressure. Changing this 
to instead track the min values mutably and create a single instance in an 
EncodingStats.Collector brought this down considerably (but not by 100%, since 
UnfilteredRowIterator.stats() still creates one per partition).
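
The difference between the two approaches above can be sketched roughly as 
follows. This is a minimal illustration, not the actual Cassandra code: 
SketchStats, its three fields, and the Collector methods update() and build() 
are hypothetical names chosen for the example.

```java
// Hypothetical stand-in for a per-partition stats object; not Cassandra's code.
final class SketchStats {
    final long minTimestamp;
    final int minLocalDeletionTime;
    final int minTTL;

    SketchStats(long minTimestamp, int minLocalDeletionTime, int minTTL) {
        this.minTimestamp = minTimestamp;
        this.minLocalDeletionTime = minLocalDeletionTime;
        this.minTTL = minTTL;
    }

    // Immutable merge: every call allocates a new SketchStats, so merging
    // across millions of partitions allocates millions of short-lived objects.
    SketchStats mergeWith(SketchStats other) {
        return new SketchStats(Math.min(minTimestamp, other.minTimestamp),
                               Math.min(minLocalDeletionTime, other.minLocalDeletionTime),
                               Math.min(minTTL, other.minTTL));
    }

    // Mutable collector: tracks the running minimums in primitive fields and
    // allocates a single SketchStats only when build() is called.
    static final class Collector {
        private long minTimestamp = Long.MAX_VALUE;
        private int minLocalDeletionTime = Integer.MAX_VALUE;
        private int minTTL = Integer.MAX_VALUE;

        void update(SketchStats s) {
            minTimestamp = Math.min(minTimestamp, s.minTimestamp);
            minLocalDeletionTime = Math.min(minLocalDeletionTime, s.minLocalDeletionTime);
            minTTL = Math.min(minTTL, s.minTTL);
        }

        SketchStats build() {
            return new SketchStats(minTimestamp, minLocalDeletionTime, minTTL);
        }
    }
}
```

For N partitions, a chain of mergeWith calls allocates N - 1 intermediate 
objects, while the collector allocates one result. The per-partition inputs 
passed to update() still have to exist, which matches the remaining 
one-per-partition cost mentioned above.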

The KeyCacheKey constructor makes a full copy of the underlying byte array via 
ByteBufferUtil.getArray. This becomes the dominating source of heap pressure as 
the number of sstables grows. Changing this to just keep the original array 
completely eliminates the current dominator in compactions and also improves 
read performance.
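
As a rough illustration of the copy-versus-keep trade-off described above, 
consider the sketch below. SketchCacheKey and its factory methods are 
hypothetical and are not the KeyCacheKey implementation; the sketch also 
assumes the caller never mutates the array after handing it over, which is 
what makes sharing safe.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical cache key; illustrates copying vs. keeping the original array.
final class SketchCacheKey {
    private final byte[] key;

    private SketchCacheKey(byte[] key) {
        this.key = key;
    }

    // Copying variant: always allocates a fresh array of buf.remaining() bytes.
    static SketchCacheKey copying(ByteBuffer buf) {
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy); // duplicate() leaves buf's position untouched
        return new SketchCacheKey(copy);
    }

    // Sharing variant: reuses the backing array when the buffer covers it
    // exactly, and falls back to a copy otherwise (direct, read-only, or
    // sliced buffers).
    static SketchCacheKey sharing(ByteBuffer buf) {
        if (buf.hasArray()
            && buf.arrayOffset() == 0
            && buf.position() == 0
            && buf.remaining() == buf.array().length) {
            return new SketchCacheKey(buf.array());
        }
        return copying(buf);
    }

    // True when this key holds the very same array object as the buffer.
    boolean sharesArrayWith(ByteBuffer buf) {
        return buf.hasArray() && buf.array() == key;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchCacheKey && Arrays.equals(key, ((SketchCacheKey) o).key);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(key);
    }
}
```

Both variants produce keys that compare equal; only the sharing variant avoids 
the per-key allocation when the buffer wraps a whole array.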

A minor tweak is also included for operators whose compactions are behind on 
low-read clusters: the preemptive opening setting becomes a hot property, so it 
can be changed at runtime.
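
One common way to make a setting changeable at runtime is a volatile field with 
a setter, sketched below. SketchCompactionConfig, its method names, the default 
of 50, and the convention that a negative value disables preemptive opening are 
all assumptions made for illustration, not the change made in this ticket.

```java
// Hypothetical runtime-adjustable setting; not Cassandra's configuration code.
final class SketchCompactionConfig {
    // volatile so compaction threads observe a change made by an operator
    // thread (for example through a management interface) on their next read.
    private static volatile int preemptiveOpenIntervalMb = 50; // assumed default

    static int getPreemptiveOpenIntervalMb() {
        return preemptiveOpenIntervalMb;
    }

    static void setPreemptiveOpenIntervalMb(int mb) {
        preemptiveOpenIntervalMb = mb;
    }

    // Interval in bytes; a negative setting is treated here as "disabled".
    static long preemptiveOpenIntervalBytes() {
        int mb = preemptiveOpenIntervalMb; // read once for a consistent value
        return mb < 0 ? Long.MAX_VALUE : mb * (1L << 20);
    }
}
```

An operator-facing setter could then call setPreemptiveOpenIntervalMb(-1) on a 
low-read cluster to stop early opening without a restart.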



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
