Cameron Zemek created CASSANDRA-18773:
-----------------------------------------
             Summary: Compactions are slow
                 Key: CASSANDRA-18773
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18773
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Cameron Zemek
         Attachments: stress.yaml

I have noticed that compactions involving a lot of sstables (for example, major compactions) are very slow. I have attached a cassandra-stress profile that can generate such a dataset under ccm. In my local test I have 2567 sstables at 4Mb each.

I added code to track the wall clock time of various parts of the code. One problematic part is the ManyToOne constructor. Tracing through the code, a ManyToOne is created for every partition, and each one is built over the iterators of all the sstables. In my local test I get a measly 60Kb/sec read speed, bottlenecked on a single CPU core (since this code is single threaded), with 85% of the wall clock time spent in the ManyToOne constructor.

As another data point showing that the merge iterator is the slow part: cfstats from [https://github.com/instaclustr/cassandra-sstable-tools/], which reads all the sstables but does no merging, gets 26Mb/sec read speed.

Tracking back from the ManyToOne call, I see this in UnfilteredPartitionIterators::merge:
{code:java}
for (int i = 0; i < toMerge.size(); i++)
{
    if (toMerge.get(i) == null)
    {
        if (null == empty)
            empty = EmptyIterators.unfilteredRow(metadata, partitionKey, isReverseOrder);
        toMerge.set(i, empty);
    }
}
{code}
I am not sure what the purpose of creating these empty iterators is. On a whim I removed all of them before passing the list to ManyToOne; all the wall clock time then shifted to CompactionIterator::hasNext() and read speed increased to 1.5Mb/s. So it seems there are further bottlenecks in this code path, but the first is ManyToOne and having to build it for every partition read.
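To make the experiment described above concrete, here is a minimal, self-contained Java sketch of the idea of dropping absent sources before building a per-partition merge. It is not Cassandra code: SparseMergeSketch, dropAbsent and merge are hypothetical names, sorted integers stand in for rows, and a null entry stands in for an sstable that does not contain the partition. It only illustrates why the setup cost of a heap-based merge can scale with the number of sources that actually hold data instead of the total number of sstables.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not Cassandra's MergeIterator.
final class SparseMergeSketch {

    // One heap entry: the current head value of a source plus the rest of that source.
    private static final class Head {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Keeps only the non-null sources, without consuming any of them.
    // A null entry models an sstable that has no data for the partition.
    static List<Iterator<Integer>> dropAbsent(List<Iterator<Integer>> sources) {
        List<Iterator<Integer>> present = new ArrayList<>();
        for (Iterator<Integer> source : sources) {
            if (source != null) {
                present.add(source);
            }
        }
        return present;
    }

    // K-way merge of individually sorted sources. The heap is sized by the
    // sources that are present, not by the length of the input list.
    static List<Integer> merge(List<Iterator<Integer>> sources) {
        List<Iterator<Integer>> present = dropAbsent(sources);
        PriorityQueue<Head> heap = new PriorityQueue<>(
                Math.max(1, present.size()),
                (a, b) -> Integer.compare(a.value, b.value));

        for (Iterator<Integer> source : present) {
            if (source.hasNext()) {
                heap.add(new Head(source.next(), source));
            }
        }

        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.value);
            // Refill the heap from the source that just produced a value.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

With a list of 1000 source slots of which only two are non-null, the heap in this sketch is built from two entries instead of 1000. Whether the placeholders can be dropped safely in Cassandra itself depends on whether anything downstream relies on each source keeping its position in the list, which this report does not establish.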