Hello Alexander,

Thank you for the pointer. It looks like this part of the documentation has 
become outdated. Major compaction does indeed start separate operations for 
non-overlapping sections of the compaction space, but because of changes in the 
default configuration, we no longer have a guaranteed splitting of the space 
into b shards.

More precisely, because (since CASSANDRA-18945) we have a default 
min_sstable_size of 100MiB, flushes will often result in one sstable that 
covers the whole token space, thus overlapping with all the other sstables and 
creating a single overlap region covering the whole token space. Because of 
this, in many cases only a single major compaction operation will be created. 
The output will still be split into as many shards as the density of the data 
set calls for.

After CASSANDRA-18802 (which is not part of Cassandra 5 but is committed in 
trunk), that single operation will still be executed in parallel for every 
output shard.

If you need a guaranteed minimum parallelism regardless of the size of flushed 
sstables, try setting min_sstable_size to 0 or to some value smaller than 
100MiB that makes more sense for your use case.
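
As a rough sketch of what that change could look like, reusing the compaction 
settings from your message: the table name ks.tbl is a placeholder, and 10MiB 
is only an illustrative value written in the same format as the 100MiB 
default, not a recommendation. Please check the accepted value format against 
the UCS documentation for your version before applying it.

```cql
-- Hypothetical keyspace/table name; substitute your own.
-- 'min_sstable_size' is lowered from the 100MiB default; 10MiB is an
-- illustrative value only and should be tuned for your flush sizes.
ALTER TABLE ks.tbl WITH compaction = {
    'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy',
    'base_shard_count': '4',
    'scaling_parameters': 'T4',
    'min_sstable_size': '10MiB'
};
```

With a smaller minimum size, flushes are more likely to be split on shard 
boundaries, so a later major compaction has a better chance of finding 
several non-overlapping regions to work on separately.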

Regards,
Branimir
________________________________
From: Alexander Batyrshin <[email protected]>
Sent: Friday 28 November 2025 03:59
To: [email protected] <[email protected]>
Subject: [EXTERNAL] Cassandra-5 UCS and Major Compaction

Hello everyone,

I have been testing UCS in Cassandra 5 and noticed that the behavior of major 
compaction diverges from the documentation.
Since I am using the default value of base_shard_count = 4, I expected 4 
compaction tasks to be initiated.
However, in my case only a single task was launched, and it included all 
SSTables in the table.

My compaction settings: { 'base_shard_count': '4', 'class': 
'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy', 
'scaling_parameters': 'T4' }


Documentation excerpt below:

Major compaction
Under the working principles of UCS, a major compaction is an operation that 
compacts together all SSTables with (transitive) overlap, and whose output is 
split on shard boundaries appropriate for the expected resulting density.

In other words, a major compaction will result in b concurrent compactions, 
each containing all SSTables covered in each of the base shards. The output 
will be split on shard boundaries whose number depends on the total size of 
data contained in the shard.
