Wei Deng created CASSANDRA-11366:
------------------------------------

             Summary: Compaction backlog triggered backpressure to write 
operations
                 Key: CASSANDRA-11366
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11366
             Project: Cassandra
          Issue Type: Improvement
          Components: Local Write-Read Paths
            Reporter: Wei Deng


A general perception about Cassandra is that it is super efficient in taking 
writes and can handle very high write throughput, due to its simple and 
efficient commitlog/memtable write path. However, one aspect that is often 
overlooked by new Cassandra user is that when they ingest a lot of data at the 
highest throughput possible trying to push the hardware's limit, as soon as the 
compaction starts to kick in, a seemingly harmless write workload could create 
way more I/O and CPU consumption than their hardware can accommodate, and as a 
result, memtable flushes take much longer to finish due to I/O contentions from 
the compaction, thousands of SSTables start to show up on data disk because 
compaction is not able to process them all in time, and GC pauses are more 
frequent and impactful because read will have to touch way more SSTables than 
ideal situation. Depending on the compaction strategy a Cassandra table 
chooses, this kind of overwhelmed and backlogged situation can sometimes take a 
long time to clear up (for LeveledCompactionStrategy, or LCS, it could be 
hours) even after the write workload is removed.

Currently write has a back pressure mechanism called load shedding, and it will 
be triggered if the MutationStage gets too overwhelmed and a mutation takes 
more than 2 seconds to get acknowledgement, in that case, the mutation message 
will be dropped and a hint will be added by StorageProxy. However, this 
mechanism does not take into consideration of compaction backlogs. As a result, 
the compaction can be very much behind under heavy write workload and continue 
to accumulate more and more pending compactions and unprocessed SSTables, while 
the writes are flowing at the same rate as write timeouts are not triggered 
yet. For LCS and DTCS this can get out of control and become hard to recover. 
As a practical workaround, you can monitor the number of pending compactions, 
and if it starts going on an upward trend, reduce the write throughput. This 
has proved to be effectively. However, these two types of activities are 
usually conducted by different teams in an enterprise, so they are not always 
easy to coordinate. If there is some back pressure mechanism (to slow down the 
intake of the writes) that can be automatically triggered based on how much 
compaction is behind, it will make DBA's life easier.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to