Wei Deng created CASSANDRA-12464:
------------------------------------

             Summary: Investigate the potential improvement of parallelism on 
higher level compactions in LCS
                 Key: CASSANDRA-12464
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12464
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: Wei Deng


According to LevelDB's design doc 
[here|https://github.com/google/leveldb/blob/master/doc/impl.html#L115-L116], 
"A compaction merges the contents of the picked files to produce a
sequence of level-(L+1) files". It will "switch to producing a new
level-(L+1) file after the current output file has reached the target
file size" (in our case 160MB). It will also "switch to a new output file when 
the key range of the current output file has grown enough to overlap more than 
ten level-(L+2) files". This is to ensure "that a later compaction
of a level-(L+1) file will not pick up too much data from level-(L+2)."
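
To make the two switching rules concrete, here is a minimal sketch that follows 
the wording quoted above. It is not taken from LevelDB's or Cassandra's actual 
code: the class OutputSplitSketch, the KeyRange type, the use of plain long 
keys, and the fixed bytes-per-key size estimate are all assumptions made for 
illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names exist in Cassandra or LevelDB.
class OutputSplitSketch {
    // Key range covered by one existing level-(L+2) file (keys simplified to long).
    static final class KeyRange {
        final long first, last;
        KeyRange(long first, long last) { this.first = first; this.last = last; }
    }

    // "more than ten level-(L+2) files", per the quoted design doc.
    static final int MAX_GRANDPARENT_OVERLAP = 10;

    // Splits the sorted merged keys into output "files". A new file is started
    // when the current one has reached the target size, or when adding the
    // next key would make its key range overlap more than ten level-(L+2) files.
    static List<List<Long>> split(List<Long> sortedKeys, long bytesPerKey,
                                  long targetFileBytes, List<KeyRange> grandparents) {
        List<List<Long>> outputs = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long key : sortedKeys) {
            if (!current.isEmpty()) {
                boolean sizeReached = current.size() * bytesPerKey >= targetFileBytes;
                boolean tooMuchOverlap =
                    countOverlaps(current.get(0), key, grandparents) > MAX_GRANDPARENT_OVERLAP;
                if (sizeReached || tooMuchOverlap) {
                    outputs.add(current);          // close the current output file
                    current = new ArrayList<>();   // and switch to a new one
                }
            }
            current.add(key);
        }
        if (!current.isEmpty()) outputs.add(current);
        return outputs;
    }

    // Counts the level-(L+2) files whose key range intersects [first, last].
    static int countOverlaps(long first, long last, List<KeyRange> grandparents) {
        int n = 0;
        for (KeyRange r : grandparents)
            if (r.first <= last && r.last >= first) n++;
        return n;
    }

    public static void main(String[] args) {
        List<KeyRange> grandparents = new ArrayList<>();
        for (int i = 0; i < 30; i++) grandparents.add(new KeyRange(i * 10L, i * 10L + 9));
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < 300; k += 5) keys.add(k);
        // Large target size, so only the overlap rule can trigger a switch here.
        for (List<Long> file : split(keys, 1, 1000, grandparents))
            System.out.println(file.get(0) + ".." + file.get(file.size() - 1));
    }
}
```

With the overlap rule in place, each level-(L+1) output file touches at most ten 
level-(L+2) files, which bounds the work of any later compaction that picks it.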

Our current code in LeveledCompactionStrategy doesn't implement this last rule, 
but we might be able to implement it quickly and see how much of a compaction 
throughput improvement it can deliver. Potentially we can create a scenario 
where a number of large L0 SSTables are present (e.g. 200GB after switching 
from STCS), let LCS compact them into thousands of L1 SSTables so that L1 
overflows, and see how fast LCS can digest this much data from L1 and promote 
it to the higher levels until compaction completes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
