Krzysztof Piecuch created KAFKA-15885:
-----------------------------------------
Summary: Reduce lock contention when cleaning topics
Key: KAFKA-15885
URL: https://issues.apache.org/jira/browse/KAFKA-15885
Project: Kafka
Issue Type: Improvement
Components: log cleaner
Reporter: Krzysztof Piecuch

Somewhat similar to KAFKA-14213, there are a couple of subroutines which require the same lock, which throttles compaction speed and limits parallelism. There are a couple of problems here:
# LogCleanerManager.grabFilthiestCompactedLog iterates through the list of partitions multiple times, all while holding a lock
# LogCleanerManager.grabFilthiestCompactedLog doesn't cache anything and returns only 1 item at a time - the method is invoked every time a cleaner thread asks for a new partition to compact
# LogCleanerManager.checkCleaningAborted - a quick check which:
## shares a lock with grabFilthiestCompactedLog
## is executed every time a LogCleaner reads bufsize worth of data to compact
# LogCleaner's bufsize is limited to 1G / (number of log cleaner threads)

Here's the scenario where this design falls short:
* I have 15k partitions
* all of which need to be compacted fairly often, but compacting each one doesn't take much time
* Most of the cputime spent by cleaner threads goes to grabFilthiestCompactedLog
** so the other cleaners can't do anything, since they need to acquire the lock to read data to compact as per 3.1. and 3.2.
** because of 4., log cleaners run out of work to do as soon as grabFilthiestCompactedLog is called
* Negative performance scaling - increasing the # of log cleaner threads decreases each log cleaner's bufsize, which makes them hammer the lock mentioned in 3.1. and 3.2. more often

I suggest:
* making LogCleanerManager use more fine-grained locking (ex. an RW lock for the checkCleaningAborted data structures) to reduce the negative performance scaling
* making LogCleanerManager.grabFilthiestCompactedLog faster on average:
** we don't need grabFilthiestCompactedLog to be 100% accurate
** we can cache candidates for the "filthiest log" and re-calculate the cache every minute or so
** or change the algorithm to probabilistic sampling (sample 100 partitions and pick the dirtiest one?) or even round-robin
* Alternatively, we could allow higher values for LogCleaner's bufsize

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
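To illustrate the RW-lock suggestion, here is a minimal standalone sketch (not Kafka's actual code; the class and field names are hypothetical). The hot-path abort check takes only a shared read lock, so many cleaner threads can run it concurrently; the exclusive write lock is taken only on the rare state mutation, instead of serializing every bufsize-sized read behind one lock:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of fine-grained locking for an abort-check structure.
class CleaningState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> aborted = new HashSet<>();

    // Hot path: called on every bufsize read; shared lock, so readers
    // do not block each other.
    boolean isCleaningAborted(String topicPartition) {
        lock.readLock().lock();
        try {
            return aborted.contains(topicPartition);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Rare path: exclusive lock held only while mutating shared state.
    void abortCleaning(String topicPartition) {
        lock.writeLock().lock();
        try {
            aborted.add(topicPartition);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Under this scheme, checkCleaningAborted-style reads no longer contend with each other or with grabFilthiestCompactedLog, only with the (infrequent) abort writes.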
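The "probabilistic sampling" idea can also be sketched. The code below is a hypothetical illustration, not Kafka's implementation: instead of scanning every partition under the lock, it samples a bounded number (with replacement) and returns the one with the highest dirty ratio, trading exactness for a constant-time bound on work done while holding the lock. The `dirtyRatios` map is an assumed stand-in for whatever "filthiness" metric the cleaner uses:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical "sample k, pick the dirtiest" selector.
class FilthiestLogSampler {
    private final Random random = new Random();

    // partitions: candidate partition names
    // dirtyRatios: partition name -> fraction of dirty bytes (assumed metric)
    String sampleFilthiest(List<String> partitions,
                           Map<String, Double> dirtyRatios,
                           int sampleSize) {
        String best = null;
        double bestRatio = -1.0;
        int draws = Math.min(sampleSize, partitions.size());
        for (int i = 0; i < draws; i++) {
            // Sampling with replacement keeps the loop O(sampleSize)
            // regardless of how many partitions exist.
            String candidate = partitions.get(random.nextInt(partitions.size()));
            double ratio = dirtyRatios.getOrDefault(candidate, 0.0);
            if (ratio > bestRatio) {
                bestRatio = ratio;
                best = candidate;
            }
        }
        return best;
    }
}
```

With 15k partitions, sampling 100 keeps the critical section small and roughly constant, at the cost of occasionally picking a merely dirty partition instead of the dirtiest one, which the ticket argues is acceptable.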