[ 
https://issues.apache.org/jira/browse/KAFKA-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16709.
-------------------------------
    Resolution: Fixed

> alter logDir within broker might cause log cleanup hanging
> ----------------------------------------------------------
>
>                 Key: KAFKA-16709
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16709
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.7.0
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major
>             Fix For: 3.8.0
>
>
> When altering replica logDirs, we create a future log and pause log 
> cleaning for the partition 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L1200]).
>  Log cleaning is resumed after the alter replica logDirs operation 
> completes 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogManager.scala#L1254]).
>  Resuming decrements the LogCleaningPaused count by 1, and cleaning 
> actually resumes only once the count reaches 0 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L310]).
>  For more details on the LogCleaningPaused state, see 
> [here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L55].
>  
> However, one other event can also increase the LogCleaningPaused 
> count: a leadership change 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L2126]).
>  On a leadership change, we check whether the partition has a future log; 
> if so, we create the future log and call pauseCleaning 
> (LogCleaningPaused count + 1). So the following sequence can occur during 
> an alter replica logDirs operation:
>  # alter replica logDirs for tp0 is triggered (LogCleaningPaused count = 1)
>  # tp0 leadership changes (LogCleaningPaused count = 2)
>  # alter replica logDirs completes and resumes log cleaning 
> (LogCleaningPaused count = 1)
>  # log cleaning stays paused because the count never returns to 0
>  
> The log cleaning pause affects not only log compaction but also normal 
> log retention processing, so log retention for these paused partitions 
> stays pending. Restarting the broker works around the issue.
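
The numbered sequence in the description above can be replayed with a small counter model. This is only a sketch in Python under stated assumptions: the class PausedCleaningCounter and the function replay are hypothetical names made up for illustration, not Kafka's LogCleanerManager code. It shows how one extra pause from a leadership change leaves the count above 0 after the single resume.

```python
class PausedCleaningCounter:
    """Toy model of a per-partition pause counter (hypothetical, not Kafka code)."""

    def __init__(self):
        # partition name -> number of outstanding pause requests
        self._paused = {}

    def pause(self, tp):
        # Every pause request increments the count for the partition.
        self._paused[tp] = self._paused.get(tp, 0) + 1

    def resume(self, tp):
        # Every resume request decrements the count; the entry is dropped
        # once the count would reach 0.
        count = self._paused.get(tp, 0)
        if count <= 1:
            self._paused.pop(tp, None)
        else:
            self._paused[tp] = count - 1

    def count(self, tp):
        return self._paused.get(tp, 0)

    def is_paused(self, tp):
        return self.count(tp) > 0


def replay(events, tp="tp0"):
    """Apply 'pause'/'resume' events in order; return (count history, still paused)."""
    counter = PausedCleaningCounter()
    history = []
    for event in events:
        if event == "pause":
            counter.pause(tp)
        elif event == "resume":
            counter.resume(tp)
        else:
            raise ValueError(f"unknown event: {event}")
        history.append(counter.count(tp))
    return history, counter.is_paused(tp)


if __name__ == "__main__":
    # alter logDirs starts, leadership changes, alter logDirs completes
    print(replay(["pause", "pause", "resume"]))
    # alter logDirs starts and completes with no leadership change
    print(replay(["pause", "resume"]))
```

In this model the first replay ends with the partition still paused, while the second ends with cleaning resumed, which matches the behaviour the description attributes to the extra pause.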



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
