[ https://issues.apache.org/jira/browse/KAFKA-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao resolved KAFKA-10723. ----------------------------- Fix Version/s: 2.8.0 Resolution: Fixed merged to trunk > LogManager leaks internal thread pool activity during shutdown > -------------------------------------------------------------- > > Key: KAFKA-10723 > URL: https://issues.apache.org/jira/browse/KAFKA-10723 > Project: Kafka > Issue Type: Bug > Reporter: Kowshik Prakasam > Assignee: Kowshik Prakasam > Priority: Major > Fix For: 2.8.0 > > > *TL;DR:* > The asynchronous shutdown in {{LogManager}} has the shortcoming that if > during shutdown any of the internal futures fail, then we do not always > ensure that all futures are completed before {{LogManager.shutdown}} returns. > As a result, despite the shut down completed message from KafkaServer is seen > in the error logs, some futures continue to run from inside LogManager > attempting to close the logs. This is misleading and it could possibly break > the general rule of avoiding post-shutdown activity in the Broker. > *Description:* > When LogManager is shutting down, exceptions in log closure are handled > [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L497-L501]. > However, this > [line|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L502] > in the finally clause shuts down the thread pools *asynchronously*. The > code: _threadPools.foreach(.shutdown())_ initiates an orderly shutdown (for > each thread pool) in which previously submitted tasks are executed, but no > new tasks will be accepted (see javadoc link > [here|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()])_._ > As a result, if there is an exception during log closure, some of the thread > pools which are closing logs could be leaked and continue to run in the > background, after the control returns to the caller (i.e. {{KafkaServer}}). > As a result, even after the "shut down completed" message is seen in the > error logs (originating from {{KafkaServer}} shutdown sequence), log closures > continue to happen in the background, which is misleading. > > *Proposed options for fixes:* > It seems useful that we maintain the contract with {{KafkaServer}} that after > {{LogManager.shutdown}} is called once, all tasks that close the logs are > guaranteed to have completed before the call returns. There are probably > couple different ways to fix this: > # Replace {{_threadPools.foreach(.shutdown())_ with > _threadPools.foreach(.awaitTermination())_}}{{.}} This ensures that we wait > for all threads to be shutdown before returning the {{_LogManager.shutdown_}} > call. > # Skip creating of checkpoint and clean shutdown file only for the affected > directory if any of its futures throw an error. We continue to wait for all > futures to complete for all directories. This can require some changes to > [this for > loop|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L481-L496], > so that we wait for all futures to complete regardless of whether one of > them threw an error. -- This message was sent by Atlassian Jira (v8.3.4#803005)