[ https://issues.apache.org/jira/browse/KAFKA-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-14292.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.3.2
       Resolution: Fixed

> KRaft broker controlled shutdown can be delayed indefinitely
> ------------------------------------------------------------
>
>                 Key: KAFKA-14292
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14292
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Alyssa Huang
>            Priority: Major
>             Fix For: 3.4.0, 3.3.2
>
>
> We noticed when rolling a KRaft cluster that it took an unexpectedly long
> time for one of the brokers to shut down. In the logs, we saw the following:
> {code:java}
> Oct 11, 2022 @ 17:53:38.277 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283357 is not greater than the broker's shutdown offset 2283358. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:38.277 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283362. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:40.278 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283366. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:40.278 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283361 is not greater than the broker's shutdown offset 2283362. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:42.279 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283365 is not greater than the broker's shutdown offset 2283366. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:42.279 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283370. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:44.280 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283369 is not greater than the broker's shutdown offset 2283370. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> Oct 11, 2022 @ 17:53:44.281 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283374. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG
> {code}
> From what I can tell, the controller waits until all brokers have caught up
> to the {{controlledShutdownOffset}} of the broker that is shutting down
> before allowing it to proceed. The intent is probably to make sure they have
> all the leader and ISR state.
> The problem is that the {{controlledShutdownOffset}} seems to be updated
> after every heartbeat that the controller receives:
> https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996.
> Unless all other brokers can catch up to that offset before the next
> heartbeat from the shutting-down broker is received, the broker remains in
> the shutting-down state indefinitely.
> In this case, it took more than 40 minutes before the broker completed
> shutdown:
> {code:java}
> Oct 11, 2022 @ 18:36:36.105 [Controller 1] The request from broker 8 to shut down has been granted since the lowest active offset 2288510 is now greater than the broker's controlled shutdown offset 2288510. org.apache.kafka.controller.BrokerHeartbeatManager INFO
> Oct 11, 2022 @ 18:40:35.197 [Controller 1] The request from broker 8 to unfence has been granted because it has caught up with the offset of it's register broker record 2288906. org.apache.kafka.controller.BrokerHeartbeatManager INFO
> {code}
> It seems like the bug here is that we should not keep updating
> {{controlledShutdownOffset}} if it has already been set.
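
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the Kafka controller code: `ControlledShutdownSketch`, `ShutdownTracker`, `onHeartbeat`, and `heartbeatsUntilGranted` are hypothetical names, and the offsets are made up. It assumes the controller log grows by 4 records between heartbeats and the slowest active broker trails the controller by 5 records, which mirrors the pattern in the DEBUG lines. The grant check uses `>=`, inferred from the INFO line where the request is granted with both offsets equal.

```java
import java.util.OptionalLong;

// Hypothetical model of the behaviour described in this issue; not Kafka code.
class ControlledShutdownSketch {

    /** Tracks the controlled shutdown offset of one shutting-down broker. */
    static final class ShutdownTracker {
        private final boolean setOnce;
        private OptionalLong controlledShutdownOffset = OptionalLong.empty();

        ShutdownTracker(boolean setOnce) {
            this.setOnce = setOnce;
        }

        /** Processes one heartbeat; returns true if shutdown may proceed. */
        boolean onHeartbeat(long lowestActiveOffset, long controllerEndOffset) {
            if (controlledShutdownOffset.isPresent()
                    && lowestActiveOffset >= controlledShutdownOffset.getAsLong()) {
                return true;
            }
            // Reported behaviour: the offset moves forward on every heartbeat.
            // Suggested behaviour: record it only if it has not been set yet.
            if (!setOnce || controlledShutdownOffset.isEmpty()) {
                controlledShutdownOffset = OptionalLong.of(controllerEndOffset);
            }
            return false;
        }
    }

    /** Returns the heartbeat number at which shutdown is granted, or -1 if never. */
    static int heartbeatsUntilGranted(boolean setOnce, int maxHeartbeats) {
        ShutdownTracker tracker = new ShutdownTracker(setOnce);
        long endOffset = 100; // arbitrary starting offset (assumption)
        for (int i = 1; i <= maxHeartbeats; i++) {
            endOffset += 4;                      // log grows between heartbeats
            long lowestActive = endOffset - 5;   // slowest broker lags slightly more
            if (tracker.onHeartbeat(lowestActive, endOffset)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Updating on every heartbeat: the target keeps moving, never granted.
        System.out.println("update every heartbeat: " + heartbeatsUntilGranted(false, 1000)); // prints -1
        // Setting the offset once: granted as soon as the other brokers pass it.
        System.out.println("set once: " + heartbeatsUntilGranted(true, 1000)); // prints 3
    }
}
```

Under these assumptions the shutdown target moves ahead of the slowest broker at every heartbeat, so the request is never granted. With the offset fixed at its first value, the request is granted on the third heartbeat.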
--
This message was sent by Atlassian Jira
(v8.20.10#820010)