I don't know the root cause, but if you just need to unblock things, restarting the controller broker's pod should do the trick.
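
If it helps, here is a rough sketch (Python with kazoo) of how I'd locate the current controller and bounce its pod. The ZooKeeper address, namespace and the "kafka-<broker.id>" StatefulSet pod naming are assumptions about your setup, so adjust them before using it:

    # Sketch: read the active controller's broker id from the /controller znode
    # (standard location for ZooKeeper-based Kafka) and delete that broker's pod
    # so the StatefulSet recreates it and controller election moves elsewhere.
    import json
    import subprocess
    from kazoo.client import KazooClient

    ZK_HOSTS = "zookeeper:2181"          # assumption: your ZooKeeper service address
    POD_TEMPLATE = "kafka-{broker_id}"   # assumption: pod name follows the broker id
    NAMESPACE = "kafka"                  # assumption: namespace of the Kafka pods

    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    try:
        # /controller holds JSON like {"version":1,"brokerid":1,"timestamp":"..."}
        data, _stat = zk.get("/controller")
        controller_id = json.loads(data.decode("utf-8"))["brokerid"]
    finally:
        zk.stop()

    pod = POD_TEMPLATE.format(broker_id=controller_id)
    print(f"Controller is broker {controller_id}, restarting pod {pod}")

    # Deleting the pod forces a controller failover; the StatefulSet brings it back.
    subprocess.run(["kubectl", "delete", "pod", pod, "-n", NAMESPACE], check=True)

You can of course do the same by hand: read /controller with zookeeper-shell and delete the matching pod with kubectl.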
Fares

On Thu, Mar 3, 2022 at 8:38 AM Dhirendra Singh <dhirendr...@gmail.com> wrote:
> Hi All,
>
> We have a kafka cluster running in kubernetes. The kafka version we are using is
> 2.7.1.
> Every night the zookeeper servers and kafka brokers are restarted.
> After the nightly restart of the zookeeper servers some partitions remain
> stuck in under replication. This happens randomly, but not at every nightly
> restart.
> Partitions remain under replicated until the kafka broker hosting the partition
> leader is restarted.
> For example, partition 4 of the __consumer_offsets topic remains under replicated
> and we see the following error in the log...
>
> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
> Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
> newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
> (kafka.cluster.Partition)
> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in
> request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with
> state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
> zkVersion=4719) for partition __consumer_offsets-4
>     at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
>     at kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
>     at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
>     at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
>     at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
>     at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
>     at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
>     at scala.collection.immutable.List.foreach(List.scala:333)
>     at kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
>     at kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
>     at kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
>     at kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
>     at kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
>     at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>     at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
>     at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
>     at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>
> Looks like some kind of race condition bug... does anyone have any idea?
>
> Thanks,
> Dhirendra