[ https://issues.apache.org/jira/browse/KAFKA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Chen resolved KAFKA-13720.
-------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

> A few topic partitions remain under-replicated after brokers lose 
> connectivity to zookeeper
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13720
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13720
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 2.7.1
>            Reporter: Dhirendra Singh
>            Priority: Major
>             Fix For: 3.1.0
>
>
> A few topic partitions remain under-replicated after brokers lose 
> connectivity to zookeeper.
> It only happens when the loss of zookeeper connectivity results in a change 
> of active controller. The issue does not occur every time, but randomly; it 
> never occurs when the active controller stays the same after brokers lose 
> connectivity to zookeeper.
> I found the following error messages in the log file:
> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1] Controller failed to update ISR to PendingExpandIsr(isr=Set(1), newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying. (kafka.cluster.Partition)
> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2), zkVersion=4719) for partition __consumer_offsets-4
> at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
> at kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
> at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
> at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
> at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
> at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
> at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
> at kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
> at kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
> at kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
> at kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
> at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
> at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
> at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>  
> The under-replicated partition count returns to zero after the controller 
> broker is restarted, but this requires manual intervention.
> The expectation is that once brokers reconnect to zookeeper, the cluster 
> should return to a stable state with an under-replicated count of zero by 
> itself, without any manual intervention.
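For context on the enqueue failure above, here is a minimal sketch of the failure mode, not the actual Kafka implementation (the class and method names below are illustrative): the broker allows at most one in-flight `AlterIsr` update per partition, so attempting to enqueue a retry while a stale pending entry was never cleared raises the `IllegalStateException` seen in the log.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplified model of a per-partition "at most one in-flight
// ISR update" queue. If a retry is scheduled while the previous pending
// entry was never cleared (e.g. after a controller change), the enqueue
// fails and the partition can stay stuck under-replicated.
class AlterIsrQueue {
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    /** Returns true if enqueued, false if an update is already pending. */
    boolean submit(String partition, String state) {
        return pending.putIfAbsent(partition, state) == null;
    }

    /** Called when the controller's response for this partition is handled. */
    void complete(String partition) {
        pending.remove(partition);
    }

    void sendAlterIsrRequest(String partition, String state) {
        if (!submit(partition, state)) {
            // Mirrors the log message shape; thrown inside a response
            // callback it surfaces as "Uncaught error in request completion".
            throw new IllegalStateException(
                "Failed to enqueue `AlterIsr` request with state " + state
                    + " for partition " + partition);
        }
    }
}
```

Under this model, recovery requires something to clear the stale pending entry; restarting the controller broker (as described above) is one way that happens.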



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
