Hi Miguel,

Could you let us know which version of Kafka you're using? There is no Kafka v3.8.1 release at the moment.
Thanks.
Luke

On Wed, Feb 16, 2022 at 12:12 AM Miguel Angel Corral
<miguel.cor...@mandiant.com.invalid> wrote:

> Hi,
>
> Recently, in a 3.8.1 Kafka cluster with 3 brokers, the topic
> __consumer_offsets became leaderless:
>
> $ /kafka-topics.sh --zookeeper <zookeeper_addresses> --describe
> --under-replicated-partitions
> Topic: __consumer_offsets Partition: 0
> Leader: none Replicas: 103,101,102 Isr:
> Topic: __consumer_offsets Partition: 1
> Leader: none Replicas: 101,102,103 Isr:
> Topic: __consumer_offsets Partition: 2
> Leader: none Replicas: 102,103,101 Isr:
> Topic: __consumer_offsets Partition: 3
> Leader: none Replicas: 103,102,101 Isr:
> Topic: __consumer_offsets Partition: 4
> Leader: none Replicas: 101,103,102 Isr:
> Topic: __consumer_offsets Partition: 5
> Leader: none Replicas: 102,101,103 Isr:
> Topic: __consumer_offsets Partition: 6
> Leader: none Replicas: 103,101,102 Isr:
> …
>
> When this happened, consumers were unable to consume, with the following
> error:
>
> o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2,
> groupId=foo] Sending FindCoordinator request to broker <IP:port> (id: 102
> rack: <region>)
> o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2,
> groupId=foo] Received FindCoordinator response
> ClientResponse(receivedTimeMs=1639436595264, latencyMs=98,
> disconnected=false, requestHeader=RequestHeader(apiKey=bar, apiVersion=2,
> clientId=consumer-2, correlationId=117),
> responseBody=FindCoordinatorResponseData(throttleTimeMs=0, errorCode=15,
> errorMessage='The coordinator is not available.', nodeId=-1, host='',
> port=-1))
> o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2,
> groupId=foo] Group coordinator lookup failed: The coordinator is not
> available.
> o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2,
> groupId=foo] Coordinator discovery failed, refreshing metadata
>
> This issue was solved just restarting all brokers without much
> investigation, since this caused an outage. Unfortunately, there’s no
> broker logs. During this incident, the JMX metrics
> kafka.controller:type=KafkaController,name=OfflinePartitionsCount and
> kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions reported 0.
>
> I’m trying to figure out:
> 1. What could have caused this issue?
> 2. What JMX metrics could we use to get notified of this issue in the
> future?
>
> Thanks in advance,
> Miguel
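
On the notification question in the quoted message: since both JMX gauges reportedly stayed at 0 during the incident, one possible stopgap is to alert on the `--describe` output itself. The following is only a minimal Python sketch, assuming the output has the field layout shown in the quoted message (Topic, Partition, Leader, Replicas, Isr, possibly wrapped across lines or prefixed with "> " quote markers). The function name `find_leaderless` and the regular expression are illustrative assumptions, not part of any Kafka tooling.

```python
import re

# One partition record as it appears in the quoted output. Fields may be
# separated by spaces, tabs, or line breaks, so \s+ is used between them.
RECORD = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:[ \t]*(?P<isr>[\d,]*)"
)


def find_leaderless(describe_output):
    """Return (topic, partition) pairs whose Leader field is 'none'.

    Assumes the layout shown in the quoted message; other Kafka versions
    may print the fields differently.
    """
    # Strip leading "> " email quote markers so pasted output parses too.
    text = re.sub(r"^[ \t]*>[ \t]?", "", describe_output, flags=re.MULTILINE)
    return [
        (match.group("topic"), int(match.group("partition")))
        for match in RECORD.finditer(text)
        if match.group("leader") == "none"
    ]
```

A periodic job could pass the captured output of the describe command to `find_leaderless` and raise an alert whenever the returned list is non-empty. This complements the JMX metrics rather than replacing them, and it does not explain why the partitions lost their leaders.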