Hi Miguel,

Could you confirm which version of Kafka you're running?
There is no Kafka 3.8.1 release at the moment.

Thanks.
Luke

On Wed, Feb 16, 2022 at 12:12 AM Miguel Angel Corral
<miguel.cor...@mandiant.com.invalid> wrote:

> Hi,
>
> Recently, in a 3.8.1 Kafka cluster with 3 brokers, the topic
> __consumer_offsets became leaderless:
>
> $ /kafka-topics.sh --zookeeper <zookeeper_addresses> --describe --under-replicated-partitions
>   Topic: __consumer_offsets  Partition: 0  Leader: none  Replicas: 103,101,102  Isr:
>   Topic: __consumer_offsets  Partition: 1  Leader: none  Replicas: 101,102,103  Isr:
>   Topic: __consumer_offsets  Partition: 2  Leader: none  Replicas: 102,103,101  Isr:
>   Topic: __consumer_offsets  Partition: 3  Leader: none  Replicas: 103,102,101  Isr:
>   Topic: __consumer_offsets  Partition: 4  Leader: none  Replicas: 101,103,102  Isr:
>   Topic: __consumer_offsets  Partition: 5  Leader: none  Replicas: 102,101,103  Isr:
>   Topic: __consumer_offsets  Partition: 6  Leader: none  Replicas: 103,101,102  Isr:
>   …
>
> When this happened, consumers were unable to consume, with the following
> error:
>
> o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2,
> groupId=foo] Sending FindCoordinator request to broker <IP:port> (id: 102
> rack: <region>)
> o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2,
> groupId=foo] Received FindCoordinator response
> ClientResponse(receivedTimeMs=1639436595264, latencyMs=98,
> disconnected=false, requestHeader=RequestHeader(apiKey=bar, apiVersion=2,
> clientId=consumer-2, correlationId=117),
> responseBody=FindCoordinatorResponseData(throttleTimeMs=0, errorCode=15,
> errorMessage='The coordinator is not available.', nodeId=-1, host='',
> port=-1))
> o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2,
> groupId=foo] Group coordinator lookup failed: The coordinator is not
> available.
> o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-2,
> groupId=foo] Coordinator discovery failed, refreshing metadata
>
> This issue was resolved by simply restarting all brokers, without much
> investigation, since it had caused an outage. Unfortunately, there are no
> broker logs. During this incident, the JMX metrics
> kafka.controller:type=KafkaController,name=OfflinePartitionsCount and
> kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions reported 0.
>
> I’m trying to figure out: 1. What could have caused this issue? 2. What
> JMX metrics could we use to get notified of this issue in the future?
>
> Thanks in advance,
> Miguel
>
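
On the second question (getting notified), one option that does not depend on
JMX is to check the --describe output quoted above for "Leader: none". Below is
a minimal sketch, assuming the output keeps the "Topic: ... Partition: ...
Leader: ..." layout shown in the original mail. The function name and the
sample text are hypothetical, and the second sample entry (with a live leader)
is made up for illustration only.

```python
import re

# Matches one partition entry of `kafka-topics.sh --describe` output.
# \s+ also spans newlines, so entries wrapped across lines still match.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)"
)


def leaderless_partitions(describe_output: str) -> list:
    """Return (topic, partition) pairs whose leader is printed as 'none'."""
    return [
        (m["topic"], int(m["partition"]))
        for m in PARTITION_RE.finditer(describe_output)
        if m["leader"] == "none"
    ]


# Hypothetical sample: partition 0 mirrors the output in the original mail,
# partition 1 is an invented healthy entry for contrast.
sample = """\
Topic: __consumer_offsets  Partition: 0
Leader: none  Replicas: 103,101,102  Isr:
Topic: __consumer_offsets  Partition: 1
Leader: 101  Replicas: 101,102,103  Isr: 101,102,103
"""

print(leaderless_partitions(sample))  # [('__consumer_offsets', 0)]
```

A scheduled job could feed the real command output into this function and raise
an alert whenever the returned list is non-empty. This is only a stopgap, not a
replacement for understanding why the metrics reported 0.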
