Hi all,

I have an issue that spans Kafka and K8s. Do you think it is appropriate to 
file a Kafka bug for it? Is there an alternative configuration that would 
prevent this from happening again? Would it be any different with KRaft?

Here's what happened:
* A big disruption occurs on the node running the kafka-2 broker, with lots of 
I/O, OCI, and Docker errors in /var/log/messages.
* The Controller sees kafka-2 disappear and moves leadership to the other 
brokers that hold replicas of its partitions. Everything is good.
* The node and kafka-2 aren't actually dead. The Controller sees kafka-2 return 
and marks it as part of the cluster. I guess it briefly lost its ZooKeeper 
registration and then reregistered itself.
* However, Kubelet is not responsive and the rest of the K8s cluster has marked 
the node as "Unavailable", "Kubelet stopped posting node status."
* Because of this, K8s has removed the kafka-2 pod from the headless service, 
so its DNS name can no longer be resolved.
* A preferred replica leader election happens and the Controller assigns 
leadership of those partitions back to kafka-2, but nobody can resolve it, so 
producers are now stuck.
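
To make the symptom in the last two steps concrete, here is a minimal sketch 
of a check that could be run from a client host to see which broker names no 
longer resolve. It is not part of our deployment: the class name and the 
default hostnames are hypothetical placeholders, and it relies only on the 
JDK's InetAddress lookup, which throws UnknownHostException when a name 
cannot be resolved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class BrokerDnsCheck {

    // Returns the hosts from the input list whose DNS name cannot be resolved.
    public static List<String> unresolvable(List<String> hosts) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            try {
                // Throws UnknownHostException if no address is found.
                InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder names; pass the real advertised broker
        // hostnames as command-line arguments instead.
        List<String> hosts = args.length > 0
                ? List.of(args)
                : List.of("kafka-1.kafka-headless.example.invalid",
                          "kafka-2.kafka-headless.example.invalid");
        for (String host : unresolvable(hosts)) {
            System.out.println("cannot resolve: " + host);
        }
    }
}
```

In the situation described above, I would expect only the kafka-2 name to be 
reported, since the other brokers were still listed in the headless service.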

Finally, we rebooted the node. This caused the Controller to see kafka-2 go 
away again, at which point it reassigned leadership back to the available 
brokers. Until then (about an hour), all our producers were stuck because 
the leader for all those partitions was unreachable.

The Controller's logs are full of hundreds of UnknownHostExceptions, so it 
should be aware that the broker has problems. Yet it leaves kafka-2 as the 
leader in the metadata.

Kafka: version 3.4.0, 9 brokers, 2x replication
Deployed by: Bitnami Kafka chart 21.2.0
Stuck producers: Java Client (standard, no streams, etc.)
Connections: Plaintext, acks=0
Metadata: ZooKeeper

Thank you!
Meg