Gwen Shapira created KAFKA-12674:
------------------------------------

             Summary: Client failover takes 2-4 seconds on clean broker shutdown
                 Key: KAFKA-12674
                 URL: https://issues.apache.org/jira/browse/KAFKA-12674
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.7.0
            Reporter: Gwen Shapira


I ran two perf-producer clients against a 4-broker cluster running on AWS behind 
an ELB, and then did a rolling restart, taking down one broker at a time using 
controlled shutdown.

I got the following errors on every broker shutdown:

{{[2021-04-16 01:31:39,846] WARN [Producer clientId=producer-1] Received 
invalid metadata error in produce request on partition perf-test-3 due to 
org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
intended only for the leader, this error indicates that the broker is not the 
current leader. For requests intended for any replica, this error indicates 
that the broker is not a replica of the topic partition.. Going to request 
metadata update now (org.apache.kafka.clients.producer.internals.Sender)}}
 {{[2021-04-16 01:44:22,691] WARN [Producer clientId=producer-1] Connection to 
node 0 (b0-pkc-7yrmj.us-east-2.aws.confluent.cloud/3.140.123.43:9092) 
terminated during authentication. This may happen due to any of the following 
reasons: (1) Authentication failed due to invalid credentials with brokers 
older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow 
HTTPS traffic), (3) Transient network issue. 
(org.apache.kafka.clients.NetworkClient)}}

The "Connection to node ... terminated during authentication" warning continued for 2-4 seconds after each shutdown.

It looks like the metadata request was repeatedly sent to the node that had just 
gone down. I'd expect it to be sent over an existing connection to one of the 
live nodes.
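To illustrate the expected behavior, the node choice for a metadata request can be sketched as a simple rule: skip any broker that disconnected within a backoff window, prefer brokers that already have a live connection, and among those pick the one with the fewest in-flight requests. This is a minimal sketch under stated assumptions. `NodeState`, `pick_metadata_node`, and the `backoff_ms` window are hypothetical names, and this is not the Kafka client's actual selection logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeState:
    # Hypothetical per-broker view held by a client.
    node_id: int
    connected: bool                         # an established connection already exists
    in_flight: int = 0                      # outstanding requests on that connection
    last_failure_ms: Optional[int] = None   # time of the last disconnect, if any


def pick_metadata_node(nodes: List[NodeState], now_ms: int, backoff_ms: int) -> Optional[int]:
    """Return the id of the broker to send a metadata request to, or None if all are backing off."""

    def backing_off(n: NodeState) -> bool:
        # A broker that disconnected less than backoff_ms ago is skipped entirely.
        return n.last_failure_ms is not None and now_ms - n.last_failure_ms < backoff_ms

    candidates = [n for n in nodes if not backing_off(n)]
    if not candidates:
        return None
    # Sort key: connected brokers first (False < True), then fewest in-flight
    # requests, then lowest id as a deterministic tie-breaker.
    best = min(candidates, key=lambda n: (not n.connected, n.in_flight, n.node_id))
    return best.node_id


# Example: broker 0 went down 500 ms ago during a rolling restart.
brokers = [
    NodeState(0, connected=False, last_failure_ms=1000),
    NodeState(1, connected=True, in_flight=3),
    NodeState(2, connected=True, in_flight=1),
    NodeState(3, connected=False),
]
print(pick_metadata_node(brokers, now_ms=1500, backoff_ms=1000))  # prints 2
```

With node 0 shut down 500 ms earlier and a 1000 ms backoff, the sketch picks node 2, the connected broker with the fewest outstanding requests, instead of retrying node 0.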




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
