Jason Gustafson created KAFKA-15828:
---------------------------------------

             Summary: Protect clients from broker hostname reuse
                 Key: KAFKA-15828
                 URL: https://issues.apache.org/jira/browse/KAFKA-15828
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson


In some environments such as k8s, brokers may be assigned to nodes dynamically from an available pool. When a cluster is rolling, it is possible for the client to see the same node advertised for different broker IDs in a short period of time. For example, kafka-1 might be initially assigned to node1. Before the client is able to establish a connection, it could be that kafka-3 is now on node1 instead. Currently there is no protection in the client or in the protocol for this scenario. If the connection succeeds, the client will assume it has a good connection to kafka-1. Until something disrupts the connection, it will continue under this assumption even if the hostname for kafka-1 changes.

We have observed this scenario in practice. The client connected to the wrong broker through stale hostname information. It was unable to produce data because of persistent NOT_LEADER errors. The only way to recover in the end was to restart the client to force a reconnection.

We have discussed a couple of potential solutions to this problem:
 # Let the client be smarter about managing the connection/hostname mapping. When it detects that the hostname for a broker has changed, it should force a disconnect to ensure it connects to the right node.
 # We can modify the protocol to verify that the client has connected to the intended broker. For example, we can add a field to ApiVersions to indicate the intended broker ID. The broker receiving the request can return an error if its ID does not match that in the request.

Are there alternatives?
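To make the two options concrete, here is a rough sketch in plain Java. It is not taken from the Kafka client: the class BrokerEndpointTracker and the methods applyMetadata and brokerIdMatches are hypothetical names used only for illustration, and the "intended broker ID" field is the proposed ApiVersions addition, which does not exist in the protocol today. The first method covers option 1 (detect that the advertised endpoint for a broker ID changed and report which connections to drop). The second covers the broker-side comparison from option 2.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of the Kafka client. */
final class BrokerEndpointTracker {
    // Last endpoint ("host:port") seen in metadata for each broker ID.
    private final Map<Integer, String> endpointByBroker = new HashMap<>();

    /**
     * Option 1: record the latest advertised endpoints (broker ID -> "host:port")
     * and return the broker IDs whose endpoint differs from the one previously
     * recorded. A client could force a disconnect for each returned ID so that
     * the next connection resolves the new endpoint.
     */
    Set<Integer> applyMetadata(Map<Integer, String> advertised) {
        Set<Integer> toDisconnect = new TreeSet<>();
        for (Map.Entry<Integer, String> entry : advertised.entrySet()) {
            // put() returns the previously recorded endpoint, or null if unseen.
            String previous = endpointByBroker.put(entry.getKey(), entry.getValue());
            if (previous != null && !previous.equals(entry.getValue())) {
                toDisconnect.add(entry.getKey());
            }
        }
        return toDisconnect;
    }

    /**
     * Option 2: the check a broker could run if the request carried the broker
     * ID the client intended to reach (an assumed new field). A false result
     * would map to an error response, prompting the client to reconnect.
     */
    static boolean brokerIdMatches(int intendedBrokerId, int actualBrokerId) {
        return intendedBrokerId == actualBrokerId;
    }
}
```

In the scenario described above, the first metadata snapshot would map kafka-1 to node1, and a later snapshot would map kafka-3 to node1 and kafka-1 to some other node. applyMetadata would then report both broker 1 and broker 3, so a connection opened under the stale mapping would be torn down rather than kept until something else disrupts it. Note that option 1 only helps once the client receives fresh metadata, whereas the check in option 2 would catch the mismatch at connection time.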