Anna Povzner created KAFKA-9839:
-----------------------------------

             Summary: IllegalStateException on metadata update when broker learns about its new epoch after the controller
                 Key: KAFKA-9839
                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
             Project: Kafka
          Issue Type: Bug
          Components: controller, core
    Affects Versions: 2.3.1
            Reporter: Anna Povzner

The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker's new epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes on the broker (that is, before the broker itself learns about its new epoch).

Here is the scenario we observed in more detail:
1. ZK session expires on broker 1
2. Broker 1 establishes a new session to ZK and creates its znode
3. Controller learns about broker 1 and assigns an epoch
4. Broker 1 receives UPDATE_METADATA from the controller, but it does not know about its new epoch yet, so we get an exception:

ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={
.........
java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
    at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
    at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
    at java.lang.Thread.run(Thread.java:748)

5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1"

As a result, the broker has stale metadata for some time.

Possible solutions:
1. Broker returns a more specific error and the controller retries UPDATE_METADATA
2. Broker accepts UPDATE_METADATA with a larger broker epoch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
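
The race in step 4 and possible solution 2 can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the Kafka source: the class BrokerEpochCheck, the methods strictCheck and lenientCheck, the Decision enum, and the epoch values 10 and 25 are all hypothetical names and numbers chosen for illustration. The strict variant mirrors the behaviour reported above (a request epoch larger than the broker's known epoch raises IllegalStateException); the lenient variant shows what accepting a larger epoch might look like.

```java
// Simplified model of the broker-epoch comparison described in this report.
// All names here are hypothetical; this is not the code in KafkaApis.scala.
class BrokerEpochCheck {

    enum Decision { ACCEPT, REJECT_STALE }

    // Models the reported behaviour: a request epoch larger than the epoch
    // the broker currently knows about is treated as an illegal state.
    static Decision strictCheck(long requestEpoch, long currentEpoch) {
        if (requestEpoch < currentEpoch) {
            return Decision.REJECT_STALE; // request addressed to an older incarnation
        }
        if (requestEpoch == currentEpoch) {
            return Decision.ACCEPT;
        }
        throw new IllegalStateException(
            "Epoch " + requestEpoch + " larger than current broker epoch " + currentEpoch);
    }

    // Models possible solution 2: only a smaller epoch is considered stale,
    // so a request carrying a larger epoch is accepted.
    static Decision lenientCheck(long requestEpoch, long currentEpoch) {
        return requestEpoch < currentEpoch ? Decision.REJECT_STALE : Decision.ACCEPT;
    }

    public static void main(String[] args) {
        long currentEpoch = 10L; // hypothetical: epoch the broker still holds from its old session
        long requestEpoch = 25L; // hypothetical: epoch the controller already assigned

        try {
            strictCheck(requestEpoch, currentEpoch);
        } catch (IllegalStateException e) {
            System.out.println("strict check failed: " + e.getMessage());
        }
        System.out.println("lenient check: " + lenientCheck(requestEpoch, currentEpoch));
    }
}
```

Possible solution 1 would instead keep the strict comparison but replace the exception with a dedicated, retriable error that the controller reacts to by resending UPDATE_METADATA. Which option is safer depends on details of the controller and broker that this sketch does not model.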