Badai Aqrandista created KAFKA-12162: ----------------------------------------
Summary: Kafka broker continued to run after failing to create "/brokers/ids/X" znode. Key: KAFKA-12162 URL: https://issues.apache.org/jira/browse/KAFKA-12162 Project: Kafka Issue Type: Bug Affects Versions: 2.4.1 Reporter: Badai Aqrandista We found that Kafka broker continued to run after it failed to create "/brokers/ids/X" znode and still acted as a partition leader. Here's the log snippet. {code:java} [2020-12-08 14:10:25,040] INFO Client successfully logged in. (org.apache.zookeeper.Login) [2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient) [2020-12-08 14:10:25,040] INFO Opening socket connection to server kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,045] INFO Socket connection established, initiating session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181 (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service, session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service, session 0x50000002ed5001b has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,055] INFO EventThread shut down for session: 0x50000002ed5001b (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired. (kafka.zookeeper.ZooKeeperClient) [2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a new session to kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01. (kafka.zookeeper.ZooKeeperClient) [2020-12-08 14:10:25,059] INFO Initiating client connection, connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01 sessionTimeout=22500 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f (org.apache.zookeeper.ZooKeeper) [2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket) [2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,061] INFO Client successfully logged in. (org.apache.zookeeper.Login) [2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient) [2020-12-08 14:10:25,061] INFO Opening socket connection to server kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,065] INFO Socket connection established, initiating session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181 (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true) (kafka.zk.KafkaZkClient) [2020-12-08 14:10:25,081] INFO Session establishment complete on server kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated timeout = 22500 (org.apache.zookeeper.ClientCnxn) [2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at /brokers/ids/2152, node already exists and owner '288459910057232437' does not match current session '288340729659195393' (kafka.zk.KafkaZkClient$CheckedEphemeral) [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] Got error produce response with correlation id 520478 on topic-partition _confluent-metrics-9, retrying (9 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender) [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] Received invalid metadata error in produce request on partition _confluent-metrics-9 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender) [2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter] Got error produce response with correlation id 520480 on topic-partition _confluent-metrics-4, retrying (9 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)