Kamal,

Thanks for your response. I tried testing with metadata.max.age.ms reduced to 10s, but the behavior did not change, and the producer still can't find the live broker.
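
A minimal sketch of the producer settings described above, for anyone who wants to reproduce the test. It only builds the configuration as plain key/value pairs. The class name `ProducerConfigSketch`, the host names `node1:9092` and `node2:9092`, and the choice of string serializers are assumptions for illustration; the thread does not state them.

```java
import java.util.Properties;

// Sketch only: builds producer settings as plain key/value pairs.
final class ProducerConfigSketch {

    // bootstrapServers is a comma-separated host:port list
    // (the hosts used in the usage note are hypothetical).
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Refresh cluster metadata at least every 10 s instead of the 5 min default.
        props.setProperty("metadata.max.age.ms", "10000");
        // Serializers are an assumption; the thread does not say which ones are used.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

With the Kafka client library on the classpath, the result of `ProducerConfigSketch.build("node1:9092,node2:9092")` could be passed to `new KafkaProducer<>(props)`. Listing both brokers in `bootstrap.servers` lets the client bootstrap from either one; whether that changes the behavior reported here is not established.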
I did more testing and found a pattern (the topic was created with "--replication-factor 2 --partitions 1" in the following cases):

    node 1            node 2
    down (lead)       down (replica)
    down (replica)    up (lead)
    producer send fail !!!

    node 1            node 2
    down (lead)       down (replica)
    up (lead)         down (replica)
    producer send ok !!!

If the only node that comes back up is the one with the original lead partition, everything is fine. If the only node that comes back up is the one with the original replica partition, the producer can't connect to the live broker (it always tries to connect to the original lead broker, node 1 in my case).

Can't Kafka recover from this situation? Does anyone have a clue about this?

Thanks!
Aggie

-----Original Message-----
From: Kamal C [mailto:[email protected]]
Sent: Saturday, September 24, 2016 1:37 PM
To: [email protected]
Subject: Re: producer can't push msg sometimes with 1 broker recoved

Reduce the metadata refresh interval 'metadata.max.age.ms' from 5 min to your desired time interval. This may reduce the time window of non-availability broker.

-- Kamal
