The error message is very explicit (the partition is under-replicated), so I don’t think it’s related to networking issues.
Try running /home/kafka/bin/kafka-topics.sh --zookeeper localhost/kafka --describe --topic topic_name and see which brokers are missing from the replica assignment (replace the home directory, ZooKeeper quorum, etc. with your own setup).

Lastly, has this ever worked? Maybe you’ve accidentally created the topic with more partitions and replicas than available brokers. Try recreating it with fewer partitions/replicas and see if it works.

-adrian

From: Dmitry Goldenberg
Date: Tuesday, September 29, 2015 at 3:37 PM
To: Adrian Tanase
Cc: "user@spark.apache.org"
Subject: Re: Kafka error "partitions don't have a leader" / LeaderNotAvailableException

Adrian,

Thanks for your response. I just looked at both machines we're testing on, and on both the Kafka server process looks OK. Anything specific I can check otherwise?

From googling around, I see some posts where folks suggest checking the DNS settings (those appear fine) and setting advertised.host.name in Kafka's server.properties. Yay/nay?

Thanks again.

On Tue, Sep 29, 2015 at 8:31 AM, Adrian Tanase <atan...@adobe.com> wrote:

I believe some of the brokers in your cluster died and there are a number of partitions that nobody is currently managing.

-adrian

From: Dmitry Goldenberg
Date: Tuesday, September 29, 2015 at 3:26 PM
To: "user@spark.apache.org"
Subject: Kafka error "partitions don't have a leader" / LeaderNotAvailableException

I apologize for posting this Kafka-related issue to the Spark list. I have gotten no responses on the Kafka list and was hoping someone on this list could shed some light on the below.

---------------------------------------------------------------------------------------

We're running into this issue in a clustered environment where we're trying to send messages to Kafka and are getting the below error.
Can someone explain what might be causing it and what the error message means (Failed to send data since partitions [<topic-name>,8] don't have a leader)?

---------------------------------------------------------------------------------------

WARN kafka.producer.BrokerPartitionInfo: Error while fetching metadata partition 10 leader: none replicas: isr: isUnderReplicated: false for topic partition [<topic-name>,10]: [class kafka.common.LeaderNotAvailableException]
ERROR kafka.producer.async.DefaultEventHandler: Failed to send requests for topics <topic-name> with correlation ids in [2398792,2398801]
ERROR com.acme.core.messaging.kafka.KafkaMessageProducer: Error while sending a message to the message store.
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.producer.Producer.send(Producer.scala:77) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.javaapi.producer.Producer.send(Producer.scala:33) ~[kafka_2.10-0.8.2.0.jar:?]
WARN kafka.producer.async.DefaultEventHandler: Failed to send data since partitions [<topic-name>,8] don't have a leader

What do these errors and warnings mean and how do we get around them?

---------------------------------------------------------------------------------------

The code for sending messages is basically as follows:

    public class KafkaMessageProducer {
        private Producer<String, String> producer;

        .....................

        public void sendMessage(String topic, String key, String message) throws IOException, MessagingException {
            KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, key, message);
            try {
                producer.send(data);
            } catch (Exception ex) {
                throw new MessagingException("Error while sending a message to the message store.", ex);
            }
        }
    }

Is it possible that the producer gets "stale" and needs to be re-initialized? Do we want to re-create the producer on every message (??)
or is it OK to hold on to one indefinitely?

---------------------------------------------------------------------------------------

The following are the producer properties that are being set into the producer:

    batch.num.messages => 200
    client.id => Acme
    compression.codec => none
    key.serializer.class => kafka.serializer.StringEncoder
    message.send.max.retries => 3
    metadata.broker.list => data2.acme.com:9092,data3.acme.com:9092
    partitioner.class => kafka.producer.DefaultPartitioner
    producer.type => sync
    queue.buffering.max.messages => 10000
    queue.buffering.max.ms => 5000
    queue.enqueue.timeout.ms => -1
    request.required.acks => 1
    request.timeout.ms => 10000
    retry.backoff.ms => 1000
    send.buffer.bytes => 102400
    serializer.class => kafka.serializer.StringEncoder
    topic.metadata.refresh.interval.ms => 600000

Thanks.
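
For what it's worth, the "Failed to send messages after 3 tries" line lines up with the message.send.max.retries => 3 and retry.backoff.ms => 1000 settings listed above: the send is attempted a bounded number of times with a pause in between, and the exception only surfaces once every attempt has failed. Below is a minimal Java sketch of that general pattern. It is not Kafka's actual code, and exactly how Kafka counts attempts may differ. RetrySketch, SendAction and sendWithRetry are hypothetical names, and SendAction merely stands in for a call such as producer.send(data).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper illustrating bounded retry with backoff; not part of Kafka. */
final class RetrySketch {

    /** Hypothetical stand-in for a call such as producer.send(data). */
    interface SendAction {
        void send() throws Exception;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping backoffMs after each
     * failed attempt except the last. Returns the number of attempts used on
     * success; throws once all attempts have failed, keeping the last cause.
     */
    static int sendWithRetry(SendAction action, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.send();
                return attempt;
            } catch (Exception ex) {
                last = ex;
                if (attempt < maxAttempts) {
                    // Pause before the next attempt (compare retry.backoff.ms).
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw new Exception("Failed to send after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated sender: fails twice (as if no leader were available), then succeeds.
        SendAction flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("leader not available (simulated)");
            }
        };
        int used = sendWithRetry(flaky, 3, 10L);
        System.out.println("attempts used: " + used);
    }
}
```

Note that the sketch reuses the same sender across attempts instead of re-creating it per message. Whether that is appropriate for a given producer is an assumption to verify, and no amount of retrying helps while the partition still has no leader.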