Hello Rajasekar,

In 0.8, producers keep a cache of the partition -> leader_broker_id map,
which is used to determine to which brokers the messages should be sent.
Right after new partitions are added, this cache on the producer has not
been populated yet, hence it throws this exception. The producer will then
try to refresh its cache by asking the brokers "who are the leaders of
these new partitions that I did not know of before". At the beginning the
brokers do not know this either; they only get the information from the
controller, which propagates the leader information once the leader
elections have all finished.
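
If you want to see whether the leaders have actually been elected yet, you
can issue the same metadata request the producer sends internally. Below is
a minimal sketch against the 0.8 java API; the broker host/port and topic
name are placeholders, and it uses the plain-socket SimpleConsumer, so your
secure-socket build may need to go through its SSL channel instead:

import java.util.Collections;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class LeaderCheck {
  public static void main(String[] args) {
    // Placeholder broker address; point this at any broker in the cluster.
    SimpleConsumer consumer =
        new SimpleConsumer("localhost", 6667, 10000, 64 * 1024, "leader-check");
    try {
      // Same request the producer sends when refreshing its metadata cache.
      TopicMetadataRequest request =
          new TopicMetadataRequest(Collections.singletonList("test-41"));
      TopicMetadataResponse response = consumer.send(request);
      for (TopicMetadata topic : response.topicsMetadata()) {
        for (PartitionMetadata partition : topic.partitionsMetadata()) {
          // leader() is null until the controller has propagated the election result.
          if (partition.leader() == null) {
            System.out.println("partition " + partition.partitionId() + ": no leader yet");
          } else {
            System.out.println("partition " + partition.partitionId()
                + ": leader on broker " + partition.leader().id());
          }
        }
      }
    } finally {
      consumer.close();
    }
  }
}

If this keeps printing "no leader yet" for the new partitions, the producer
errors are expected until the controller finishes propagating the leaders.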

If num.retries is set to 3, it is possible that the producer gives up too
soon, before the leader info has been propagated to the brokers and hence
to the producers. Could you try increasing producer.num.retries and see if
the producer can eventually succeed on retrying?
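
As a rough sketch of what that looks like with the 0.8 java producer (the
broker list and topic below are placeholders; in the 0.8.0 configs the
retry property shows up as message.send.max.retries, so use whichever name
your build recognizes):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class RetryingProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder broker list; replace with your brokers.
    props.put("metadata.broker.list", "localhost:6667");
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    // Give the leader election time to finish before the producer gives up:
    // more retries, and a longer backoff between metadata refresh attempts.
    props.put("message.send.max.retries", "10");
    props.put("retry.backoff.ms", "500");

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));
    try {
      producer.send(new KeyedMessage<String, String>("test-41", "hello"));
    } finally {
      producer.close();
    }
  }
}

The console producer exposes a similar retry option in 0.8, though I would
check --help on your build for the exact flag name.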

Guozhang


On Tue, Aug 27, 2013 at 8:53 AM, Rajasekar Elango <rela...@salesforce.com> wrote:

> Hello everyone,
>
> We recently increased number of partitions from 4 to 16 and after that
> console producer mostly fails with LeaderNotAvailableException and exits
> after 3 tries:
>
> Here is last few lines of console producer log:
>
> No partition metadata for topic test-41 due to
> kafka.common.LeaderNotAvailableException}] for topic [test-41]: class
> kafka.common.LeaderNotAvailableException
>  (kafka.producer.BrokerPartitionInfo)
> [2013-08-27 08:29:30,271] ERROR Failed to collate messages by topic,
> partition due to: Failed to fetch topic metadata for topic: test-41
> (kafka.producer.async.DefaultEventHandler)
> [2013-08-27 08:29:30,271] INFO Back off for 100 ms before retrying send.
> Remaining retries = 0 (kafka.producer.async.DefaultEventHandler)
> [2013-08-27 08:29:30,372] INFO Secure sockets for data transfer is enabled
> (kafka.producer.SyncProducerConfig)
> [2013-08-27 08:29:30,372] INFO Fetching metadata from broker
> id:0,host:localhost,port:6667,secure:true with correlation id 8 for 1
> topic(s) Set(test-41) (kafka.client.ClientUtils$)
> [2013-08-27 08:29:30,373] INFO begin ssl handshake for localhost/
> 127.0.0.1:6667//127.0.0.1:36640 (kafka.security.SSLSocketChannel)
> [2013-08-27 08:29:30,375] INFO finished ssl handshake for localhost/
> 127.0.0.1:6667//127.0.0.1:36640 (kafka.security.SSLSocketChannel)
> [2013-08-27 08:29:30,375] INFO Connected to localhost:6667:true for
> producing (kafka.producer.SyncProducer)
> [2013-08-27 08:29:30,380] INFO Disconnecting from localhost:6667:true
> (kafka.producer.SyncProducer)
> [2013-08-27 08:29:30,381] INFO Secure sockets for data transfer is enabled
> (kafka.producer.SyncProducerConfig)
> [2013-08-27 08:29:30,381] ERROR Failed to send requests for topics test-41
> with correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
> kafka.common.FailedToSendMessageException: Failed to send messages after 3
> tries.
>         at
>
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>         at kafka.producer.Producer.send(Producer.scala:74)
>         at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:168)
>         at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
> [2013-08-27 08:29:30,383] INFO Shutting down producer
> (kafka.producer.Producer)
> [2013-08-27 08:29:30,384] INFO Closing all sync producers
> (kafka.producer.ProducerPool)
>
>
> Also, this happens only for new topics (we have auto.create.topic set to
> true), If retry sending message to existing topic, it works fine. Is there
> any tweaking I need to do to broker or to producer to scale based on number
> of partitions?
>
> --
> Thanks in advance for help,
> Raja.
>



-- 
-- Guozhang
