Can you paste the error log for each rebalance attempt?
You can search for the keyword "exception during rebalance".
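
If grepping by hand is awkward, something like the following rough sketch could pull out each hit together with the stack trace lines that follow it. The function name, the default of 20 context lines, and the idea that the consumer log is a plain text file are all assumptions on my side, not anything Kafka provides.

```python
from pathlib import Path

# Keyword to look for in the consumer log (matched case-insensitively).
KEYWORD = "exception during rebalance"


def find_rebalance_errors(log_path, context=20):
    """Return one text block per keyword hit.

    Each block holds the matching line plus up to `context` following
    lines, which is usually where the stack trace ends up.
    `find_rebalance_errors` is a hypothetical helper, not a Kafka tool.
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    hits = []
    for i, line in enumerate(lines):
        if KEYWORD in line.lower():
            hits.append("\n".join(lines[i:i + 1 + context]))
    return hits
```

Calling `find_rebalance_errors("consumer.log")` (the file name is a placeholder) and printing each returned block should give one excerpt per rebalance attempt that can be pasted into a reply.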

On 2/26/15, 7:41 PM, "Ashwin Jayaprakash" <>

>Just to give you some more debugging context, we noticed that the "consumers"
>path becomes empty after all the JVMs have exited because of this error.
>So, when we restart, there are no visible entries in ZK.
>On Thu, Feb 26, 2015 at 6:04 PM, Ashwin Jayaprakash <
>> wrote:
>> Hello, we have a set of JVMs that consume messages from Kafka topics. Each
>> JVM creates 4 ConsumerConnectors that are used by 4 separate threads.
>> These JVMs also create and use the CuratorFramework's Path children
>> to watch and keep a sub-tree of the ZooKeeper in sync with other JVMs. This
>> path has several thousand children elements.
>> Everything was working perfectly until one fine day we decided to restart
>> these JVMs. We restart these JVMs to roll in new code every few weeks or
>> so. We never had any problems until suddenly the Kafka consumers on these
>> JVMs were unable to rebalance partitions among themselves.  We have restarted
>> these JVMs before with no issues.
>> The exception:
>> Caused by: kafka.common.ConsumerRebalanceFailedException:
>> group1-system01-27422-kafka-787 can't rebalance after 12 retries
>> at
>> at
>> at
>> at
>> at
>> at
>> We then set rebalance.max.retries=16 and
>> seen the Spark-Kafka issue
>> and Jun's
>> to increase the backoff property.
>> We must've tried restarting these JVMs about 20 times now both with and
>> without the "rebalance.xx" properties. Every time it is the same issue.
>> Except for the first time we applied the ""
>> property when all 4 JVMs started! We thought that solved everything and
>> then we tried restarting it just to make sure and then we were back to
>> square one.
>> If we have only 1 thread create 1 ConsumerConnector instead of 4, it works.
>> This way we can have any number of JVMs running 1 ConsumerConnector and
>> they all behave well and rebalance partitions. It is only when we try to
>> start multiple ConsumerConnectors on the same JVM that this problem occurs.
>> I'd like to remind you that 4 ConsumerConnectors was working for several
>> months. The ZK sub-tree for our non-Kafka part of the code was small when
>> we started.
>> Does anybody have any thoughts on this? What could be causing this issue?
>> Could there be a Curator/ZK client conflict with the High level Kafka
>> consumer? Or is the number of nodes that we have on ZK from our code
>> causing problems with partition assignment in the Kafka code? Because the
>> Curator framework keeps syncing data in the background while the Kafka code
>> is creating ConsumerConnectors and rebalancing topics.
>> Thanks,
>> Ashwin Jayaprakash.
