Re: Kafka rebalancing causes Zookeeper to fail

Ahmed H. Wed, 22 Jan 2014 08:40:26 -0800

Hello,

I looked at that FAQ entry, but I'm not sure whether it applies at this point. We
used to have frequent rebalances, but that issue was mitigated by
increasing the zktimeout on the consumer side. That said, it may still
be a problem; I haven't collected any metrics on rebalances in a
while. I will certainly take a look at our current GC settings. What are
typical GC settings we should have? (I'm not sure exactly what I'm
looking for.)
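On the GC question: one rough way to see whether collection pauses could be long enough to expire the consumer's ZooKeeper session is to sample the JVM's own GC counters and compare them with the configured session timeout. The sketch below is a minimal example under assumptions: it uses the Kafka 0.8 high-level consumer property names `zookeeper.session.timeout.ms` and `zookeeper.connection.timeout.ms`, and the values (plus the `zookeeper.connect` and `group.id` entries) are placeholders, not recommendations. It relies only on `java.lang.management` and reports total GC time per sampling window rather than individual pauses, so GC logging (for example `-verbose:gc` with `-XX:+PrintGCDetails`) remains the better source for per-pause detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Properties;

// Sketch: compare GC time accumulated over a window with the consumer's
// ZooKeeper session timeout. Property values are placeholders (assumptions).
class GcVsSessionTimeout {

    // Illustrative high-level consumer settings; tune for your own setup.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("zookeeper.connect", "localhost:2181");
        p.setProperty("group.id", "alarms.topology.updates");
        p.setProperty("zookeeper.session.timeout.ms", "6000");
        p.setProperty("zookeeper.connection.timeout.ms", "6000");
        return p;
    }

    // Total milliseconds spent in GC since JVM start, summed over all collectors.
    // getCollectionTime() returns -1 when a collector does not report it; skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of the session timeout consumed by GC during one window.
    // This is an aggregate, not the longest single pause: if it stays far
    // below 1.0, no single pause inside the window can have reached the timeout.
    static double gcShareOfTimeout(long gcMillisInWindow, Properties props) {
        long timeout = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        return (double) gcMillisInWindow / timeout;
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = consumerProps();
        long before = totalGcMillis();
        Thread.sleep(1000); // in a real consumer, sample periodically alongside normal work
        long delta = totalGcMillis() - before;
        System.out.printf("GC time in window: %d ms (%.1f%% of session timeout)%n",
                delta, 100 * gcShareOfTimeout(delta, props));
    }
}
```

If the per-window share regularly approaches 1.0, that would be consistent with GC-driven session expiry; if it stays near zero, GC is a less likely cause.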


As for downgrading the Zookeeper version, would there be any major loss of
functionality? Version 3.4.5 is the current stable release, so I'm unsure how
downgrading would help. I can try it and let it soak for a while to see whether it
helps. The problem is that we have many components that tie into Zookeeper, and I'm
worried that downgrading may break some of our API calls to it.

Is there a good way of trying to narrow this problem down further?
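To narrow things down on the Zookeeper side, the server answers plain-text "four-letter word" commands on its client port; `srvr` reports the server version, mode, and request latency. The sketch below assumes Zookeeper is listening on `localhost:2181` and that the reply consists of `Key: value` lines; the class and method names are illustrative only. (Later Zookeeper releases restrict these commands through a whitelist.)

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: send a Zookeeper four-letter word (e.g. "srvr") and parse the reply.
class ZkFourLetterProbe {

    // Writes the command over plain TCP and reads until the server closes
    // the connection, which it does after answering.
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 3000);
            s.setSoTimeout(3000);
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Splits each "Key: value" line at the first ": " (assumed reply format).
    static Map<String, String> parse(String reply) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : reply.split("\n")) {
            int idx = line.indexOf(": ");
            if (idx > 0) {
                fields.put(line.substring(0, idx).trim(), line.substring(idx + 2).trim());
            }
        }
        return fields;
    }

    // Reads the last number of the "Latency min/avg/max" field; -1 if absent.
    static long maxLatencyMs(Map<String, String> fields) {
        String v = fields.get("Latency min/avg/max");
        if (v == null) {
            return -1;
        }
        String[] parts = v.split("/");
        return Long.parseLong(parts[parts.length - 1].trim());
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> f = parse(send("localhost", 2181, "srvr"));
        System.out.println("version = " + f.get("Zookeeper version"));
        System.out.println("mode    = " + f.get("Mode"));
        System.out.println("max latency (ms) = " + maxLatencyMs(f));
    }
}
```

Latencies in the seconds range would point at the server (or the shared machine) rather than the consumer. The stack trace in the original report also shows the client side loading zookeeper-3.4.3.jar, so the client and server versions may not match.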

Thanks again


On Wed, Jan 22, 2014 at 10:15 AM, Jun Rao <jun...@gmail.com> wrote:

> Not sure how stable ZK 3.4.5 is. Could you try 3.3.4? Also, see if
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog?
> is applicable.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 22, 2014 at 6:24 AM, Ahmed H. <ahmed.ham...@gmail.com> wrote:
>
> > I have a basic Zookeeper/Kafka setup. I am still on Kafka 0.8 beta 1 and
> > Zookeeper 3.4.5. The activity on this machine isn't massive: I would say
> > the Kafka queues get a steady 1 message every 2-3 seconds, plus
> > occasional spikes, but nothing large enough to push the limits. Both
> > Kafka and Zookeeper are running on the same machine.
> >
> > Occasionally, a rebalance is triggered, which causes our Kafka clients to
> > try reconnecting several times, but it ultimately fails with the following
> > error:
> >
> >
> > 04:56:10,020 INFO  [kafka.consumer.ZookeeperConsumerConnector] (alarms.topology.updates_<host>-1383643783747-c7775701_watcher_executor) [alarms.topology.updates_<host>-1383643783747-c7775701], exception during rebalance : org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> >         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) [zkclient-0.3.jar:0.3]
> >         at kafka.utils.ZkUtils$.readData(ZkUtils.scala:407) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> >         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:52) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:401) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> >         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78) [scala-library-2.9.2.jar:]
> >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) [zookeeper-3.4.3.jar:3.4.3-1240972]
> >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) [zookeeper-3.4.3.jar:3.4.3-1240972]
> >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) [zookeeper-3.4.3.jar:3.4.3-1240972]
> >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) [zookeeper-3.4.3.jar:3.4.3-1240972]
> >         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) [zkclient-0.3.jar:0.3]
> >         ... 9 more
> >
> >
> > Our Kafka consumers are written in Clojure
> > (https://github.com/pingles/clj-kafka).
> >
> > Any ideas on what can cause such behaviour? The rebalances themselves
> > happen sporadically, but when they do, they sometimes fail and an error
> > like the one above is shown. I'm not sure if this is a Kafka or Zookeeper
> > problem at this point, but any help would be appreciated.
> >
> > Thanks
> >
>
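The path in the NoNode error above is the consumer's own registration under `/consumers/<group>/ids/`. Assuming these registration nodes are ephemeral in Kafka 0.8, and since Zookeeper deletes a session's ephemeral nodes when that session expires, one plausible reading of the error is that the consumer's session expired (for example during a long pause) before the rebalance tried to read its own node. The sketch below only decodes the path; the id layout `<group>_<host>-<epochMillis>-<suffix>` is an assumption inferred from this one log line, and `ConsumerIdPath` is a hypothetical helper.

```java
import java.time.Instant;

// Hypothetical helper: decodes a consumer registration path.
// Assumed layout (inferred from one log line, not from Kafka's source):
//   /consumers/<group>/ids/<group>_<host>-<epochMillis>-<suffix>
class ConsumerIdPath {
    final String group;
    final String host;
    final Instant createdAt; // timestamp embedded in the id
    final String suffix;

    private ConsumerIdPath(String group, String host, Instant createdAt, String suffix) {
        this.group = group;
        this.host = host;
        this.createdAt = createdAt;
        this.suffix = suffix;
    }

    static ConsumerIdPath parse(String path) {
        // "/consumers/g/ids/id".split("/") -> ["", "consumers", "g", "ids", "id"]
        String[] parts = path.split("/");
        if (parts.length != 5 || !parts[1].equals("consumers") || !parts[3].equals("ids")) {
            throw new IllegalArgumentException("unexpected path: " + path);
        }
        String group = parts[2];
        String id = parts[4];
        if (!id.startsWith(group + "_")) {
            throw new IllegalArgumentException("id does not start with group: " + id);
        }
        // Remaining text is <host>-<epochMillis>-<suffix>; search from the right
        // because host names may themselves contain '-'.
        String rest = id.substring(group.length() + 1);
        int last = rest.lastIndexOf('-');
        int prev = last > 0 ? rest.lastIndexOf('-', last - 1) : -1;
        if (prev <= 0) {
            throw new IllegalArgumentException("unexpected id layout: " + id);
        }
        long millis = Long.parseLong(rest.substring(prev + 1, last));
        return new ConsumerIdPath(group, rest.substring(0, prev),
                Instant.ofEpochMilli(millis), rest.substring(last + 1));
    }

    public static void main(String[] args) {
        ConsumerIdPath c = parse("/consumers/alarms.topology.updates/ids/"
                + "alarms.topology.updates_<host>-1383643783747-c7775701");
        System.out.println("group   = " + c.group);
        System.out.println("host    = " + c.host);
        System.out.println("created = " + c.createdAt);
        System.out.println("suffix  = " + c.suffix);
    }
}
```

For the id in the log, the embedded timestamp decodes to 2013-11-05T09:29:43.747Z, which (if the assumed layout is right) means the connector instance was created well before the error was reported, so the missing node is not simply a consumer that had not finished starting up.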
