zookeeper.session.timeout.ms in consumer config.

Thanks,

Jun

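For reference, a minimal sketch of what raising that setting might look like when the consumer properties are built in Java. The property names are the 0.8 high-level consumer names; the ZooKeeper address and all numeric values are assumptions for illustration only, and the class and method names are hypothetical. The check at the end follows the FAQ guidance, as I understand it, that rebalance.max.retries * rebalance.backoff.ms should exceed zookeeper.session.timeout.ms.

```java
import java.util.Properties;

public class ConsumerTimeoutSketch {

    // Builds consumer properties with a larger ZooKeeper session timeout.
    // Every value here is an illustrative assumption, not a recommendation.
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");   // assumed address
        props.setProperty("group.id", "alarms.topology.updates");   // group id taken from the quoted log
        props.setProperty("zookeeper.session.timeout.ms", "30000"); // raised session timeout (assumed value)
        props.setProperty("rebalance.backoff.ms", "10000");         // assumed value
        props.setProperty("rebalance.max.retries", "4");            // assumed value
        return props;
    }

    // Returns true when the total rebalance retry window
    // (backoff * max retries) is longer than the session timeout.
    public static boolean retryWindowCoversSessionTimeout(Properties props) {
        long sessionTimeoutMs = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        long backoffMs = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        long maxRetries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        return backoffMs * maxRetries > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        System.out.println("session timeout ms = "
                + props.getProperty("zookeeper.session.timeout.ms"));
        System.out.println("retry window covers session timeout = "
                + retryWindowCoversSessionTimeout(props));
    }
}
```

In a Java or Scala client these properties would be handed to kafka.consumer.ConsumerConfig; with clj-kafka the same keys would presumably go into the consumer's config map.
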
On Thu, Jan 23, 2014 at 11:24 AM, Ahmed H. <ahmed.ham...@gmail.com> wrote:

> When you say "use a larger session timeout", which session timeout do you
> refer to? Is it the zookeeper session timeout variable that we define when
> creating a Kafka consumer? Or is there a different session timeout?
>
> As for downgrading, that is currently not an option for the time being, so
> I will have to have some better debugging tools to pinpoint the cause.
>
> Thanks
>
> On Wed, Jan 22, 2014 at 11:44 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > You can find some of the GC settings in
> > https://cwiki.apache.org/confluence/display/KAFKA/Operations
> >
> > There were some ZK bugs exposed during session expiration, which were fixed
> > in 3.3.4. Not sure if 3.4.5 exposes any new issues. The easiest thing is
> > probably to avoid GC-induced ZK session timeout in the first place or use a
> > larger session timeout.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Jan 22, 2014 at 8:29 AM, Ahmed H. <ahmed.ham...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I looked at that, not sure if it is applicable or not at this point. We
> > > used to have frequent rebalances, but that issue was mitigated by
> > > increasing the zktimeout on the consumer side. With that said, it may still
> > > be a problem. I haven't collected any metrics concerning rebalances in a
> > > while. I will certainly take a look at our current GC settings. What are
> > > typical settings that we should have for GC (I am not sure of what exactly
> > > I'm looking for)?
> > >
> > > As for downgrading the Zookeeper version, would there be any major loss of
> > > functionality? Version 3.4.5 is currently stable, so I am unsure of how it
> > > would help. I can try it and let it soak for a while to see if it helps or
> > > not. The problem is we have many components that tie into Zookeeper and I'm
> > > worried that downgrading may break some of our API calls to it.
> > >
> > > Is there a good way of trying to narrow this problem down further?
> > >
> > > Thanks again
> > >
> > > On Wed, Jan 22, 2014 at 10:15 AM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > Not sure how stable ZK 3.4.5 is. Could you try 3.3.4? Also, see if
> > > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog?
> > > > is applicable.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Jan 22, 2014 at 6:24 AM, Ahmed H. <ahmed.ham...@gmail.com> wrote:
> > > >
> > > > > I have a basic Zookeeper/Kafka setup. I am still on Kafka 0.8 beta 1, and
> > > > > Zookeeper 3.4.5. The activity on this machine isn't massive...I would say
> > > > > the Kafka queues get a consistent 1 message every 2-3 seconds, as well as
> > > > > occasional spikes, but still nothing large enough to push the limits. Both
> > > > > Kafka and Zookeeper are running on the same machine.
> > > > >
> > > > > Occasionally, a rebalance is triggered, which causes our Kafka clients to
> > > > > try reconnecting several times, but it ultimately fails with the following
> > > > > error:
> > > > >
> > > > > 04:56:10,020 INFO [kafka.consumer.ZookeeperConsumerConnector]
> > > > > (alarms.topology.updates_<host>-1383643783747-c7775701_watcher_executor)
> > > > > [alarms.topology.updates_<host>-1383643783747-c7775701], exception
> > > > > during rebalance : org.I0Itec.zkclient.exception.ZkNoNodeException:
> > > > > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
> > > > > = NoNode for
> > > > > /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > > > >     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) [zkclient-0.3.jar:0.3]
> > > > >     at kafka.utils.ZkUtils$.readData(ZkUtils.scala:407) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > >     at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:52) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:401) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78) [scala-library-2.9.2.jar:]
> > > > >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > KeeperErrorCode = NoNode for
> > > > > /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > > > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > > >     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > > >     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > > >     at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > > > >     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) [zkclient-0.3.jar:0.3]
> > > > >     ... 9 more
> > > > >
> > > > > Our Kafka consumers are written in Clojure
> > > > > (https://github.com/pingles/clj-kafka).
> > > > >
> > > > > Any ideas on what can cause such behaviour? The rebalances themselves
> > > > > happen sporadically, but when they do, they sometimes fail and an error
> > > > > like the one above is shown. I'm not sure if this is a Kafka or Zookeeper
> > > > > problem at this point, but any help would be appreciated.
> > > > >
> > > > > Thanks