Hello, I am using the beta right now.
I'm not sure if it's GC or something else at this point. To be honest I've never really fiddled with any GC settings before. The system can run for as long as a day without failing, or as little as a few hours. The lack of pattern makes it a little harder to debug. As I mentioned before, the activity on this system is fairly consistent throughout the day. On the link that you sent, I see this, which could very well be the reason: - One of the typical causes is that the application code that consumes messages somehow died and therefore killed the consumer thread. We recommend using a try/catch clause to log all Throwable in the consumer logic. That is entirely possible. I wanted to check the kafka logs for any clues but for some reason, kafka is not writing any logs :/. Here is my log4j settings for kafka: log4j.rootLogger=INFO, stdout > log4j.appender.stdout=org.apache.log4j.ConsoleAppender > log4j.appender.stdout.layout=org.apache.log4j.PatternLayout > log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n > log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender > log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH > log4j.appender.kafkaAppender.File=/kafka-log4j/server.log > log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout > log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n > > log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender > log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH > log4j.appender.stateChangeAppender.File=/kafka-log4j/state-change.log > log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout > log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m > (%c)%n > log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender > log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH > log4j.appender.requestAppender.File=/kafka-log4j/kafka-request.log > log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout > log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n > log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender > log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH > log4j.appender.controllerAppender.File=/kafka-log4j/controller.log > log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout > log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m > (%c)%n > log4j.logger.kafka=INFO, kafkaAppender > log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender > log4j.additivity.kafka.network.RequestChannel$=false > log4j.logger.kafka.request.logger=TRACE, requestAppender > log4j.additivity.kafka.request.logger=false > log4j.logger.kafka.controller=TRACE, controllerAppender > log4j.additivity.kafka.controller=false > log4j.logger.state.change.logger=TRACE, stateChangeAppender > log4j.additivity.state.change.logger=false Thanks On Thu, Nov 7, 2013 at 5:06 PM, Joel Koshy <jjkosh...@gmail.com> wrote: > Can you see if this applies in your case: > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog%3F > > Also, what version of kafka 0.8 are you using? If not the beta, then > what's the git hash? > > Joel > > On Thu, Nov 07, 2013 at 02:51:41PM -0500, Ahmed H. wrote: > > Hello all, > > > > I am not sure if this is a Kafka issue, or an issue with the client that > I > > am using. > > > > We have a fairly small setup, where everything sits on one server (Kafka > > 0.8, and Zookeeper). The message frequency is not too high (1-2 per > second). > > > > The setup works fine for a certain period of time but at some point, it > > just dies, and exceptions are thrown. This is pretty much a daily > > occurrence, but there is no pattern. Based on the logs, it appears that > the > > Kafka client tries to rebalance with Zookeeper and fails, it tries and > > tries multiple times but after a few tries it gives up. Here is the stack > > trace: > > > > 04:56:07,234 INFO [kafka.consumer.SimpleConsumer] > > > > (ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0) > > > Reconnect due to socket error: : > > > java.nio.channels.ClosedByInterruptException > > > at > > > > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > > > [rt.jar:1.7.0_25] > > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:402) > > > [rt.jar:1.7.0_25] > > > at > > > sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:220) > > > [rt.jar:1.7.0_25] > > > at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) > > > [rt.jar:1.7.0_25] > > > at > > > > java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) > > > [rt.jar:1.7.0_25] > > > at kafka.utils.Utils$.read(Utils.scala:394) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.network.Receive$class.readCompletely(Transmission.scala:56) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:71) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:106) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > 04:56:07,238 WARN [kafka.consumer.ConsumerFetcherThread] > > > > (ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0) > > > > [ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0], > > > Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 0; > ClientId: > > > > kafkaqueue.notifications-ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0; > > > ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: > > > [kafkaqueue.notifications,0] -> PartitionFetchInfo(216003,1048576): > > > java.nio.channels.ClosedByInterruptException > > > at > > > > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > > > [rt.jar:1.7.0_25] > > > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:650) > > > [rt.jar:1.7.0_25] > > > at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:43) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:56) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:77) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:108) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:107) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:106) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > 04:56:07,240 INFO [kafka.consumer.ConsumerFetcherThread] > > > > (ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0) > > > > [ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0], > > > Stopped > > > 04:56:07,240 INFO [kafka.consumer.ConsumerFetcherThread] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > > [ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0], > > > Shutdown completed > > > 04:56:07,241 INFO [kafka.consumer.ConsumerFetcherManager] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [ConsumerFetcherManager-1383643783834] All connections stopped > > > 04:56:07,241 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Cleared all relevant queues for this fetcher > > > 04:56:07,242 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Cleared the data chunks in all the consumer message iterators > > > 04:56:07,242 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Committing all offsets after clearing the fetcher queues > > > 04:56:07,245 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Releasing partition ownership > > > 04:56:07,248 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Consumer > > > kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5 > > > rebalancing the following partitions: ArrayBuffer(0) for topic > > > kafkaqueue.notifications with consumers: > > > > List(kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0) > > > 04:56:07,249 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0 > > > attempting to claim partition 0 > > > 04:56:07,252 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0 > > > successfully owned partition 0 for topic kafkaqueue.notifications > > > 04:56:07,253 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Updating the cache > > > 04:56:07,254 INFO [proj.hd.core] (clojure-agent-send-off-pool-5) > Invalid > > > node name. Not performing walk. Node name: POC6O003.2:BER:1/19/1 > > > 04:56:07,254 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], > > > Consumer > > > kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5 > > > selected partitions : kafkaqueue.notifications:0: fetched offset = > 216003: > > > consumed offset = 216003 > > > 04:56:07,255 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5_watcher_executor) > > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5], end > > > rebalancing consumer > > > kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5 > try #0 > > > 04:56:07,257 INFO > > > [kafka.consumer.ConsumerFetcherManager$LeaderFinderThread] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > > [kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread], > > > Starting > > > 04:56:07,265 INFO [kafka.utils.VerifiableProperties] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Verifying properties > > > 04:56:07,265 INFO [kafka.utils.VerifiableProperties] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Property metadata.broker.list is overridden to > test-server.localnet:9092 > > > 04:56:07,266 INFO [kafka.utils.VerifiableProperties] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Property request.timeout.ms is overridden to 30000 > > > 04:56:07,266 INFO [kafka.utils.VerifiableProperties] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Property client.id is overridden to kafkaqueue.notifications > > > 04:56:07,267 INFO [kafka.client.ClientUtils$] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Fetching metadata from broker id:0,host:test-server.localnet,port:9092 > with > > > correlation id 15 for 1 topic(s) Set(kafkaqueue.notifications) > > > 04:56:07,268 INFO [kafka.producer.SyncProducer] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Connected to test-server.localnet:9092 for producing > > > 04:56:07,272 INFO [kafka.producer.SyncProducer] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > Disconnecting from test-server.localnet:9092 > > > 04:56:07,274 INFO [kafka.consumer.ConsumerFetcherManager] > > > > (kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-leader-finder-thread) > > > [ConsumerFetcherManager-1383643783834] Adding fetcher for partition > > > [kafkaqueue.notifications,0], initOffset 216003 to broker 0 with > fetcherId 0 > > > 04:56:07,275 INFO [kafka.consumer.ConsumerFetcherThread] > > > > (ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0) > > > > [ConsumerFetcherThread-kafkaqueue.notifications_test-server.localnet-1383643783745-3757e7a5-0-0], > > > Starting > > > 04:56:07,281 INFO [proj.hd.core] (clojure-agent-send-off-pool-5) > Invalid > > > node name. Not performing walk. Node name: B2Z_0053.2:Rx > Frequency:1/2/1 > > > 04:56:10,010 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701_watcher_executor) > > > > [kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701], > > > begin rebalancing consumer > > > > kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701 try > > > #0 > > > 04:56:10,020 INFO [kafka.consumer.ZookeeperConsumerConnector] > > > > (kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701_watcher_executor) > > > > [kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701], > > > exception during rebalance : > > > org.I0Itec.zkclient.exception.ZkNoNodeException: > > > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > > > NoNode for > > > > /consumers/kafkaqueue.topology.updates/ids/kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701 > > > at > org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) > > > [zkclient-0.3.jar:0.3] > > > at kafka.utils.ZkUtils$.readData(ZkUtils.scala:407) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:52) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:401) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78) > > > [scala-library-2.9.2.jar:] > > > at > > > > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > at > > > > kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326) > > > [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT] > > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > > > KeeperErrorCode = NoNode for > > > > /consumers/kafkaqueue.topology.updates/ids/kafkaqueue.topology.updates_test-server.localnet-1383643783747-c7775701 > > > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > > > [zookeeper-3.4.3.jar:3.4.3-1240972] > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > > [zookeeper-3.4.3.jar:3.4.3-1240972] > > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) > > > [zookeeper-3.4.3.jar:3.4.3-1240972] > > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) > > > [zookeeper-3.4.3.jar:3.4.3-1240972] > > > at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) > > > [zkclient-0.3.jar:0.3] > > > at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) > > > [zkclient-0.3.jar:0.3] > > > ... 9 more > > > > > > The attempts to rebalance occur a few times but eventually,this message > > shows up: "can't rebalance after 4 retries". > > > > Our app is deployed in JBoss and the only way to recover from this is to > > restart JBoss. > > > > This started happening after we went from Kafka 0.7 to Kafka 0.8. Nothing > > else on our system changed except for that. We are connecting to Kafka > > using a Clojure library called clj-kafka ( > > https://github.com/pingles/clj-kafka). clj-kafka was updated to work > with > > Kafka 0.8... > > > > My apologies if this post doesn't belong here. I'm hoping that this may > be > > a generic issue rather than an issue specific to how we're connecting to > > Kafka. Any ideas are appreciated. > > > > Thanks! > >