Just wanted to close the loop on this. It appears the __consumer_offsets logs were corrupted by the system restart. Deleting the topic's logs and restarting the Kafka service cleared up the problem.
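For anyone who finds this thread later, the recovery described above can be sketched roughly as below. The log directory, service name, and paths are assumptions (check log.dirs in your server.properties), and this was not posted by the original author. Note that deleting the offsets topic discards all committed offsets, so consumers will fall back to their auto.offset.reset policy:

```shell
#!/bin/sh
# Hypothetical recovery sketch -- paths/service names are assumptions, not
# from the original thread. Runs in dry-run mode unless DRY_RUN=0 is set.
KAFKA_LOG_DIR="${KAFKA_LOG_DIR:-/var/kafka-logs}"
DRY_RUN="${DRY_RUN:-1}"

# Helper: echo the command in dry-run mode, execute it otherwise.
run() { if [ "$DRY_RUN" = "1" ]; then echo "DRY RUN: $*"; else "$@"; fi; }

# 1. Stop the broker so no segment files are held open.
run systemctl stop kafka

# 2. Remove the corrupted __consumer_offsets partition directories.
#    Committed offsets stored there are lost.
for dir in "$KAFKA_LOG_DIR"/__consumer_offsets-*; do
  run rm -rf "$dir"
done

# 3. Restart the broker; the offsets topic is recreated on the next
#    coordinator lookup.
run systemctl start kafka
```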
Thanks,
Dave

On 1/12/17, 2:29 PM, "Dave Hamilton" <dhamil...@nanigans.com> wrote:

Hello, we ran into a memory issue on a Kafka 0.10.0.1 broker we are running that required a system restart. Since bringing Kafka back up, it seems the consumers are having issues finding their coordinators. Here are some errors we’ve seen in our server logs after restarting:

[2017-01-12 19:02:10,178] ERROR [Group Metadata Manager on Broker 0]: Error in loading offsets from [__consumer_offsets,40] (kafka.coordinator.GroupMetadataManager)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:678)
    at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:135)
    at kafka.log.LogSegment.translateOffset(LogSegment.scala:106)
    at kafka.log.LogSegment.read(LogSegment.scala:127)
    at kafka.log.Log.read(Log.scala:532)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply$mcV$sp(GroupMetadataManager.scala:380)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
    at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
    at kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1(GroupMetadataManager.scala:374)
    at kafka.coordinator.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:353)
    at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

[2017-01-12 19:03:56,468] ERROR [KafkaApi-0] Error when handling request {topics=[__consumer_offsets]} (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
    at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
    at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$createTopic(KafkaApis.scala:629)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$createGroupMetadataTopic(KafkaApis.scala:651)
    at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:668)
    at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:666)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:666)
    at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:727)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
    at java.lang.Thread.run(Thread.java:744)

Also, running kafka-consumer-groups.sh on a consumer group returns the following:

Error while executing consumer group command This is not the correct coordinator for this group.
org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not the correct coordinator for this group.

We also see the following logs when trying to restart a Kafka connector:

[2017-01-12 17:44:07,941] INFO Discovered coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) for group connect-paid_events_s3. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:505)
[2017-01-12 17:44:07,941] INFO (Re-)joining group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:326)
[2017-01-12 17:44:07,941] INFO Marking the coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) dead for group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:542)

Does anyone have recommendations for what we can do to recover from this issue?

Thanks,
Dave
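Not from the original thread, but for later readers: the "replication factor: 1 larger than available brokers: 0" error generally means the broker had not (re)registered its ID in ZooKeeper, so it may be worth confirming that before deleting any logs. A rough diagnosis sketch; hostnames, ports, and the group name are placeholders, and these commands run against a live cluster rather than standalone:

```shell
# Check which broker IDs are currently registered in ZooKeeper.
# An empty list here would explain the "available brokers: 0" error.
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

# Inspect the offsets topic's partition leaders and ISR.
bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
  --topic __consumer_offsets

# Describe the affected group (on 0.10.x the --new-consumer flag is
# used together with --bootstrap-server).
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 \
  --describe --group connect-paid_events_s3
```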