Hi, I'm using Kafka as a messaging system in my data pipeline. I have a couple of producer processes in my pipeline, with Spark Streaming <https://spark.apache.org/docs/2.2.1/streaming-kafka-0-10-integration.html> and Druid's Kafka indexing service <http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html> as consumers of Kafka. The indexing service spawns 40 new indexing tasks (Kafka consumers) every 15 minutes.
The heap memory used on the Kafka brokers stays fairly constant for about an hour, after which it shoots up to the maximum allocated space. The garbage collection logs seem to indicate a memory leak in Kafka. Please find attached the plots generated from the GC logs.

*Kafka deployment:* 3 nodes, with 3 topics and 64 partitions per topic

*Kafka runtime JVM parameters:*
8GB heap memory
1GB swap memory
Using G1GC
MaxGCPauseMillis=20
InitiatingHeapOccupancyPercent=35

*Kafka versions used:* I have tried Kafka 0.10.0, 0.11.0.2, and 1.0.0, and see similar behavior on all of them.

*Questions:*
1) Is this a memory leak on the Kafka side or a misconfiguration of my Kafka cluster?
2) Druid creates new indexing tasks periodically. Does Kafka stably handle a large number of consumers being added periodically?
3) As a knock-on effect, we also notice Kafka partitions going offline periodically after some time, with the following error:

ERROR [ReplicaFetcherThread-18-2], Error for partition [topic1,2] to broker 2: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

Can someone shed some light on the behavior being seen in my cluster? Please let me know if more details are needed to root-cause it. Thanks in advance.

Avinash

[attached: Screen Shot 2018-01-23 at 2.29.04 PM.png, Screen Shot 2018-01-23 at 2.29.21 PM.png]

--
Excuse brevity and typos. Sent from mobile device.
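For reference, here is a sketch of how I pass the JVM settings listed above to the brokers. The environment variable names assume the stock kafka-server-start.sh launcher scripts; the GC-logging flags are what I use to produce the attached plots and are not part of the question itself.

```shell
# Heap sizing picked up by bin/kafka-run-class.sh via kafka-server-start.sh
export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"

# GC tuning: overrides the launcher's default performance options.
# Flag values match the parameters listed above; GC-logging flags are
# the standard JDK 8 ones used to generate the attached plots.
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/kafka/gc.log"

bin/kafka-server-start.sh config/server.properties
```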