Hello:

We are running a data pipeline application stack using Kafka 0.8.2.2 in
production. We have intermittently been seeing sockets pile up in
CLOSE_WAIT on our Kafka brokers, and when it happens they use up the file
handles pretty quickly. By the time the open file count reaches around 40K,
the node becomes unresponsive and we see huge GC pauses. The only way out
has been to restart the node. When the nodes are working fine, the open
file count per node stays around 6K during peak load and around 3K on
average.
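To confirm that the extra descriptors really are sockets stuck in
CLOSE_WAIT on the broker's listener (and not, say, log segment files), a
minimal Python sketch like the one below can sample a broker periodically.
It assumes a Linux host and the default broker port 9092. It reads
/proc/net/tcp and /proc/net/tcp6, which cover the whole network namespace
rather than a single process. The script name and helper functions are
hypothetical. Reading /proc/<pid>/fd requires running as the broker's user
or as root.

```python
import os
import sys
from collections import Counter

# Linux TCP state codes as they appear (hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state from the text of /proc/net/tcp or tcp6.

    If local_port is given, only sockets whose local port matches are counted.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3].upper()
        # Local address is "HEXADDR:HEXPORT"; the port is after the last colon.
        port = int(local_addr.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def count_open_fds(pid):
    """Number of open file descriptors held by a process (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def main(args):
    if not args:
        print("usage: close_wait_check.py <broker_pid> [port]")
        return
    pid = args[0]
    port = int(args[1]) if len(args) > 1 else 9092  # assumed default port
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_tcp_states(f.read(), local_port=port)
    print(f"open fds for pid {pid}: {count_open_fds(pid)}")
    print(f"CLOSE_WAIT sockets on local port {port}: {total['CLOSE_WAIT']}")
    print(dict(total))


if __name__ == "__main__":
    main(sys.argv[1:])
```

A CLOSE_WAIT socket on the broker's local port means the client side has
already closed the connection but the broker has not yet closed its end.
Logging these counts next to the open file count over time should show
whether the two grow together before a node stalls.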

Configurations:
- 5-broker cluster (per-node spec: 24 CPU cores, 250 GB RAM, 256 GB
SSD)
- 20 topics and 1100 partitions across all topics
- Replication factor of 3
- Java based KafkaProducer and high level consumers
(ZookeeperConsumerConnector)
- GC params { -Xmx32G -Xms4G -server -XX:MetaspaceSize=96m -XX:+UseG1GC
-XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50
-XX:MaxMetaspaceFreeRatio=80 }

Any pointers here? I'd appreciate your help.

Thanks,
Bharath
