Further investigations:
I have compared the open files and connections of the different nodes. The
count of real open files (data dir files) and of established connections is
the same on all nodes.
But the affected node has a lot of connections in "CLOSE_WAIT" state (many
thousands) to IPs of external clients (no specific IP stands out). The other
nodes have fewer than 10.
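
For reference, CLOSE_WAIT means the remote side has closed the connection but
the local process has not yet closed its socket, so each such connection still
holds a file descriptor. The per-IP comparison described above could be
reproduced with something like the following minimal sketch. It assumes a
little-endian Linux host (e.g. x86) and reads only the IPv4 table in
/proc/net/tcp; a JVM using dual-stack sockets may list its connections in
/proc/net/tcp6 instead, which this sketch does not handle. The table is also
system-wide rather than per process. The function names are made up for
illustration.

```python
import socket
from collections import Counter

CLOSE_WAIT = "08"  # hex state code for CLOSE_WAIT in Linux /proc/net/tcp


def decode_ipv4(hex_addr):
    """Convert 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port)."""
    hex_ip, hex_port = hex_addr.split(":")
    # Assumption: little-endian host, so the address bytes are reversed.
    ip = socket.inet_ntoa(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def close_wait_by_remote_ip(lines):
    """Count CLOSE_WAIT connections per remote IP from /proc/net/tcp lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row ("sl local_address rem_address st ...")
        # and anything too short to be a connection entry.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        if fields[3] == CLOSE_WAIT:
            ip, _port = decode_ipv4(fields[2])  # fields[2] is the remote address
            counts[ip] += 1
    return counts


def read_close_wait(path="/proc/net/tcp"):
    """Read the kernel table and return the per-IP CLOSE_WAIT counts."""
    with open(path) as f:
        return close_wait_by_remote_ip(f)
```

Calling read_close_wait().most_common(10) on each broker would list the remote
IPs with the most CLOSE_WAIT connections, which makes it easy to see whether a
single client is responsible or the connections are spread out.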
Hi,
I’m running a Kafka cluster with many topics and a constant input of data.
The cluster has been running for over a year, but for the past two weeks
there has been one node where I see a steady increase in the open file
descriptors of the Kafka server process.
All other nodes show a constant value for this metric. Topics/partitions
are distributed equally over all nodes, and the hardware is the same.
The open file limit was reached last week, and the node worked normally
after a restart and recovery, but since the restart the file descriptors
have been increasing again.
Any ideas, or things I can do to find out more?
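
One way to find out more would be to break the descriptor count down by type,
to see whether the growth is in sockets or in regular files. The following is
only a rough sketch for Linux, assuming read access to /proc/<pid>/fd of the
Kafka process; the function names and categories are chosen for illustration.

```python
import os
from collections import Counter


def classify_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse descriptor type."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"  # anything else is treated as a path on disk


def summarize(targets):
    """Count descriptors per type for a list of symlink targets."""
    return Counter(classify_target(t) for t in targets)


def fd_targets(pid):
    """Return the symlink targets of all open descriptors of a process."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets
```

Running summarize(fd_targets(pid)) periodically on the affected node and on a
healthy one would show which type accounts for the increase.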
Version: 0.10.2.1
Thanks,
Michael