5-node Kafka cluster on bare metal: Ubuntu 14.04.x LTS boxes with 64 GB RAM, 8 cores,
and 960 GB SSDs. A single node in the cluster is filling its logs with the following:
[2016-09-12 09:34:49,522] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at kafka.network.Acceptor.accept(SocketServer.scala:323)
    at kafka.network.Acceptor.run(SocketServer.scala:268)
    at java.lang.Thread.run(Thread.java:745)
No other node in the cluster has this issue. A separate application server runs
consumers/producers built on librdkafka and the Confluent Kafka Python library,
with a few million messages published across fewer than 100 topics.
For days now, the /var/log/kafka/kafka.server.log.N files have been filling up with
this message and consuming all disk space, but only on this single node in the
cluster. I have the soft and hard limits set to 65,535 for all users, so running
ulimit -n shows 65535.
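For reference, this is the kind of check I've been using to confirm whether the broker process itself is actually approaching its limit. It's a rough sketch that assumes a Linux /proc filesystem; the PID argument is whatever the broker's PID is (e.g. from ps or jps), and it may need to run as the broker's user or root:

import os
import sys

def open_fd_count(pid):
    # Each entry under /proc/<pid>/fd is one open file descriptor.
    return len(os.listdir('/proc/%d/fd' % pid))

def max_open_files(pid):
    # Parse the "Max open files" row out of /proc/<pid>/limits.
    with open('/proc/%d/limits' % pid) as limits:
        for line in limits:
            if line.startswith('Max open files'):
                soft, hard = line.split()[3:5]
                return soft, hard
    return None, None

if __name__ == '__main__':
    pid = int(sys.argv[1])  # the Kafka broker PID, e.g. from ps or jps
    soft, hard = max_open_files(pid)
    print('open fds: %d  soft limit: %s  hard limit: %s'
          % (open_fd_count(pid), soft, hard))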
Is there a setting I should add to the librdkafka config in the Python producer
clients to shorten socket connections further and avoid this, or is something
else going on?
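This is roughly the client-side setup I have in mind: a single long-lived Producer per process plus the librdkafka socket settings I'm aware of (socket.timeout.ms and socket.keepalive.enable). The broker list and values below are placeholders, and I'm not sure these are the right knobs, which is exactly the question:

from confluent_kafka import Producer

conf = {
    'bootstrap.servers': 'kafka1:9092,kafka2:9092',  # placeholder broker list
    'socket.timeout.ms': 30000,            # librdkafka network request timeout
    'socket.keepalive.enable': 'true',     # TCP keepalive on broker sockets
}

producer = Producer(conf)  # one long-lived instance, reused for every message

def publish(topic, value):
    producer.produce(topic, value)
    producer.poll(0)  # serve delivery callbacks without blocking

# ... publish a few million messages across the topics ...
producer.flush()  # wait for outstanding deliveries before the process exits

I also know the broker has its own connections.max.idle.ms setting, but I haven't touched it and am not sure it's relevant here.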
Should I file this as an issue on GitHub, and if so, in which project?
Thanks!