Hi Guys,
I’m seeing the following errors from a 0.8.1.1 broker. They occur most often
on the controller machine. The controller process then crashes, the
controller role moves to another machine, and that machine crashes in turn.
Looking at the file descriptors held by the process, there are only around
4000 or so (looking at . There aren’t many connections in the TIME_WAIT
state, and I’ve increased the ephemeral port range to “16000 – 64000” via
“/proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java test program to
see how many sockets and files I can open. The number of sockets is definitely
limited by the ephemeral port range, which was around 22K at the time. But I
can open tons of files, since the user’s open file limit is set to 100K.
So given that I can theoretically open 48K sockets and probably 90K files, and
I only see around 4K total for the Kafka broker, I’m really confused as to why
I’m seeing this error. Is there some internal Kafka limit that I don’t know
about?
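
A minimal sketch of the kind of test program described above, for anyone who wants to repeat the check. It is not the original program: the class name FdProbe, the method names, and the default cap of 100000 are all hypothetical. It opens file handles, then loopback socket pairs, until an IOException is thrown or the cap is reached, and reports how far it got.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe; names and defaults are illustrative only.
public class FdProbe {

    // Opens read handles on one temp file until an IOException
    // (typically "Too many open files") or until cap is reached.
    public static int openFiles(int cap) {
        List<FileInputStream> held = new ArrayList<>();
        File tmp = null;
        try {
            tmp = File.createTempFile("fdprobe", ".tmp");
            while (held.size() < cap) {
                held.add(new FileInputStream(tmp));
            }
        } catch (IOException e) {
            System.err.println("file open stopped after " + held.size() + ": " + e.getMessage());
        } finally {
            closeAll(held);
            if (tmp != null) {
                tmp.delete();
            }
        }
        return held.size();
    }

    // Opens loopback connections to a local listener and accepts each one.
    // Every completed pair holds two descriptors and one ephemeral port.
    public static int openSockets(int cap) {
        List<Socket> held = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            while (held.size() / 2 < cap) {
                held.add(new Socket(server.getInetAddress(), server.getLocalPort()));
                held.add(server.accept());
            }
        } catch (IOException e) {
            System.err.println("socket open stopped after " + held.size() / 2 + " pairs: " + e.getMessage());
        } finally {
            closeAll(held);
        }
        return held.size() / 2;
    }

    private static void closeAll(List<? extends Closeable> held) {
        for (Closeable c : held) {
            try {
                c.close();
            } catch (IOException ignored) {
                // best effort cleanup
            }
        }
    }

    public static void main(String[] args) {
        int cap = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        System.out.println("files opened: " + openFiles(cap));
        System.out.println("socket pairs opened: " + openSockets(cap));
    }
}
```

The numbers it prints reflect the limits of the process it runs in, so it is only comparable to the broker if it is started the same way and as the same user as the broker.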
Paul Lung
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)
[2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)
[2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021____active_80__32__mini____activeitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
at java.io.FileWriter.<init>(FileWriter.java:90)
at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)
at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)
at kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)
at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
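
Both traces above fail with "Too many open files", which is reported against the descriptor limit of the running process rather than the limit of an interactive shell. The sketch below shows one way a JVM can report its own open and maximum descriptor counts. It assumes a Unix JVM that exposes com.sun.management.UnixOperatingSystemMXBean; the class name FdUsage and the method descriptorCounts are hypothetical, and it only describes the JVM it runs in, not a separate broker process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper; reports descriptor usage of the current JVM only.
public class FdUsage {

    // Returns {open, max} descriptor counts, or null when the platform
    // bean is not the Unix variant (for example on Windows).
    public static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

If the maximum reported inside a process is far below the 100K configured for the user, that process was probably started with a different limit than the one seen in the shell.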