Without knowing the intricacies of Kafka, I think the default limit on open file descriptors is 1024 on Unix. You can change this by setting a higher ulimit value (typically 8192, but sometimes even 100000). Before modifying the ulimit, I would recommend checking the number of sockets stuck in the TIME_WAIT state. In this case, it looks like the broker has too many open sockets. This could be because a rogue client is connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT timeout to 30 seconds or lower.
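
In case it helps, here is a rough Python sketch of the checks I have in mind. It assumes a Linux host with /proc mounted, and the function names are my own, not anything from Kafka. It counts sockets per TCP state from /proc/net/tcp (state code 06 is TIME_WAIT) and reads the open-file limit that applies to the broker process itself from /proc/<pid>/limits, which can differ from what ulimit reports in your login shell.

```python
import os
from collections import Counter

# A few TCP state codes from the "st" column of /proc/net/tcp (Linux).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}


def count_tcp_states(lines):
    """Count sockets per TCP state from /proc/net/tcp-style lines.

    The first line is the column header and is skipped.
    """
    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > 3:
            # Unknown codes are kept as their raw hex value.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


def report(pid):
    """Print the fd count, fd limit, and TCP state counts for one process.

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    print("open fds:", len(os.listdir(f"/proc/{pid}/fd")))
    with open(f"/proc/{pid}/limits") as f:
        print("max open files (soft, hard):", max_open_files(f.read()))
    # /proc/net/tcp covers IPv4 only; /proc/net/tcp6 has the same layout.
    with open("/proc/net/tcp") as f:
        print("tcp states:", dict(count_tcp_states(f.read().splitlines())))
```

Calling report(<broker pid>) as root on each broker should show whether the limit the broker is really running under is closer to 4096 than to 100,000, and whether a large share of the sockets are sitting in TIME_WAIT.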
On Wed, Jun 25, 2014 at 10:19 AM, Lung, Paul <pl...@ebay.com> wrote:

> Hi Prakash,
>
> How many open files do you expect a broker to be able to handle? It seems
> like this broker is crashing at around 4100 or so open files.
>
> Thanks,
> Paul Lung
>
> On 6/24/14, 11:08 PM, "Lung, Paul" <pl...@ebay.com> wrote:
>
> >Ok. What I just saw was that when the controller machine reaches around
> >4100+ files, it crashes. Then I think the controller bounced between 2
> >other machines, taking them down too, and then circled back to the
> >original machine.
> >
> >Paul Lung
> >
> >On 6/24/14, 10:51 PM, "Lung, Paul" <pl...@ebay.com> wrote:
> >
> >>The controller machine has 3500 or so, while the other machines have
> >>around 1600.
> >>
> >>Paul Lung
> >>
> >>On 6/24/14, 10:31 PM, "Prakash Gowri Shankor" <prakash.shan...@gmail.com>
> >>wrote:
> >>
> >>>How many files does each broker itself have open? You can find this
> >>>from 'ls -l /proc/<processid>/fd'
> >>>
> >>>On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul <pl...@ebay.com> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I just upgraded my cluster from 0.8.1 to 0.8.1.1. I'm seeing the
> >>>> following error messages on the same 3 brokers once in a while:
> >>>>
> >>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
> >>>> java.io.IOException: Too many open files
> >>>>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>>>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>>>     at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>>>     at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>>>     at java.lang.Thread.run(Thread.java:679)
> >>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
> >>>> java.io.IOException: Too many open files
> >>>>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>>>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>>>     at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>>>     at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>>>     at java.lang.Thread.run(Thread.java:679)
> >>>>
> >>>> When this happens, these 3 brokers essentially go out of sync when you
> >>>> do a "kafka-topics.sh --describe".
> >>>>
> >>>> I tracked the number of open files by doing "watch -n 1 'sudo lsof |
> >>>> wc -l'", which basically counts all open files on the system. The
> >>>> numbers for the systems are basically in the 6000 range, with one
> >>>> system going to 9000. I presume the 9000 machine is the controller.
> >>>> Looking at the ulimit of the user, both the hard limit and the soft
> >>>> limit for open files is 100,000. Using sysctl, the max file is
> >>>> fs.file-max = 9774928. So we seem to be way under the limit.
> >>>>
> >>>> What am I missing here? Is there some JVM limit around 10K open files
> >>>> or something?
> >>>>
> >>>> Paul Lung