Without knowing the intricacies of Kafka, I think the default limit on open
file descriptors is 1024 on Unix. This can be raised by setting a higher
ulimit value (typically 8192, but sometimes even 100000).
Before modifying the ulimit, I would recommend checking the number of
sockets stuck in the TIME_WAIT state. In this case, it looks like the broker
has too many open sockets. This could be because a rogue client is
connecting and disconnecting repeatedly.
You might have to reduce the TIME_WAIT timeout to 30 seconds or lower.
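
A minimal sketch of those checks, assuming a Linux host with /proc mounted.
The function names are made up for illustration and nothing here is
Kafka-specific. It counts a process's open descriptors, reads the limit that
actually applies to that process (which can differ from what ulimit reports
in a login shell), and counts sockets in TIME_WAIT:

```python
import os
from pathlib import Path

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp and /proc/net/tcp6


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd, i.e. the descriptors the process holds."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_nofile_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because the kernel may print 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT sockets in the text of /proc/net/tcp (or tcp6)."""
    count = 0
    for row in proc_net_tcp_text.splitlines()[1:]:  # first row is the header
        fields = row.split()
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def report(pid):
    """Print descriptor usage for one process and the host-wide TIME_WAIT count."""
    soft, hard = parse_nofile_limit(Path(f"/proc/{pid}/limits").read_text())
    print(f"pid {pid}: {count_open_fds(pid)} open fds (soft {soft}, hard {hard})")
    waiting = 0
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(table):
            waiting += count_time_wait(Path(table).read_text())
    print(f"sockets in TIME_WAIT: {waiting}")
```

Calling report(<processid>) with the broker's pid, as the same user or as
root, prints both numbers. If the soft limit shown is close to the count at
which the broker fails, the limit is the more likely culprit.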



On Wed, Jun 25, 2014 at 10:19 AM, Lung, Paul <pl...@ebay.com> wrote:

> Hi Prakash,
>
> How many open files do you expect a broker to be able to handle? It seems
> like this broker is crashing at around 4100 or so open files.
>
> Thanks,
> Paul Lung
>
> On 6/24/14, 11:08 PM, "Lung, Paul" <pl...@ebay.com> wrote:
>
> >Ok. What I just saw was that when the controller machine reaches around
> >4100+ files, it crashes. Then I think the controller bounced between 2
> >other machines, taking them down too, and then circled back to the original
> >machine.
> >
> >Paul Lung
> >
> >On 6/24/14, 10:51 PM, "Lung, Paul" <pl...@ebay.com> wrote:
> >
> >>The controller machine has 3500 or so, while the other machines have
> >>around 1600.
> >>
> >>Paul Lung
> >>
> >>On 6/24/14, 10:31 PM, "Prakash Gowri Shankor" <prakash.shan...@gmail.com>
> >>wrote:
> >>
> >>>How many files does each broker itself have open? You can find this
> >>>from 'ls -l /proc/<processid>/fd'.
> >>>
> >>>
> >>>
> >>>
> >>>On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul <pl...@ebay.com> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>>
> >>>> I just upgraded my cluster from 0.8.1 to 0.8.1.1. I'm seeing the
> >>>> following error messages on the same 3 brokers once in a while:
> >>>>
> >>>>
> >>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
> >>>> java.io.IOException: Too many open files
> >>>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>>>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>>>         at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>>>         at java.lang.Thread.run(Thread.java:679)
> >>>>
> >>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
> >>>> java.io.IOException: Too many open files
> >>>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>>>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>>>         at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>>>         at java.lang.Thread.run(Thread.java:679)
> >>>>
> >>>> When this happens, these 3 brokers essentially go out of sync when you
> >>>> do a "kafka-topics.sh --describe".
> >>>>
> >>>> I tracked the number of open files by doing "watch -n 1 'sudo lsof |
> >>>> wc -l'", which basically counts all open files on the system. The
> >>>> numbers for the systems are basically in the 6000 range, with one system
> >>>> going to 9000. I presume the 9000 machine is the controller. Looking at
> >>>> the ulimit of the user, both the hard limit and the soft limit for open
> >>>> files is 100,000. Using sysctl, the max file is fs.file-max = 9774928.
> >>>> So we seem to be way under the limit.
> >>>>
> >>>> What am I missing here? Is there some JVM limit around 10K open files
> >>>> or something?
> >>>>
> >>>> Paul Lung
> >>>>
> >>
> >
>
>
