Ok. What I just saw was that when the controller machine reaches roughly
4100 open files, it crashes. Then I think the controller bounced between 2
other machines, taking them down too, and then circled back to the original
machine.

Paul Lung

On 6/24/14, 10:51 PM, "Lung, Paul" <pl...@ebay.com> wrote:

>The controller machine has 3500 or so, while the other machines have
>around 1600.
>
>Paul Lung
>
>On 6/24/14, 10:31 PM, "Prakash Gowri Shankor" <prakash.shan...@gmail.com>
>wrote:
>
>>How many files does each broker itself have open ? You can find this from
>>'ls -l /proc/<processid>/fd'
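>>For example, a minimal sketch (the pgrep pattern assumes the broker runs
>>under the usual kafka.Kafka main class; adjust it for your deployment):
>>
>>    BROKER_PID=$(pgrep -f kafka.Kafka | head -n 1)   # PID of the broker process
>>    ls /proc/"$BROKER_PID"/fd | wc -l                # number of its open descriptors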
>>
>>
>>
>>
>>On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul <pl...@ebay.com> wrote:
>>
>>> Hi All,
>>>
>>>
>>> I just upgraded my cluster from 0.8.1 to 0.8.1.1. I'm seeing the following
>>> error messages on the same 3 brokers once in a while:
>>>
>>>
>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
>>> java.io.IOException: Too many open files
>>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>>>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
>>>         at kafka.network.Acceptor.run(SocketServer.scala:154)
>>>         at java.lang.Thread.run(Thread.java:679)
>>>
>>> [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
>>> java.io.IOException: Too many open files
>>>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>>>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
>>>         at kafka.network.Acceptor.run(SocketServer.scala:154)
>>>         at java.lang.Thread.run(Thread.java:679)
>>>
>>> When this happens, these 3 brokers essentially go out of sync when you do
>>> a "kafka-topics.sh --describe".
>>>
>>> I tracked the number of open files by doing "watch -n 1 'sudo lsof | wc -l'",
>>> which basically counts all open files on the system. The numbers for the
>>> systems are basically in the 6000 range, with one system going to 9000.
>>> I presume the 9000 machine is the controller. Looking at the ulimit of the
>>> user, both the hard limit and the soft limit for open files are 100,000.
>>> Using sysctl, the max file count is fs.file-max = 9774928. So we seem to be
>>> way under the limit.
>>>
>>> What am I missing here? Is there some JVM limit around 10K open files or
>>> something?
>>>
>>> Paul Lung
>>>
>
