Hi Mark,

I'm using CentOS 6.2. My file limit is something like 500k; the value is 
arbitrary.
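
For reference, this is roughly where that kind of limit lives on CentOS 6 
(the values and the "kafka" user below are only placeholders, not our actual 
setup):

  # /etc/security/limits.conf  (illustrative values)
  kafka  soft  nofile  500000
  kafka  hard  nofile  500000

  # system-wide ceiling, in /etc/sysctl.conf
  fs.file-max = 1000000

  # verify what the running broker process actually got
  cat /proc/<broker-pid>/limits | grep "open files"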

One of the things I've changed so far is the TCP keepalive parameters; that 
has had moderate success. These are the ones I touched (a sketch of how I'm 
setting them follows the list):

net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
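
In sysctl terms I mean something along these lines (the numbers are only 
illustrative, not a recommendation; the kernel defaults are 7200 / 75 / 9):

  # /etc/sysctl.conf  (example values only)
  net.ipv4.tcp_keepalive_time = 600    # start probing after 10 minutes idle
  net.ipv4.tcp_keepalive_intvl = 60    # probe every 60 seconds
  net.ipv4.tcp_keepalive_probes = 5    # give up after 5 unanswered probes

  # apply without rebooting
  sysctl -p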

I still notice an abnormal number of ESTABLISHED connections. I've been doing 
some searching and came across this page 
(http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/)
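
(For what it's worth, I'm counting them on the broker with something like the 
command below; 9092 is just the default broker port, adjust for your setup.)

  netstat -ant | grep ':9092 ' | grep ESTABLISHED | wc -l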

I'll change "net.netfilter.nf_conntrack_tcp_timeout_established" as indicated 
there; it looks like the closest thing to a solution for my issue.
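
Concretely, I'm planning something along these lines (the value is only an 
example of going well below the 5-day default of 432000 seconds; I haven't 
settled on a final number yet):

  # /etc/sysctl.conf  (example value only)
  net.netfilter.nf_conntrack_tcp_timeout_established = 3600

  # apply without rebooting
  sysctl -p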

Are you also experiencing the issue in a cross-datacenter context?

Best regards,

Nicolas Berthet 


-----Original Message-----
From: Mark [mailto:static.void....@gmail.com] 
Sent: Friday, September 27, 2013 6:08 AM
To: users@kafka.apache.org
Subject: Re: Too many open files

What OS settings did you change? How high is your huge file limit?


On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:

> Jun,
> 
> I observed a similar kind of thing recently (I didn't notice it before 
> because our file limit is huge).
> 
> I have a set of brokers in a datacenter, and producers in different data 
> centers. 
> 
> At some point I got disconnections; from the producer's perspective I had 
> something like 15 connections to the broker. On the broker side, on the other 
> hand, I observed hundreds of connections from the producer in an ESTABLISHED 
> state.
> 
> We had some default settings for the socket timeout at the OS level, which we 
> reduced, hoping it would prevent the issue in the future. I'm not sure whether 
> the issue is from the broker or from the OS configuration, though. I'm still 
> keeping the broker under observation for the time being.
> 
> Note that, for clients in the same datacenter, we didn't see this issue; the 
> socket count matches on both ends.
> 
> Nicolas Berthet
> 
> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Thursday, September 26, 2013 12:39 PM
> To: users@kafka.apache.org
> Subject: Re: Too many open files
> 
> If a client is gone, the broker should automatically close those broken 
> sockets. Are you using a hardware load balancer?
> 
> Thanks,
> 
> Jun
> 
> 
> On Wed, Sep 25, 2013 at 4:48 PM, Mark <static.void....@gmail.com> wrote:
> 
>> FYI if I kill all producers I don't see the number of open files drop. 
>> I still see all the ESTABLISHED connections.
>> 
>> Is there a broker setting to automatically kill any inactive TCP 
>> connections?
>> 
>> 
>> On Sep 25, 2013, at 4:30 PM, Mark <static.void....@gmail.com> wrote:
>> 
>>> Any other ideas?
>>> 
>>> On Sep 25, 2013, at 9:06 AM, Jun Rao <jun...@gmail.com> wrote:
>>> 
>>>> We haven't seen any socket leaks with the java producer. If you have
>>>> lots of unexplained socket connections in established mode, one possible
>>>> cause is that the client created new producer instances, but didn't
>>>> close the old ones.
>>>> 
>>>> Thanks,
>>>> 
>>>> Jun
>>>> 
>>>> 
>>>> On Wed, Sep 25, 2013 at 6:08 AM, Mark <static.void....@gmail.com> wrote:
>>>> 
>>>>> No. We are using the kafka-rb ruby gem producer.
>>>>> https://github.com/acrosa/kafka-rb
>>>>> 
>>>>> Now that you've asked that question, I have to ask: is there a problem 
>>>>> with the java producer?
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Sep 24, 2013, at 9:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>> 
>>>>>> Are you using the java producer client?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jun
>>>>>> 
>>>>>> 
>>>>>>> On Tue, Sep 24, 2013 at 5:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>> 
>>>>>>> Our 0.7.2 Kafka cluster keeps crashing with:
>>>>>>> 
>>>>>>> 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in acceptor
>>>>>>>     java.io.IOException: Too many open
>>>>>>> 
>>>>>>> The obvious fix is to bump up the number of open files but I'm
>>>>>>> wondering if there is a leak on the Kafka side and/or our application
>>>>>>> side. We currently have the ulimit set to a generous 4096 but
>>>>>>> obviously we are hitting this ceiling. What's a recommended value?
>>>>>>> 
>>>>>>> We are running Rails and our Unicorn workers are connecting to our
>>>>>>> Kafka cluster via round-robin load balancing. We have about 1500
>>>>>>> workers, so that would be 1500 connections right there, but they
>>>>>>> should be split across our 3 nodes. Instead netstat shows thousands
>>>>>>> of connections that look like this:
>>>>>>> 
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901    ESTABLISHED
>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349    ESTABLISHED
>>>>>>> 
>>>>>>> Has anyone come across this problem before? Is this a 0.7.2 
>>>>>>> leak, LB misconfiguration... ?
>>>>>>> 
>>>>>>> Thanks
>>>>> 
>>> 
>> 
>> 
