Hi Mark,

Sorry for the delay. We're not using a load balancer, if that's what you mean
by LB.

After applying the change I mentioned last time (the netfilter thing), I
couldn't see any improvement. We even restarted Kafka, but since the restart
I've seen the connection count slowly climbing.
                                                      
Best regards,

Nicolas Berthet 


-----Original Message-----
From: Mark [mailto:static.void....@gmail.com] 
Sent: Saturday, September 28, 2013 12:35 AM
To: users@kafka.apache.org
Subject: Re: Too many open files

No, this is all within the same DC. I think the problem has to do with the LB.
We've upgraded our producers to point directly to a node for testing, and after
running it all night I don't see any more connections than there are supposed
to be.

Can I ask which LB you are using? We are using A10s.

On Sep 26, 2013, at 6:41 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:

> Hi Mark,
> 
> I'm using CentOS 6.2. My file limit is something like 500k; the value is
> arbitrary.
> 
> One of the things I've changed so far is the TCP keepalive parameters, which
> has had moderate success.
> 
> net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_intvl
> net.ipv4.tcp_keepalive_probes
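> 
> To be concrete, I mean values along these lines in /etc/sysctl.conf (the
> numbers here are only an example, not the ones we actually settled on):
> 
> # probe idle connections after 10 minutes instead of the 2-hour default,
> # and drop them after a handful of failed probes
> net.ipv4.tcp_keepalive_time = 600
> net.ipv4.tcp_keepalive_intvl = 60
> net.ipv4.tcp_keepalive_probes = 5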
> 
> I still notice an abnormal number of ESTABLISHED connections. I've been
> doing some searching and came across this page
> (http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/).
> 
> I'll change the "net.netfilter.nf_conntrack_tcp_timeout_established" setting
> as indicated there; it looks like the closest thing to a solution for my issue.
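> 
> Roughly like this in /etc/sysctl.conf (the value below is only an example;
> the kernel default is 432000 seconds, i.e. 5 days, which means stale entries
> can sit in the conntrack table for a very long time):
> 
> net.netfilter.nf_conntrack_tcp_timeout_established = 86400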
> 
> Are you also experiencing the issue in a cross-data-center context?
> 
> Best regards,
> 
> Nicolas Berthet
> 
> 
> -----Original Message-----
> From: Mark [mailto:static.void....@gmail.com]
> Sent: Friday, September 27, 2013 6:08 AM
> To: users@kafka.apache.org
> Subject: Re: Too many open files
> 
> What OS settings did you change? How high is your "huge" file limit?
> 
> 
> On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <nicolasbert...@maaii.com> 
> wrote:
> 
>> Jun,
>> 
>> I observed a similar kind of thing recently. (I didn't notice it before
>> because our file limit is huge.)
>> 
>> I have a set of brokers in a datacenter, and producers in different data 
>> centers. 
>> 
>> At some point I got disconnections; from the producer's perspective I had
>> something like 15 connections to the broker. On the broker side, however, I
>> observed hundreds of connections from the producer in an ESTABLISHED state.
>> 
>> We had some default settings for the socket timeout at the OS level, which
>> we reduced in the hope that it would prevent the issue in the future. I'm
>> not sure whether the issue comes from the broker or from the OS
>> configuration, though. I'm still keeping the broker under observation for
>> the time being.
>> 
>> Note that for clients in the same datacenter we didn't see this issue; the
>> socket count matches on both ends.
>> 
>> Nicolas Berthet
>> 
>> -----Original Message-----
>> From: Jun Rao [mailto:jun...@gmail.com]
>> Sent: Thursday, September 26, 2013 12:39 PM
>> To: users@kafka.apache.org
>> Subject: Re: Too many open files
>> 
>> If a client is gone, the broker should automatically close those broken 
>> sockets. Are you using a hardware load balancer?
>> 
>> Thanks,
>> 
>> Jun
>> 
>> 
>> On Wed, Sep 25, 2013 at 4:48 PM, Mark <static.void....@gmail.com> wrote:
>> 
>>> FYI if I kill all producers I don't see the number of open files drop. 
>>> I still see all the ESTABLISHED connections.
>>> 
>>> Is there a broker setting to automatically kill any inactive TCP 
>>> connections?
>>> 
>>> 
>>> On Sep 25, 2013, at 4:30 PM, Mark <static.void....@gmail.com> wrote:
>>> 
>>>> Any other ideas?
>>>> 
>>>> On Sep 25, 2013, at 9:06 AM, Jun Rao <jun...@gmail.com> wrote:
>>>> 
>>>>> We haven't seen any socket leaks with the java producer. If you have lots
>>>>> of unexplained socket connections in established mode, one possible cause
>>>>> is that the client created new producer instances, but didn't close the
>>>>> old ones.
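>>>>> 
>>>>> In other words, each instance that's created but never closed keeps its
>>>>> sockets open. A rough sketch against the 0.7 Java API (the names and
>>>>> config values here are just an example):
>>>>> 
>>>>>   import java.util.Properties;
>>>>>   import kafka.javaapi.producer.Producer;
>>>>>   import kafka.javaapi.producer.ProducerData;
>>>>>   import kafka.producer.ProducerConfig;
>>>>> 
>>>>>   Properties props = new Properties();
>>>>>   props.put("zk.connect", "zk1:2181");   // example ZooKeeper address
>>>>>   props.put("serializer.class", "kafka.serializer.StringEncoder");
>>>>>   Producer<String, String> producer =
>>>>>       new Producer<String, String>(new ProducerConfig(props));
>>>>>   producer.send(new ProducerData<String, String>("my-topic", "hello"));
>>>>>   // if a fresh Producer is built per request and close() is never
>>>>>   // called, the broker keeps seeing ESTABLISHED sockets pile up
>>>>>   producer.close();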
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> 
>>>>>> On Wed, Sep 25, 2013 at 6:08 AM, Mark <static.void....@gmail.com> wrote:
>>>>> 
>>>>>> No. We are using the kafka-rb ruby gem producer.
>>>>>> https://github.com/acrosa/kafka-rb
>>>>>> 
>>>>>> Now that you've asked that question, I need to ask: is there a problem
>>>>>> with the Java producer?
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Sep 24, 2013, at 9:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Are you using the java producer client?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Sep 24, 2013 at 5:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Our 0.7.2 Kafka cluster keeps crashing with:
>>>>>>>> 
>>>>>>>> 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in acceptor
>>>>>>>>     java.io.IOException: Too many open
>>>>>>>> 
>>>>>>>> The obvious fix is to bump up the number of open files, but I'm
>>>>>>>> wondering if there is a leak on the Kafka side and/or our application
>>>>>>>> side. We currently have the ulimit set to a generous 4096, but
>>>>>>>> obviously we are hitting this ceiling. What's a recommended value?
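>>>>>>>> 
>>>>>>>> (By "bump up" I mean something like the following in
>>>>>>>> /etc/security/limits.conf. The 32768 is just a placeholder I picked,
>>>>>>>> not a recommendation I've seen anywhere, and I'm assuming the broker
>>>>>>>> runs as a "kafka" user.)
>>>>>>>> 
>>>>>>>> # raise the open-file limit for the user running the broker
>>>>>>>> kafka  soft  nofile  32768
>>>>>>>> kafka  hard  nofile  32768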
>>>>>>>> 
>>>>>>>> We are running Rails and our Unicorn workers are connecting to our
>>>>>>>> Kafka cluster via round-robin load balancing. We have about 1500
>>>>>>>> workers, so that would be 1500 connections right there, but they
>>>>>>>> should be split across our 3 nodes. Instead, netstat shows thousands
>>>>>>>> of connections that look like this:
>>>>>>>> 
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349    ESTABLISHED
>>>>>>>> 
>>>>>>>> Has anyone come across this problem before? Is this a 0.7.2 leak, an
>>>>>>>> LB misconfiguration, ...?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>> 
>>>> 
>>> 
>>> 
> 
