No, this is all within the same DC. I think the problem has to do with the LB. 
We've upgraded our producers to point directly to a node for testing and, after 
running it all night, I don't see any more connections than there are supposed 
to be.
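
For reference, we're counting broker-side connections with something like the 
following, where 9092 is just an assumed broker port:

netstat -tn | grep ":9092" | grep ESTABLISHED | wc -l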

Can I ask which LB you are using? We're using A10s.

On Sep 26, 2013, at 6:41 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:

> Hi Mark,
> 
> I'm using CentOS 6.2. My file limit is something like 500k; the value is 
> arbitrary.
> 
> One of the things I changed so far is the set of TCP keepalive parameters 
> below; it has had moderate success so far (a sketch of how I set them 
> follows the list).
> 
> net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_intvl
> net.ipv4.tcp_keepalive_probes
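> 
> As a sketch, setting them with sysctl looks like this (the values below are 
> only illustrative assumptions, not our production settings):
> 
> sysctl -w net.ipv4.tcp_keepalive_time=120
> sysctl -w net.ipv4.tcp_keepalive_intvl=30
> sysctl -w net.ipv4.tcp_keepalive_probes=3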
> 
> I still notice an abnormal number of ESTABLISHED connections. I've been doing 
> some searching and came across this page 
> (http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/)
> 
> I'll change "net.netfilter.nf_conntrack_tcp_timeout_established" as indicated 
> there; it looks like the closest thing to a solution for my issue.
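> 
> Something along these lines; the one-hour value is only an assumption on my 
> part (the kernel default is several days, which is what lets dead connections 
> linger in ESTABLISHED):
> 
> sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600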
> 
> Are you also experiencing the issue in a cross-datacenter context?
> 
> Best regards,
> 
> Nicolas Berthet 
> 
> 
> -----Original Message-----
> From: Mark [mailto:static.void....@gmail.com] 
> Sent: Friday, September 27, 2013 6:08 AM
> To: users@kafka.apache.org
> Subject: Re: Too many open files
> 
> What OS settings did you change? How high is your "huge" file limit?
> 
> 
> On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <nicolasbert...@maaii.com> 
> wrote:
> 
>> Jun,
>> 
>> I observed a similar kind of thing recently. (I didn't notice before 
>> because our file limit is huge.)
>> 
>> I have a set of brokers in a datacenter, and producers in different data 
>> centers. 
>> 
>> At some point I got disconnections; from the producer's perspective I had 
>> something like 15 connections to the broker. On the broker side, on the 
>> other hand, I observed hundreds of connections from the producer in an 
>> ESTABLISHED state.
>> 
>> We had some default settings for the socket timeout at the OS level, which 
>> we reduced, hoping it would prevent the issue in the future. I'm not sure 
>> whether the issue is from the broker or the OS configuration, though. I'm 
>> still keeping the broker under observation for the time being.
>> 
>> Note that, for clients in the same datacenter, we didn't see this issue; the 
>> socket count matches on both ends.
>> 
>> Nicolas Berthet
>> 
>> -----Original Message-----
>> From: Jun Rao [mailto:jun...@gmail.com]
>> Sent: Thursday, September 26, 2013 12:39 PM
>> To: users@kafka.apache.org
>> Subject: Re: Too many open files
>> 
>> If a client is gone, the broker should automatically close those broken 
>> sockets. Are you using a hardware load balancer?
>> 
>> Thanks,
>> 
>> Jun
>> 
>> 
>> On Wed, Sep 25, 2013 at 4:48 PM, Mark <static.void....@gmail.com> wrote:
>> 
>>> FYI, if I kill all producers, I don't see the number of open files drop. 
>>> I still see all the ESTABLISHED connections.
>>> 
>>> Is there a broker setting to automatically kill any inactive TCP 
>>> connections?
>>> 
>>> 
>>> On Sep 25, 2013, at 4:30 PM, Mark <static.void....@gmail.com> wrote:
>>> 
>>>> Any other ideas?
>>>> 
>>>> On Sep 25, 2013, at 9:06 AM, Jun Rao <jun...@gmail.com> wrote:
>>>> 
>>>>> We haven't seen any socket leaks with the java producer. If you have 
>>>>> lots of unexplained socket connections in established mode, one possible 
>>>>> cause is that the client created new producer instances, but didn't 
>>>>> close the old ones.
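>>>>> 
>>>>> As a minimal sketch of the intended lifecycle (0.7-era Java API; the ZK 
>>>>> address, topic name, and class name here are placeholders, so treat the 
>>>>> details as approximate):
>>>>> 
>>>>> import java.util.Properties;
>>>>> import kafka.javaapi.producer.Producer;
>>>>> import kafka.javaapi.producer.ProducerData;
>>>>> import kafka.producer.ProducerConfig;
>>>>> 
>>>>> public class ProducerLifecycle {
>>>>>     public static void main(String[] args) {
>>>>>         Properties props = new Properties();
>>>>>         props.put("zk.connect", "zk1:2181");  // placeholder ZK address
>>>>>         props.put("serializer.class", "kafka.serializer.StringEncoder");
>>>>>         Producer<String, String> producer =
>>>>>             new Producer<String, String>(new ProducerConfig(props));
>>>>>         try {
>>>>>             // reuse this single instance for all sends
>>>>>             producer.send(new ProducerData<String, String>("my-topic", "hi"));
>>>>>         } finally {
>>>>>             producer.close();  // skipping this leaks the instance's sockets
>>>>>         }
>>>>>     }
>>>>> }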
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> 
>>>>> On Wed, Sep 25, 2013 at 6:08 AM, Mark <static.void....@gmail.com> wrote:
>>>>> 
>>>>>> No. We are using the kafka-rb ruby gem producer.
>>>>>> https://github.com/acrosa/kafka-rb
>>>>>> 
>>>>>> Now that you asked that question I need to ask. Is there a problem 
>>>>>> with the java producer?
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Sep 24, 2013, at 9:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Are you using the java producer client?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Sep 24, 2013 at 5:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Our 0.7.2 Kafka cluster keeps crashing with:
>>>>>>>> 
>>>>>>>> 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in 
>>>>>>>> acceptor
>>>>>>>>     java.io.IOException: Too many open files
>>>>>>>> 
>>>>>>>> The obvious fix is to bump up the number of open files, but I'm 
>>>>>>>> wondering if there is a leak on the Kafka side and/or our application 
>>>>>>>> side. We currently have the ulimit set to a generous 4096, but 
>>>>>>>> obviously we are hitting this ceiling. What's a recommended value?
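>>>>>>>> 
>>>>>>>> For what it's worth, here's roughly how we'd check and raise it 
>>>>>>>> (65536 is just a number picked for illustration, not a recommendation):
>>>>>>>> 
>>>>>>>> ulimit -n          # show the current soft limit for open files
>>>>>>>> ulimit -n 65536    # raise it for the current shell
>>>>>>>> 
>>>>>>>> To make it stick, one would also add a "kafka - nofile 65536" line to 
>>>>>>>> /etc/security/limits.conf, assuming the broker runs as a "kafka" user.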
>>>>>>>> 
>>>>>>>> We are running Rails, and our Unicorn workers are connecting to our 
>>>>>>>> Kafka cluster via round-robin load balancing. We have about 1500 
>>>>>>>> workers, so that would be 1500 connections right there, but they 
>>>>>>>> should be split across our 3 nodes. Instead, netstat shows thousands 
>>>>>>>> of connections that look like this:
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349    ESTABLISHED
>>>>>>>> 
>>>>>>>> Has anyone come across this problem before? Is this a 0.7.2 leak, an 
>>>>>>>> LB misconfiguration... ?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>> 
>>>> 
>>> 
>>> 
> 
