We're running Ubuntu 14.04, HDFS 2.6.0, ZooKeeper 3.4.6, and Accumulo
1.8.1.  I'm using `lsof -i` and grepping for the tserver PID to list all
the connections.  Right now there are ~25k connections for this one tserver,
99.9% of which are writing to various DataNodes on port 50010.  They're
split about 50/50 between CLOSE_WAIT and ESTABLISHED.  No special RPC
configuration.
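
For anyone who wants to reproduce the breakdown, here is a rough sketch of
one way to tally the lsof output by remote port and TCP state.  It assumes
numeric output (e.g. `lsof -nP -i -a -p <tserver PID>`) where each TCP row
ends in `local:port->remote:port (STATE)`; the function name is made up for
illustration, and this is not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Matches the tail of a TCP row in lsof output, for example:
#   10.0.0.5:45678->10.0.0.9:50010 (CLOSE_WAIT)
# Assumes numeric hosts/ports (lsof -n -P); named ports will not match.
_TCP_NAME = re.compile(r"(\S+):(\d+)->(\S+):(\d+)\s+\((\w+)\)\s*$")


def tally_connections(lsof_lines):
    """Count connections per (remote port, TCP state).

    Rows without a remote endpoint (the header, LISTEN sockets, UDP)
    are skipped.
    """
    counts = Counter()
    for line in lsof_lines:
        match = _TCP_NAME.search(line)
        if match is None:
            continue
        remote_port = int(match.group(4))
        state = match.group(5)
        counts[(remote_port, state)] += 1
    return counts
```

Feeding it the lines of the lsof output (for example from `sys.stdin`) and
printing `counts.most_common()` gives a quick view of which remote ports the
CLOSE_WAIT sockets point at.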

On Wed, Jan 24, 2018 at 7:53 PM, Josh Elser <[email protected]> wrote:

> +1 to looking at the remote end of the socket and seeing where they're
> going to or coming from. I've seen a few HDFS JIRA issues filed about
> sockets left in CLOSE_WAIT.
>
> Lucky you, this is a fun Linux rabbit hole to go down :)
>
> (https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
> covers some of the technical details)
>
> On 1/24/18 6:37 PM, Christopher wrote:
>
>> I haven't seen that, but I'm curious what OS, Hadoop, ZooKeeper, and
>> Accumulo version you're running. I'm assuming you verified that it was the
>> TabletServer process holding these TCP sockets open using `netstat -p` and
>> cross-referencing the PID with `jps -ml` (or similar)? Are you able to
>> confirm based on the port number that these were Thrift connections or
>> could they be ZooKeeper or Hadoop connections? Do you have any special
>> non-default Accumulo RPC configuration (SSL or SASL)?
>>
>> On Wed, Jan 24, 2018 at 3:46 PM Adam J. Shook <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     Hello all,
>>
>>     Has anyone come across an issue with a TabletServer occupying a
>>     large number of ports in a CLOSE_WAIT state?  The 'normal' number
>>     of used ports on a 12-node cluster is around 12,000 to 20,000.
>>     In one instance, there were over 68k, and it was preventing other
>>     applications from getting a free port, so they would fail to start
>>     (which is how we found this in the first place).
>>
>>     Thank you,
>>     --Adam
>>
>>
