Bob Hansen created HDFS-8855:
--------------------------------

             Summary: Webhdfs client leaks active NameNode connections
                 Key: HDFS-8855
                 URL: https://issues.apache.org/jira/browse/HDFS-8855
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
         Environment: HDP 2.2
            Reporter: Bob Hansen


The attached script simulates a process opening ~50 files via webhdfs and 
performing random reads.  Note that there are at most 50 concurrent reads, and 
all webhdfs sessions are kept open.  Each read is ~64k at a random position.  
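
As a rough illustration of the read pattern described above (not the attached 
script itself), the sketch below builds WebHDFS REST OPEN requests with a 
random offset and a ~64k length.  The NameNode address, the file paths and 
sizes, and the helper names open_url and random_read_urls are assumptions 
made for this example only.

```python
import random
from urllib.parse import quote, urlencode

READ_SIZE = 64 * 1024  # ~64k per read, as in the description


def open_url(namenode, path, offset, length=READ_SIZE):
    """Build a WebHDFS OPEN URL for a ranged read (hypothetical helper)."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return "http://{}/webhdfs/v1{}?{}".format(namenode, quote(path), query)


def random_read_urls(namenode, files, count, rng=None):
    """Yield `count` OPEN URLs, each at a random offset in a random file.

    `files` maps an HDFS path to its size in bytes (assumed to be known).
    """
    rng = rng or random.Random()
    paths = sorted(files)
    for _ in range(count):
        path = rng.choice(paths)
        # Keep the whole read inside the file when the file is large enough.
        max_offset = max(files[path] - READ_SIZE, 0)
        yield open_url(namenode, path, rng.randint(0, max_offset))
```

Issuing these URLs from up to 50 workers would approximate the load, though 
the leak reported here concerns the Hadoop webhdfs client, which may behave 
differently from a plain HTTP client.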

The script periodically (once per second) shells into the NameNode and produces 
a summary of the socket states.  On my 5-node test cluster, it took ~30 
seconds for the NameNode to reach ~25000 active connections and fail.
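
A minimal sketch of such a socket-state summary, assuming the script captures 
`netstat -tan`-style text from the NameNode host (how the output is obtained, 
and the function name summarize_socket_states, are assumptions for this 
example):

```python
from collections import Counter


def summarize_socket_states(netstat_output):
    """Count TCP sockets by state from `netstat -tan`-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: proto recv-q send-q local foreign state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts
```

Printing the result once per second would show a leak of this kind as a 
steadily growing ESTABLISHED count.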

It appears that each request to the webhdfs client is opening a new connection 
to the NameNode and keeping it open after the request is complete.  If the 
process continues to run, eventually (~30-60 seconds), all of the open 
connections are closed and the NameNode recovers.  

This smells like SoftReference reaping.  Are we using SoftReferences in the 
webhdfs client to cache NameNode connections but never re-using them?


