[ https://issues.apache.org/jira/browse/HDFS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072108#comment-15072108 ]
zdtjkl commented on HDFS-6973:
------------------------------

Disable IPv6 on all nodes. I hit the same problem, and disabling IPv6 fixed it for me; you could try it.

> DFSClient does not closing a closed socket resulting in thousand of
> CLOSE_WAIT sockets
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-6973
>                 URL: https://issues.apache.org/jira/browse/HDFS-6973
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>         Environment: RHEL 6.3 - HDP 2.1 - 6 RegionServers/Datanodes - 18T per node - 3108 Regions
>            Reporter: steven xu
>
> HBase, as an HDFS client, does not close dead connections to the datanode.
> This results in over 30K+ CLOSE_WAIT sockets, and at some point HBase can no
> longer connect to the datanode because there are too many mapped sockets from
> one host to another on the same port, 50010.
> After I restart all RSs, the CLOSE_WAIT count keeps increasing:
> $ netstat -an | grep CLOSE_WAIT | wc -l
> 2545
> $ netstat -nap | grep CLOSE_WAIT | grep 6569 | wc -l
> 2545
> $ ps -ef | grep 6569
> hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
> -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
> -XX:+UseConcMarkSweepGC
> I have also reviewed these issues:
> [HDFS-5697]
> [HDFS-5671]
> [HDFS-1836]
> [HBASE-9393]
> I found that the patches from those issues have already been applied in the
> HBase 0.98 / Hadoop 2.4.0 source code, but I do not understand why HBase
> 0.98 / Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.
> These codes have been added into
> BlockReaderFactory.getRemoteBlockReaderFromTcp().
> Another bug may be the cause of my problem:
> {code:title=BlockReaderFactory.java|borderStyle=solid}
> // Some comments here
> private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
>   if (LOG.isTraceEnabled()) {
>     LOG.trace(this + ": trying to create a remote block reader from a " +
>         "TCP socket");
>   }
>   BlockReader blockReader = null;
>   while (true) {
>     BlockReaderPeer curPeer = null;
>     Peer peer = null;
>     try {
>       curPeer = nextTcpPeer();
>       if (curPeer == null) break;
>       if (curPeer.fromCache) remainingCacheTries--;
>       peer = curPeer.peer;
>       blockReader = getRemoteBlockReader(peer);
>       return blockReader;
>     } catch (IOException ioe) {
>       if (isSecurityException(ioe)) {
>         if (LOG.isTraceEnabled()) {
>           LOG.trace(this + ": got security exception while constructing " +
>               "a remote block reader from " + peer, ioe);
>         }
>         throw ioe;
>       }
>       if ((curPeer != null) && curPeer.fromCache) {
>         // Handle an I/O error we got when using a cached peer. These are
>         // considered less serious, because the underlying socket may be
>         // stale.
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Closed potentially stale remote peer " + peer, ioe);
>         }
>       } else {
>         // Handle an I/O error we got when using a newly created peer.
>         LOG.warn("I/O error constructing remote block reader.", ioe);
>         throw ioe;
>       }
>     } finally {
>       if (blockReader == null) {
>         IOUtils.cleanup(LOG, peer);
>       }
>     }
>   }
>   return null;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
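For readers trying the workaround above ("disable IPv6 on all nodes"), a minimal sketch of what that typically means on a RHEL 6 cluster follows. The sysctl keys and the `-Djava.net.preferIPv4Stack=true` JVM property are standard, but the env-file locations (`hadoop-env.sh`, `hbase-env.sh`) are assumptions about a typical HDP layout; the commenter did not specify how they disabled IPv6, so treat this as one plausible reading, not the confirmed fix.

```shell
# Disable IPv6 at the kernel level (requires root; takes effect immediately,
# persist the same keys in /etc/sysctl.conf to survive reboots).
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1

# Make the Hadoop/HBase JVMs use the IPv4 stack only. The env-file names
# below are assumed; add the lines to whatever scripts set your daemon opts.
# hadoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# hbase-env.sh:
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"
```

Restart the datanodes and RegionServers after the change, then watch the CLOSE_WAIT count with the `netstat` commands from the issue description to see whether it still grows.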