Hello Hadoopers,

When I run HDP 2.1/HBase 0.98.0/Hadoop 2.4.0, I always hit a fatal problem: DFSClient does not close sockets that the remote side has already closed, which results in thousands of CLOSE_WAIT sockets. Has anyone else seen the same issue? If so, please share your experience. Thanks a lot. I have also created an issue, HDFS-6973, for this.

HBase, acting as an HDFS client, does not close dead connections to the datanode. This results in more than 30K CLOSE_WAIT sockets, and at some point HBase can no longer connect to the datanode because there are too many sockets mapped from one host to another on the same port, 50010. Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing.

    $ netstat -an|grep CLOSE_WAIT|wc -l
    2545
    $ netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
    2545
    $ ps -ef|grep 6569
    hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC

I have also reviewed these issues:

HDFS-5697 <https://issues.apache.org/jira/browse/HDFS-5697>
HDFS-5671 <https://issues.apache.org/jira/browse/HDFS-5671>
HDFS-1836 <https://issues.apache.org/jira/browse/HDFS-1836>
HBASE-9393 <https://issues.apache.org/jira/browse/HBASE-9393>

I found that the code from these patches is already included in the HBase 0.98/Hadoop 2.4.0 source, so I do not understand why HBase 0.98/Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.

The patched code is in BlockReaderFactory.getRemoteBlockReaderFromTcp(). Maybe another bug is causing my problem.

BlockReaderFactory.java:

    // Some comments here
    private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
      if (LOG.isTraceEnabled()) {
        LOG.trace(this + ": trying to create a remote block reader from a " +
            "TCP socket");
      }
      BlockReader blockReader = null;
      while (true) {
        BlockReaderPeer curPeer = null;
        Peer peer = null;
        try {
          curPeer = nextTcpPeer();
          if (curPeer == null) break;
          if (curPeer.fromCache) remainingCacheTries--;
          peer = curPeer.peer;
          blockReader = getRemoteBlockReader(peer);
          return blockReader;
        } catch (IOException ioe) {
          if (isSecurityException(ioe)) {
            if (LOG.isTraceEnabled()) {
              LOG.trace(this + ": got security exception while constructing " +
                  "a remote block reader from " + peer, ioe);
            }
            throw ioe;
          }
          if ((curPeer != null) && curPeer.fromCache) {
            // Handle an I/O error we got when using a cached peer. These are
            // considered less serious, because the underlying socket may be
            // stale.
            if (LOG.isDebugEnabled()) {
              LOG.debug("Closed potentially stale remote peer " + peer, ioe);
            }
          } else {
            // Handle an I/O error we got when using a newly created peer.
            LOG.warn("I/O error constructing remote block reader.", ioe);
            throw ioe;
          }
        } finally {
          if (blockReader == null) {
            IOUtils.cleanup(LOG, peer);
          }
        }
      }
      return null;
    }
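
To see which remote endpoints the CLOSE_WAIT sockets point at, rather than only the total from `wc -l`, the netstat output could be grouped per foreign address. The following is only a rough sketch: the class `CloseWaitCounter` and its method are hypothetical names and not part of Hadoop or HBase, and it assumes the usual Linux `netstat -an` column layout (proto, recv-q, send-q, local address, foreign address, state).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop or HBase.
final class CloseWaitCounter {

    // Counts CLOSE_WAIT sockets per foreign address.
    // Assumption: each TCP row looks like
    //   proto recv-q send-q local-address foreign-address state [pid/program]
    static Map<String, Integer> countByRemote(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            // Skip headers, non-TCP rows, and sockets in any other state.
            if (f.length < 6 || !f[0].startsWith("tcp") || !"CLOSE_WAIT".equals(f[5])) {
                continue;
            }
            String remote = f[4];
            Integer old = counts.get(remote);
            counts.put(remote, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Reads netstat output from stdin and prints "<count> <foreign address>" per line.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        for (Map.Entry<String, Integer> e : countByRemote(lines).entrySet()) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```

It could be run as `netstat -an | java CloseWaitCounter`. If nearly all entries end in :50010, that would be consistent with the leaked sockets being datanode connections.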