Machine IP: LF-HBASE-VENUS-149106.hadoop.jd.local
jstack output: /data0/hbase-logs/46384.out

2017-01-03 23:44 GMT+08:00 Weizhan Zeng <qgweiz...@gmail.com>:

> My HBase version is 1.1.6 and my Hadoop version is 2.6.1. I have the jstack
> output and can send it to you tomorrow once I am at the office.
>
> I guess the cause of "Too many open files" is too many store files. My
> monitoring shows storeFileCount at 33K, while ulimit is 65535. There seem
> to be so many store files because compaction was not working.
>
> What confuses me is why the RS did not exit.
>
> 2017-01-03 23:05 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Switching to user@
>>
>> What's the version of hbase / hadoop you're using ?
>>
>> Before issuing "kill -9", did you capture a stack trace of the region
>> server process ?
>>
>> Have you read 'Limits on Number of Files and Processes' under
>> http://hbase.apache.org/book.html#basic.prerequisites ?
>>
>> On Tue, Jan 3, 2017 at 6:56 AM, Weizhan Zeng <qgweiz...@gmail.com> wrote:
>>
>> > Hi guys:
>> > I met an issue on one of my RS.
>> > After a SocketException happened, it should have shut down, but 8 hours
>> > later I found it was still alive and used kill -9 to end the process.
>> >
>> > Here is my RegionServer log:
>> >
>> > At 01:58 AM, the SocketException happened:
>> >
>> > [2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in createBlockOutputStream java.net.SocketException: Too many open files
>> >     at sun.nio.ch.Net.socket0(Native Method)
>> >     at sun.nio.ch.Net.socket(Net.java:423)
>> >     at sun.nio.ch.Net.socket(Net.java:416)
>> >     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
>> >
>> > Also at 01:58 AM, the RegionServer aborted itself and began to close
>> > regions:
>> >
>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819
>> > [2017-01-02T01:58:00.632+08:00] [INFO] client.ConnectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x456f9b55fda457b
>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HStore : Closed f
>> >
>> > [2017-01-02T01:59:18.067+08:00] [INFO] regionserver.HRegionServer$MovedRegionsCleaner : Chore: MovedRegionsCleaner for region HBASE-VENUS-149106.hadoop.local,16020,1482236933819 was stopped
>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>> >
>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>> >
>> > After one hour, it was still logging:
>> >
>> > [2017-01-02T02:04:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>> >
>> > At 8 AM:
>> >
>> > [2017-01-02T08:09:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>> > [2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating
>> >
>> > Can anyone give me some tips to figure this out? Thanks.
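
For anyone who wants to compare a process's open file descriptors against its limit, as in the storeFileCount vs. ulimit comparison above, here is a minimal Python sketch. It assumes a Linux host where /proc/<pid>/fd and /proc/<pid>/limits are readable by the current user. The function names are made up for illustration and are not part of HBase or Hadoop.

```python
import os


def _to_limit(value):
    """Convert a limits field to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files  <soft>  <hard>  files
            fields = line.split()
            return _to_limit(fields[3]), _to_limit(fields[4])
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count entries under /proc/<pid>/fd (Linux only, needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(open_fds, soft_limit):
    """Fraction of the soft limit currently in use."""
    return open_fds / soft_limit


def report(pid):
    """Print open descriptor count and soft limit for the given pid."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_max_open_files(fh.read())
    used = count_open_fds(pid)
    if soft is None:
        print(f"pid {pid}: {used} open fds, no soft limit")
    else:
        print(f"pid {pid}: {used} of {soft} fds ({fd_usage(used, soft):.0%})")
```

Calling `report(12345)` (12345 is a placeholder pid) on the region server host prints the current count next to the limit. Taking the numbers quoted above at face value, roughly 33000 store files against a limit of 65535 is already about half of the limit before counting sockets, WALs and other descriptors, and the process can hold more than one descriptor per store file.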