I think I found a similar bug report that matches your symptom: HDFS-12204 <https://issues.apache.org/jira/browse/HDFS-12204> (Dfsclient Do not close file descriptor when using shortcircuit)
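
If it helps to confirm the symptom, here is a minimal sketch (not taken from HDFS or HBase) that lists the descriptors of a process whose targets are marked "(deleted)", similar to the ls output quoted below. The function name `deleted_open_files` and the `path_prefix` and `proc_root` parameters are my own assumptions for illustration, and it relies on the Linux /proc layout.

```python
import os


def deleted_open_files(pid, path_prefix="", proc_root="/proc"):
    """Return sorted (fd, target) pairs for descriptors of `pid` that still
    point at a deleted file whose path starts with `path_prefix`.

    Hypothetical helper: it only reads the symlinks under <proc_root>/<pid>/fd.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    stale = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # The kernel appends " (deleted)" to the link target of an unlinked file.
        if target.endswith(" (deleted)") and target.startswith(path_prefix):
            stale.append((int(name), target))
    return sorted(stale)


if __name__ == "__main__":
    import sys

    # Usage: python stale_fds.py <regionserver_pid> [data_path_prefix]
    pid = int(sys.argv[1])
    prefix = sys.argv[2] if len(sys.argv) > 2 else ""
    for fd, target in deleted_open_files(pid, prefix):
        print(f"{fd} -> {target}")
```

Reading another user's /proc/<pid>/fd usually requires running as that user or as root.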

On Wed, May 29, 2019 at 11:37 PM Kang Minwoo <[email protected]> wrote:

> I think these files were opened for reads, because the blocks are finalized.
>
> ---
> ls -al /proc/regionserver_pid/fd
> 902 -> /data_path/current/finalized/~/blk_1 (deleted)
> 946 -> /data_path/current/finalized/~/blk_2 (deleted)
> 947 -> /data_path/current/finalized/~/blk_3.meta (deleted)
> ---
>
> I think it is not an HBase bug. This is because DFSClient checks for stale
> FDs when the fetch method is invoked.
>
> Best regards,
> Minwoo Kang
>
> ________________________________________
> From: Wei-Chiu Chuang <[email protected]>
> Sent: Wednesday, May 29, 2019 20:51
> To: [email protected]
> Subject: Re: Disk hot swap for data node while hbase use short-circuit
>
> Do you have a list of the files that were open? I'd like to know if
> those are files opened for writes or for reads.
>
> If you are on a more recent version of Hadoop (2.8.0 and above),
> there's an HDFS command to interrupt ongoing writes to DataNodes (HDFS-9945
> <https://issues.apache.org/jira/browse/HDFS-9945>)
>
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin
> hdfs dfsadmin -evictWriters
>
> Looking at the HDFS hotswap implementation, it looks like the DataNode doesn't
> interrupt writers when a volume is removed. That sounds like a bug.
>
> On Tue, May 28, 2019 at 9:39 PM Kang Minwoo <[email protected]>
> wrote:
>
> > Hello, Users.
> >
> > I use JBOD for the data node. Sometimes a disk in the data node has a
> > problem.
> >
> > The first time, I shut down all instances, including the data node and
> > region server, on the machine that had the disk problem.
> > But it is not a good solution, so I improved the process.
> >
> > When I detect a disk problem on the server, I just perform a disk hot swap.
> >
> > But the system administrators complain that some FDs are still open, so they
> > cannot remove the disk.
> > The regionserver holds the FDs; I use the short-circuit reads feature.
> > (HBase version 1.2.9)
> >
> > When we first met this issue, we force-unmounted the disk and remounted it.
> > But after this process, the kernel reported an error[1].
> >
> > So we avoid this issue. Purge stale FD.
> >
> > I think this issue is common.
> > I want to know how hbase-users deal with this issue.
> >
> > Thank you very much for sharing your experience.
> >
> > Best regards,
> > Minwoo Kang
> >
> > [1]:
> > https://www.thegeekdiary.com/xfs_log_force-error-5-returned-xfs-error-centos-rhel-7/
