[ 
https://issues.apache.org/jira/browse/HBASE-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145229#comment-16145229
 ] 

HanRyong,Jung edited comment on HBASE-18454 at 8/29/17 1:03 PM:
----------------------------------------------------------------

The refCount of the HDFS ShortCircuitReplica has an initial value of 2.
That is because one reference is held by ShortCircuitCache and one by the
HDFS BlockReaderLocal.
The problem I found here is that both HDFS and HBase need to be modified.
First, the ShortCircuitCacheCleaner of hdfs-client looks only at the
expireTime when it purges (deletes) entries from the cache.
However, ShortCircuitReplica also has a Slot, and code is needed to purge
(delete) the replica via its Slot.
Secondly, HBase checks the status of the HDFS client's BlockReaderLocal
lazily.
So even if the cache is purged by ShortCircuitCacheCleaner, the refCount on
the HDFS client stays at 1 as long as there is no access to the HFile.
HBase needs to periodically check and close the BlockReaderLocal of the HDFS
client.
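
To illustrate the reference counting described above, here is a minimal toy
model. It is not the HDFS implementation: the class and method names
(RefCountSketch, Replica, unref) are hypothetical, and the only behaviour
taken from this comment is that the count starts at 2 and that a purge by
the cache cleaner alone leaves it at 1.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a reference-counted replica; NOT the real HDFS ShortCircuitReplica. */
public class RefCountSketch {

    /** Hypothetical stand-in for a replica that keeps file descriptors open. */
    static final class Replica {
        // One reference for the cache and one for the block reader,
        // matching the "initial value of 2" described in the comment.
        private final AtomicInteger refCount = new AtomicInteger(2);
        private boolean fdClosed = false;

        /** Drops one reference; the descriptors are closed only when the count reaches 0. */
        void unref() {
            if (refCount.decrementAndGet() == 0) {
                fdClosed = true; // assumption: this is where the real close would happen
            }
        }

        int refCount() {
            return refCount.get();
        }

        boolean isFdClosed() {
            return fdClosed;
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();

        // Step 1: the cache cleaner purges the entry and drops the cache's reference.
        replica.unref();
        System.out.println("after purge: refCount=" + replica.refCount()
                + " fdClosed=" + replica.isFdClosed());

        // Step 2: the reader is closed as well and drops the last reference.
        replica.unref();
        System.out.println("after reader close: refCount=" + replica.refCount()
                + " fdClosed=" + replica.isFdClosed());
    }
}
```

In this model the descriptor stays open after step 1, which is the state
described above when the HFile is never accessed again and nobody closes the
reader.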

I have added the following code to ShortCircuitCacheCleaner to solve this
problem.
This solution applies only to HBase and is a very temporary fix.
https://issues.apache.org/jira/browse/HDFS-12204?focusedCommentId=16145244&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16145244


> RegionServer Do not close file descriptor when using shortcircuit
> -----------------------------------------------------------------
>
>                 Key: HBASE-18454
>                 URL: https://issues.apache.org/jira/browse/HBASE-18454
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.2.6
>         Environment: HDFS 2.7.3, HBASE 1.2.6, centOS 6.8
>            Reporter: HanRyong,Jung
>
> I am using HDFS 2.7.3 and HBase 1.2.6 on CentOS 6.8.
> The regionserver uses 11 hard disks (JBOD) and uses HBase short-circuit
> reads.
> When one disk failed in HDFS and I hot-swapped it, I found that HBase did
> not close its file descriptors on that disk.
> In addition, the fd paths on the unmounted disk changed to incorrect paths.
> Looking at /proc/regionserver_pid/fd, a path that used /data1/volumn changed
> to /volumn after data1 was unmounted.
> Many of the file descriptors used for short-circuit reads are also in the
> deleted state.
> Example:
> ls -al /proc/regionserver_pid/fd 
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 946 -> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490 (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 947 -> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490_141511919.meta (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 948 -> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080 (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 949 -> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080_141513509.meta (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 902 -> */volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir244/subdir160/blk_1257545757 (deleted)*
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      
> Example: running fuser after data4 fails:
> /sbin/fuser -cu /data4
> Cannot stat file /proc/regionserver_pid/fd/*192*: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1282: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1283: input/output error
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      
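
The manual check of /proc/regionserver_pid/fd in the description can be
automated. The following is a minimal sketch, assuming a Linux host where
the kernel appends " (deleted)" to the link target of a descriptor whose
file has been unlinked. The class and method names (DeletedFdScan,
isDeletedTarget, findDeleted) are hypothetical and not part of HBase or HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Lists the descriptors of a process whose link targets are marked as deleted. */
public class DeletedFdScan {

    /**
     * True if the link target carries the " (deleted)" suffix.
     * A file whose real name ends in " (deleted)" would be a false positive.
     */
    static boolean isDeletedTarget(String target) {
        return target.endsWith(" (deleted)");
    }

    /** Returns "fd -> target" for every entry in fdDir that points at a deleted file. */
    static List<String> findDeleted(Path fdDir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    // The fd was closed in the meantime or cannot be read; skip it.
                    continue;
                }
                if (isDeletedTarget(target)) {
                    hits.add(fd.getFileName() + " -> " + target);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the pid to inspect; "self" inspects the current JVM.
        String pid = args.length > 0 ? args[0] : "self";
        findDeleted(Paths.get("/proc", pid, "fd")).forEach(System.out::println);
    }
}
```

Run it as the same user as the RegionServer (or as root) and pass the
RegionServer pid as the only argument; other users cannot read that
process's fd directory.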



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
