[ https://issues.apache.org/jira/browse/HDFS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang reassigned HDFS-3599:
-------------------------------

    Assignee: Zhe Zhang

> Better expose when under-construction files are preventing DN decommission
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3599
>                 URL: https://issues.apache.org/jira/browse/HDFS-3599
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Zhe Zhang
>
> Filing on behalf of Konstantin Olchanski:
> {quote}
> I have been trying to decommission a data node, but the process
> stalled. I followed the correct instructions, observed my node
> listed in "Decommissioning Nodes", etc, observed "Under Replicated Blocks"
> decrease, etc. But the count went down to "1" and the decommission process
> stalled.
> There was no visible activity anywhere, nothing was happening (well,
> maybe in some hidden log file somewhere something complained,
> but I did not look).
> It turns out that I had some files stuck in "OPENFORWRITE" mode,
> as reported by "hdfs fsck / -openforwrite -files -blocks -locations -racks":
> {code}
> /users/trinat/data/.fuse_hidden0000177e00000002 0 bytes, 0 block(s), OPENFORWRITE: OK
> /users/trinat/data/.fuse_hidden0000178d00000003 0 bytes, 0 block(s), OPENFORWRITE: OK
> /users/trinat/data/.fuse_hidden00001da300000004 0 bytes, 1 block(s), OPENFORWRITE: OK
> 0. BP-88378204-142.90.119.126-1340494203431:blk_6980480609696383665_20259{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[142.90.111.72:50010|RBW],
> ReplicaUnderConstruction[142.90.119.162:50010|RBW],
> ReplicaUnderConstruction[142.90.119.126:50010|RBW]]} len=0 repl=3
> [/detfac/142.90.111.72:50010, /isac2/142.90.119.162:50010,
> /isac2/142.90.119.126:50010]
> {code}
> After I deleted those files, the decommission process completed successfully.
> Perhaps one can add some visible indication somewhere on the HDFS status web page
> that the decommission process is stalled and maybe report why it is stalled?
> Maybe the number of "OPENFORWRITE" files should be listed on the status page
> next to the "Number of Under-Replicated Blocks"? (Since I know that nobody is writing
> to my HDFS, the non-zero count would give me a clue that something is wrong).
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
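
The manual check described in the quoted report (running fsck with -openforwrite and looking for OPENFORWRITE entries) could be scripted roughly as follows. This is only a sketch: the function name open_for_write_paths and the regular expression are assumptions made for illustration, the line format is inferred solely from the fsck output quoted above, and other Hadoop versions may print these lines differently.

```python
import re
from typing import List

# Assumed line shape, taken from the fsck output quoted in the report:
#   <path> <N> bytes, <M> block(s), OPENFORWRITE: OK
_OPEN_LINE = re.compile(r"^(/.*?) \d+ bytes, .*OPENFORWRITE", re.MULTILINE)


def open_for_write_paths(fsck_output: str) -> List[str]:
    """Return the paths that fsck reports as still open for write."""
    return _OPEN_LINE.findall(fsck_output)


# File lines copied from the report; block and location lines are omitted.
SAMPLE = """\
/users/trinat/data/.fuse_hidden0000177e00000002 0 bytes, 0 block(s), OPENFORWRITE: OK
/users/trinat/data/.fuse_hidden0000178d00000003 0 bytes, 0 block(s), OPENFORWRITE: OK
/users/trinat/data/.fuse_hidden00001da300000004 0 bytes, 1 block(s), OPENFORWRITE: OK
"""

if __name__ == "__main__":
    paths = open_for_write_paths(SAMPLE)
    print(len(paths), "file(s) open for write")
    for path in paths:
        print(" ", path)
```

In practice the text passed to the function would be the captured output of the fsck command quoted in the report. A non-zero count on a cluster where nobody is writing is the clue the reporter asks to have surfaced on the status page.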