[ https://issues.apache.org/jira/browse/HDFS-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee resolved HDFS-11852.
-------------------------------
    Resolution: Duplicate

> Under-repicated block never completes because of failure in commitBlockSynchronization()
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-11852
>                 URL: https://issues.apache.org/jira/browse/HDFS-11852
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Ravi Prakash
>
> Credit goes to Charles Wimmer and Karthik Kumar for pointing me to this issue.
> We noticed a block is holding up decommissioning because recovery failed.
> (The stack trace below is from the time when the cluster was 2.7.2.) DN2 and
> DN3 are no longer part of the cluster. DN1 is the node held up for
> decommissioning. I checked that the block and meta file indeed are in the
> finalized directory.
> {code}2016-09-19 09:02:25,837 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks FAILED: RecoveringBlock{BP-<someid>:blk_1094097355_20357090; getBlockSize()=0; corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[<DN1>:50010,null,null], DatanodeInfoWithStorage[<DN2>:50010,null,null], DatanodeInfoWithStorage[<DN3>:50010,null,null]]}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile <filename> since blocks[0] is non-complete, where blocks=[blk_1094097355_20552508{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=0, replicas=[ReplicaUC[[DISK]DS-03bed13e-5cdd-4207-91b6-abd83f9eb7d3:NORMAL:<DN1>:50010|RBW]]}].
>       at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>       at org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:222)
>       at org.apache.hadoop.hdfs.server.namenode.INodeFile.toCompleteFile(INodeFile.java:209)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.finalizeINodeFileUnderConstruction(FSNamesystem.java:4218)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4457)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4419)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:837)
>       at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:291)
>       at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28768)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy16.commitBlockSynchronization(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:312)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2780)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2642)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:243)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2519)
>       at java.lang.Thread.run(Thread.java:744){code}
> On the namenode side:
> {code}
> 2016-09-19 09:02:25,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(oldBlock=BP-<someid>:blk_1094097355_20357090, newgenerationstamp=20552508, newlength=18642324, newtargets=[<DN1>:50010], closeFile=true, deleteBlock=false){code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org