[ https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dinesh Chitlangia resolved HDFS-17430.
--------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> RecoveringBlock will skip no live replicas when get block recovery command.
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-17430
>                 URL: https://issues.apache.org/jira/browse/HDFS-17430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> RecoveringBlock may include replicas that have no live datanode when the block recovery command is built.
>
> *Issue:*
> The following scenario can make the DataNode-side BlockRecoveryWorker fail repeatedly, so the file cannot be closed for a long time.
>
> *t1.* Block blk_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine has shut down and is in dead state, while dn2 is live.
>
> *t2.* Block recovery is triggered.
> Related logs:
> {code:java}
> 2024-03-13 21:58:00.651 WARN hdfs.StateChange DIR* NameSystem.internalReleaseLease:
> File /xxx/file has not been closed. Lease recovery is in progress.
> RecoveryId = 28577373754 for block blk_xxx_xxx
> {code}
>
> *t3.* dn2 is chosen as the primary datanode for the block recovery.
> At this point dn1 is stale (it is in dead state), so recoveryLocations has size 1. With the logic below, the NameNode then falls back to all storages, and both dn1 and dn2 are chosen to participate in block recovery.
> DatanodeManager#getBlockRecoveryCommand
> {code:java}
> // Skip stale nodes during recovery
> final List<DatanodeStorageInfo> recoveryLocations =
>     new ArrayList<>(storages.length);
> final List<Integer> storageIdx = new ArrayList<>(storages.length);
> for (int i = 0; i < storages.length; ++i) {
>   if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
>     recoveryLocations.add(storages[i]);
>     storageIdx.add(i);
>   }
> }
> ...
> // If we only get 1 replica after eliminating stale nodes, choose all
> // replicas for recovery and let the primary data node handle failures.
> DatanodeInfo[] recoveryInfos;
> if (recoveryLocations.size() > 1) {
>   if (recoveryLocations.size() != storages.length) {
>     LOG.info("Skipped stale nodes for recovery : "
>         + (storages.length - recoveryLocations.size()));
>   }
>   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
> } else {
>   // If too many replicas are stale, then choose all replicas to
>   // participate in block recovery.
>   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
> }
> {code}
> {code:java}
> 2024-03-13 21:58:01,425 INFO datanode.DataNode (BlockRecoveryWorker.java:logRecoverBlock(563))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> BlockRecoveryWorker: NameNode at xxx:8040 calls
> recoverBlock(BP-xxx:blk_xxx_xxx,
> targets=[DatanodeInfoWithStorage[dn1:50010,null,null],
> DatanodeInfoWithStorage[dn2:50010,null,null]],
> newGenerationStamp=28577373754, newBlock=null, isStriped=false)
> {code}
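> For illustration only, a minimal sketch of the direction this issue argues for: when the stale filter leaves at most one candidate, fall back to the replicas whose datanodes are still alive rather than to all storages, so a dead node such as dn1 is never handed out as a recovery target. It reuses the variables from the excerpt above; DatanodeDescriptor#isAlive() is assumed here to mean "not in dead state on the NameNode", and the committed patch may differ in detail:
> {code:java}
> // Sketch only (not the committed patch): liveness-aware fallback branch.
> DatanodeInfo[] recoveryInfos;
> if (recoveryLocations.size() > 1) {
>   if (recoveryLocations.size() != storages.length) {
>     LOG.info("Skipped stale nodes for recovery : "
>         + (storages.length - recoveryLocations.size()));
>   }
>   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
> } else {
>   // Fall back to live replicas only, instead of all replicas, so a dead
>   // node is not chosen. The "no live replica at all" corner case is left
>   // out of this sketch and would need explicit handling in a real change.
>   final List<DatanodeStorageInfo> liveLocations =
>       new ArrayList<>(storages.length);
>   storageIdx.clear();
>   for (int i = 0; i < storages.length; ++i) {
>     if (storages[i].getDatanodeDescriptor().isAlive()) {
>       liveLocations.add(storages[i]);
>       storageIdx.add(i);
>     }
>   }
>   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(liveLocations);
> }
> {code}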
>
> *t4.* When dn2 executes BlockRecoveryWorker#recover, it calls initReplicaRecovery on dn1. Because the dn1 machine is down, every connection attempt times out, and by default the IPC client retries the connection 45 times before giving up, so the recovery task stays stuck for a very long time.
> Related logs:
> {code:java}
> 2024-03-13 21:59:31,518 INFO ipc.Client (Client.java:handleConnectionTimeout(904))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
> ...
> 2024-03-13 23:05:35,295 INFO ipc.Client (Client.java:handleConnectionTimeout(904))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45
> 2024-03-13 23:07:05,392 WARN protocol.InterDatanodeProtocol (BlockRecoveryWorker.java:recover(170))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> Failed to recover block (block=BP-xxx:blk_xxx_xxx,
> datanode=DatanodeInfoWithStorage[dn1:50010,null,null])
> org.apache.hadoop.net.ConnectTimeoutException:
> Call From dn2 to dn1:8010 failed on socket timeout exception:
> org.apache.hadoop.net.ConnectTimeoutException: 90000 millis timeout while
> waiting for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=dn:8010]; For more
> details see: http://wiki.apache.org/hadoop/SocketTimeout
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:866)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1511)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1402)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
>     at com.sun.proxy.$Proxy23.initReplicaRecovery(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.initReplicaRecovery(InterDatanodeProtocolTranslatorPB.java:83)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.callInitReplicaRecovery(BlockRecoveryWorker.java:579)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.access$400(BlockRecoveryWorker.java:57)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:135)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:620)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.net.ConnectTimeoutException: 90000 millis
> timeout while waiting for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=dn1:8010]
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:607)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:662)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:783)
>     at org.apache.hadoop.ipc.Client$Connection.access$3900(Client.java:346)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1653)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1449)
>     ... 10 more
> {code}
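> The "maxRetries=45" in the log above appears to correspond to the IPC client key ipc.client.connect.max.retries.on.timeouts, whose default is 45. A small sketch of inspecting and lowering it is shown below, assuming that key name and default hold for your release; note it applies to every IPC client connection in the process, so tuning it only shortens the stall and is not a substitute for the fix proposed in this issue:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class IpcConnectRetrySketch {
>   public static void main(String[] args) {
>     // Loads core-site.xml / hdfs-site.xml from the classpath, if present.
>     Configuration conf = new Configuration();
>
>     // Retries after a connect timeout; the log's "maxRetries=45" matches
>     // this key's default value.
>     int retries = conf.getInt("ipc.client.connect.max.retries.on.timeouts", 45);
>     System.out.println("ipc.client.connect.max.retries.on.timeouts = " + retries);
>
>     // Example override (normally set in core-site.xml rather than in code).
>     conf.setInt("ipc.client.connect.max.retries.on.timeouts", 5);
>   }
> }
> {code}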
>
> *t5.* While dn2 is still stuck in those retries, the user or the NN sends new recovery commands for the same block, so the recovery ID on the NN keeps moving forward.
> Related logs:
> {code:java}
> 2024-03-13 22:13:01.158 WARN hdfs.StateChange DIR* NameSystem.internalReleaseLease:
> File /xxx/file has not been closed. Lease recovery is in progress.
> RecoveryId = 28577807097 for block blk_xxx_xxx
> 2024-03-13 22:58:02.701 WARN hdfs.StateChange DIR* NameSystem.internalReleaseLease:
> File /xxx/file has not been closed. Lease recovery is in progress.
> RecoveryId = 28578772548 for block blk_xxx_xxx
> {code}
>
> *t6.* The original task [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] still completes its replica recovery on dn2, but the final commitBlockSynchronization call fails because the recovery ID it carries is now smaller than the current recovery ID on the NN.
> {code:java}
> 2024-03-13 23:07:05,401 WARN datanode.DataNode (BlockRecoveryWorker.java:run(623))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> recover Block: RecoveringBlock{BP-xxx:blk_xxx_xxx; getBlockSize()=0;
> corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[dn1:50010,null,null],
> DatanodeInfoWithStorage[dn2:50010,null,null]]; cachedLocs=[]}
> FAILED: {} org.apache.hadoop.ipc.RemoteException(java.io.IOException): The
> recovery id 28577373754 does not match current recovery id 28578772548 for
> block BP-xxx:blk_xxx_xxx
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4129)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:1184)
>     at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:310)
>     at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:34391)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:635)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:603)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:587)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1137)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1236)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1134)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2005)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3360)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1511)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1402)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
>     at com.sun.proxy.$Proxy17.commitBlockSynchronization(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:342)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:334)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:189)
>     at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:620)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
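> To make the t6 failure easier to follow, here is a much-simplified, self-contained illustration of the guard implied by the exception message; it is not the FSNamesystem code, only the invariant: each new lease-recovery attempt installs a larger recovery ID, and a commit carrying an older ID is rejected.
> {code:java}
> import java.io.IOException;
>
> /** Simplified stand-in for the NameNode-side recovery-id check seen in t6. */
> class RecoveryIdGuardSketch {
>   private long currentRecoveryId;
>
>   /** Each new recovery attempt (t2, t5) installs a newer id. */
>   synchronized void startRecovery(long newRecoveryId) {
>     currentRecoveryId = newRecoveryId;     // 28577373754, then 28577807097, ...
>   }
>
>   /** Called when the primary datanode finishes; stale ids are rejected. */
>   synchronized void commitBlockSynchronization(long recoveryId) throws IOException {
>     if (recoveryId != currentRecoveryId) { // dn2 still carries the id from t2
>       throw new IOException("The recovery id " + recoveryId
>           + " does not match current recovery id " + currentRecoveryId);
>     }
>     // ... finalize the block and close the file ...
>   }
> }
> {code}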
>
> *t7.* Because of the above, the user or the NN keeps sending new recovery commands, and every block recovery attempt fails in the same way.
>
> *t8.* The recovery only succeeds once the abnormal dn1 machine is brought back.
>
> So we should fix this issue by skipping replicas that have no live datanode when building the BlockRecoveryCommand, to avoid this recovery-failure situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org