[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved HDFS-17430.
--------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> RecoveringBlock will skip no live replicas when get block recovery command.
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-17430
>                 URL: https://issues.apache.org/jira/browse/HDFS-17430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> RecoveringBlock should skip replicas that are not live when building the block recovery command.
> *Issue:*
> Currently the following scenario can cause the DataNode's BlockRecoveryWorker 
> to fail, leaving the file unclosed for a long time.
> *t1.* The block blk_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine has 
> shut down and is in dead state, while dn2 is live.
> *t2.* Block recovery is triggered.
> related logs:
> {code:java}
> 2024-03-13 21:58:00.651 WARN hdfs.StateChange DIR* 
> NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
> recovery is in progress. RecoveryId = 28577373754 for block blk_xxx_xxx
> {code}
> *t3.* dn2 is chosen as the primary DataNode for block recovery.
> dn1 is marked as stale (and is in fact dead) at this time, so the 
> recoveryLocations size is 1. According to the following logic, both dn1 and 
> dn2 will then be chosen to participate in block recovery.
> DatanodeManager#getBlockRecoveryCommand
> {code:java}
>    // Skip stale nodes during recovery
>      final List<DatanodeStorageInfo> recoveryLocations =
>          new ArrayList<>(storages.length);
>      final List<Integer> storageIdx = new ArrayList<>(storages.length);
>      for (int i = 0; i < storages.length; ++i) {
>        if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
>          recoveryLocations.add(storages[i]);
>          storageIdx.add(i);
>        }
>      }
>      ...
>      // If we only get 1 replica after eliminating stale nodes, choose all
>      // replicas for recovery and let the primary data node handle failures.
>      DatanodeInfo[] recoveryInfos;
>      if (recoveryLocations.size() > 1) {
>        if (recoveryLocations.size() != storages.length) {
>          LOG.info("Skipped stale nodes for recovery : "
>              + (storages.length - recoveryLocations.size()));
>        }
>        recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
>      } else {
>        // If too many replicas are stale, then choose all replicas to
>        // participate in block recovery.
>        recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
>      }
> {code}
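> To make the branch above concrete for this scenario, here is a toy simulation 
> in plain Java (strings and booleans stand in for the Hadoop types; this is an 
> illustration, not Hadoop code): with only one non-stale replica left, the 
> else branch re-selects all storages, including the dead dn1.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // Toy model of the selection above for storages = [dn1 (stale/dead), dn2 (live)].
> public class RecoveryTargetSelection {
>   public static void main(String[] args) {
>     String[] storages = {"dn1", "dn2"};
>     boolean[] stale = {true, false}; // dn1 is stale (and actually dead)
>
>     List<String> recoveryLocations = new ArrayList<>();
>     for (int i = 0; i < storages.length; i++) {
>       if (!stale[i]) {
>         recoveryLocations.add(storages[i]);
>       }
>     }
>
>     // Only one non-stale replica remains, so the fallback picks all replicas,
>     // which re-introduces the dead dn1 as a recovery target.
>     String[] recoveryInfos = recoveryLocations.size() > 1
>         ? recoveryLocations.toArray(new String[0])
>         : storages;
>     System.out.println(String.join(",", recoveryInfos)); // dn1,dn2
>   }
> }
> {code}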
> {code:java}
> 2024-03-13 21:58:01,425 INFO  datanode.DataNode 
> (BlockRecoveryWorker.java:logRecoverBlock(563))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> BlockRecoveryWorker: NameNode at xxx:8040 calls 
> recoverBlock(BP-xxx:blk_xxx_xxx, 
> targets=[DatanodeInfoWithStorage[dn1:50010,null,null], 
> DatanodeInfoWithStorage[dn2:50010,null,null]], 
> newGenerationStamp=28577373754, newBlock=null, isStriped=false)
> {code}
> *t4.* When dn2 executes BlockRecoveryWorker#recover, it calls 
> initReplicaRecovery on dn1. However, since the dn1 machine is down at this 
> time, the call takes a very long time to time out: the default number of 
> retries to establish a server connection is 45 (a rough worst-case estimate 
> follows the log below).
> related logs:
> {code:java}
> 2024-03-13 21:59:31,518 INFO  ipc.Client 
> (Client.java:handleConnectionTimeout(904)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
> Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
> ...
> 2024-03-13 23:05:35,295 INFO  ipc.Client 
> (Client.java:handleConnectionTimeout(904)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
> Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45
> 2024-03-13 23:07:05,392 WARN  protocol.InterDatanodeProtocol 
> (BlockRecoveryWorker.java:recover(170)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> Failed to recover block (block=BP-xxx:blk_xxx_xxx, 
> datanode=DatanodeInfoWithStorage[dn1:50010,null,null]) 
> org.apache.hadoop.net.ConnectTimeoutException:
> Call From dn2 to dn1:8010 failed on socket timeout exception: 
> org.apache.hadoop.net.ConnectTimeoutException: 90000 millis timeout while 
> waiting for channel to be ready for connect.ch : 
> java.nio.channels.SocketChannel[connection-pending remote=dn:8010]; For more 
> details see:  http://wiki.apache.org/hadoop/SocketTimeout
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:866)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1511)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1402)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
>         at com.sun.proxy.$Proxy23.initReplicaRecovery(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.initReplicaRecovery(InterDatano
> deProtocolTranslatorPB.java:83)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.callInitReplicaRecovery(BlockRecoveryWor
> ker.java:579)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.access$400(BlockRecoveryWorker.java:57)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockReco
> veryWorker.java:135)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:620)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.net.ConnectTimeoutException: 90000 millis 
> timeout while waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=dn1:8010]
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:607)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:662)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:783)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$3900(Client.java:346)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1653)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1449)
>         ... 10 more
> {code}
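> For a rough sense of the delay, using the values visible in the log above 
> (90000 ms connect timeout, maxRetries=45; taken from the log, not asserted as 
> configuration defaults), the worst case per dead target is about 45 x 90 s, 
> i.e. roughly 67 minutes, which matches the gap between 21:59:31 and 23:05:35:
> {code:java}
> // Back-of-the-envelope worst case for retrying a single dead DataNode,
> // using the numbers shown in the log above.
> public class RetryDelayEstimate {
>   public static void main(String[] args) {
>     final long connectTimeoutMs = 90_000L; // "90000 millis timeout"
>     final int maxRetries = 45;             // "maxRetries=45"
>     final long worstCaseMs = connectTimeoutMs * maxRetries;
>     System.out.printf("~%.1f minutes%n", worstCaseMs / 60_000.0); // ~67.5 minutes
>   }
> }
> {code}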
> *t5.* The user or the NameNode sends new recovery commands within the timeout 
> window, each carrying a new recovery ID.
> related logs:
> {code:java}
> 2024-03-13 22:13:01.158 WARN hdfs.StateChange DIR* 
> NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
> recovery is in progress. RecoveryId = 28577807097 for block blk_xxx_xxx
> 2024-03-13 22:58:02.701 WARN hdfs.StateChange DIR* 
> NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
> recovery is in progress. RecoveryId = 28578772548 for block blk_xxx_xxx
> {code}
> *t6.* The current 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] thread 
> continues to run replica recovery on dn2, but the final call to 
> commitBlockSynchronization fails because its recovery ID is smaller than the 
> latest recovery ID on the NameNode.
> {code:java}
> 2024-03-13 23:07:05,401 WARN  datanode.DataNode 
> (BlockRecoveryWorker.java:run(623)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> recover Block: RecoveringBlock{BP-xxx:blk_xxx_xxx; getBlockSize()=0; 
> corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[dn1:50010,null,null], 
> DatanodeInfoWithStorage[dn2:50010,null,null]]; cachedLocs=[]}
> FAILED: {} org.apache.hadoop.ipc.RemoteException(java.io.IOException): The 
> recovery id 28577373754 does not match current recovery id 28578772548 for 
> block BP-xxx:blk_xxx_xxx
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4129)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:1184)
>         at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:310)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:34391)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:635)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:603)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:587)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1137)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1236)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1134)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2005)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3360)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1511)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1402)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
>         at com.sun.proxy.$Proxy17.commitBlockSynchronization(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:342)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:334)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:189)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:620)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
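> The rejection above comes from the NameNode comparing the recovery ID carried 
> by commitBlockSynchronization with the block's current recovery ID. A 
> simplified, self-contained sketch of that kind of guard (illustrative names, 
> not the exact FSNamesystem internals):
> {code:java}
> import java.io.IOException;
>
> // Illustrative only: the shape of the recovery-id check that rejects the
> // late commitBlockSynchronization call shown in the log above.
> class RecoveryIdGuard {
>   static void check(long requestRecoveryId, long currentRecoveryId, String block)
>       throws IOException {
>     if (requestRecoveryId != currentRecoveryId) {
>       // The stale worker from t3 still carries 28577373754, while the
>       // NameNode has already issued 28578772548 (t5), so the commit fails.
>       throw new IOException("The recovery id " + requestRecoveryId
>           + " does not match current recovery id " + currentRecoveryId
>           + " for block " + block);
>     }
>   }
> }
> {code}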
> *t7.* Because of the above problems, the user or the NameNode keeps sending 
> new recovery commands, and every block recovery attempt fails.
> *t8.* Recovery only succeeds once the abnormal dn1 machine comes back.
> So we should fix this issue by skipping replicas that are not live when 
> building the BlockRecoveryCommand, to avoid this kind of recovery failure 
> (a rough sketch of the idea follows).
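> A minimal sketch of the idea, extending the DatanodeManager#getBlockRecoveryCommand 
> excerpt from t3 and assuming DatanodeDescriptor#isAlive() reflects whether the 
> node is still heartbeating; this is an illustration of the change described 
> above, not necessarily the patch that was committed:
> {code:java}
>     // Sketch only: drop replicas whose DataNode is no longer live before the
>     // stale-node filtering and the size-based fallback run, so a dead node
>     // such as dn1 can never be handed back as a recovery target.
>     final List<DatanodeStorageInfo> liveStorages = new ArrayList<>(storages.length);
>     for (DatanodeStorageInfo storage : storages) {
>       if (storage.getDatanodeDescriptor().isAlive()) {
>         liveStorages.add(storage);
>       }
>     }
>     // The existing logic (stale filtering, the size() > 1 check and its
>     // fallback) would then operate on liveStorages instead of storages.
> {code}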


