[ https://issues.apache.org/jira/browse/HDFS-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743790#comment-17743790 ]
Xiaoqiao He commented on HDFS-17090:
------------------------------------

We mark the `DatanodeProtocol#registerDatanode` interface as 'Idempotent' now. However, the implementation of registerDatanode is not actually idempotent; for example, `blockReportCount` is always reset to 0, which affects other logic. So another option is to mark #registerDatanode as `AtMostOnce` and use the RetryCache to cover this corner case.

> Decommission will be stuck for a long time after restart because of overlapped processing of Register and BlockReport.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17090
>                 URL: https://issues.apache.org/jira/browse/HDFS-17090
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>
> I met a corner case recently where decommissioning DataNodes impacted the performance of the NameNode. After digging in carefully, I have reproduced the case:
> a. Add some DataNodes to the exclude list and prepare to decommission these DataNodes.
> b. Execute bin/hdfs dfsadmin -refreshNodes (this step is optional).
> c. Restart the NameNode for an upgrade or some other reason before decommissioning completes.
> d. All DataNodes will be triggered to register and send full block reports (FBR).
> e. The load on the NameNode will be very high; in particular, the 8040 CallQueue will be full for a long time because of the RPC flood of register/heartbeat/FBR requests from the DataNodes.
> f. A decommission-in-progress node will not finish decommissioning until the next FBR, even though all replicas on this node have been processed, because the request order is register, heartbeat, (blockreport, register). The second register could be a retried RPC request from the DataNode (there is no further log information from the DataNode to confirm this), and for (blockreport, register) the NameNode could process one storage, then process the register, then process the remaining storages in order.
> g. Because of the second register RPC, the related DataNode will be marked unhealthy by BlockManager#isNodeHealthyForDecommissionOrMaintenance, so decommissioning will be stuck for a long time until the next FBR. The NameNode then has to scan this DataNode every round to check whether decommissioning can complete, which holds the global write lock and impacts the performance of the NameNode.
> To improve this, I think we could filter the repeated register RPC requests during startup. I have not thought carefully about whether filtering registers directly would introduce other risks. Further discussion is welcome.
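For illustration only, here is a minimal, self-contained sketch of the `AtMostOnce` plus retry-cache idea mentioned in the comment above. This is plain Java, not the actual NameNode or Hadoop RetryCache code, and the class, method, and field names below are made up for the example: a retried register call carrying the same clientId/callId is answered from a cache instead of being re-executed, so per-DataNode state such as `blockReportCount` is not reset a second time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not the real NameNode code): a retry cache keyed by
 * (clientId, callId) so that a retried registerDatanode RPC is answered from
 * the cache instead of being re-executed, which would otherwise reset
 * per-DataNode state such as blockReportCount back to 0.
 */
public class RegisterRetryCacheSketch {

  /** Minimal stand-in for the per-DataNode state touched by registration. */
  static class DatanodeState {
    int blockReportCount;   // reset to 0 by a "real" registration
    boolean registered;
  }

  // Calls that have already been processed, keyed by "clientId:callId".
  private final Map<String, DatanodeState> retryCache = new ConcurrentHashMap<>();

  // Registered DataNodes, keyed by their UUID.
  private final Map<String, DatanodeState> datanodes = new ConcurrentHashMap<>();

  /**
   * At-most-once style registration: a retried RPC with the same
   * clientId/callId returns the cached outcome and does NOT reset
   * blockReportCount again.
   */
  public DatanodeState registerDatanode(String datanodeUuid,
                                        String clientId, int callId) {
    String retryKey = clientId + ":" + callId;

    // Retry of an already-processed call: short-circuit with the cached result.
    DatanodeState cached = retryCache.get(retryKey);
    if (cached != null) {
      return cached;
    }

    // First time this call is seen: perform the (non-idempotent) registration.
    DatanodeState dn =
        datanodes.computeIfAbsent(datanodeUuid, k -> new DatanodeState());
    dn.registered = true;
    dn.blockReportCount = 0;   // the reset that makes the RPC non-idempotent

    retryCache.put(retryKey, dn);
    return dn;
  }
}
```

In the real protocol this would roughly correspond to annotating DatanodeProtocol#registerDatanode with @AtMostOnce instead of @Idempotent and letting a server-side RetryCache short-circuit retried calls, as proposed in the comment; the sketch only illustrates the caching behavior, not the actual Hadoop mechanism.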