[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963398#comment-16963398 ]
Miklos Szegedi commented on YARN-9863:
--------------------------------------

[~belugabehr], thank you for the feedback. In 2017 I ran some end-to-end tests on replicating files of a few gigabytes. HDFS first copies the file to one data node. Once the replication factor is set, it starts streaming over full-duplex links, at least according to my results, so no data node requires more than one connection. The final replication count should be proportional to the node count, so that connections are not the limiting factor during localization; in some cases data-local mapping may even help. I do not remember the details well, but I used an API to check the current replica count while waiting for replication to finish. I can look it up if you are interested in the details.

[~snemeth], do you think this feature is required?

> Randomize List of Resources to Localize
> ---------------------------------------
>
>                 Key: YARN-9863
>                 URL: https://issues.apache.org/jira/browse/YARN-9863
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of
> resources to be shuffled randomly. This will allow the Localizer to spread
> the load of requests so that not all of the NodeManagers are requesting to
> localize the same files, in the same order, from the same DataNodes.
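
For readers following the thread, the shuffling idea from the issue description can be sketched roughly as below. This is only an illustration, not the attached patch: the class ShuffledLocalization, the method orderForLocalization, and the boolean flag are hypothetical names, and resources are plain strings here rather than the types LocalResourceBuilder actually handles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffledLocalization {

  // Hypothetical helper, not part of LocalResourceBuilder: returns a copy of
  // the resource list, shuffled only when the (assumed) flag is enabled.
  // The input list is never modified.
  public static List<String> orderForLocalization(
      List<String> resources, boolean shuffle, Random random) {
    List<String> ordered = new ArrayList<>(resources);
    if (shuffle) {
      Collections.shuffle(ordered, random);
    }
    return ordered;
  }

  public static void main(String[] args) {
    // Placeholder resource names, chosen only for the example.
    List<String> resources =
        Arrays.asList("job.jar", "job.xml", "libs.tar.gz", "dict.txt");

    // Each node would use its own Random, so the request order is likely
    // to differ from node to node.
    System.out.println(orderForLocalization(resources, true, new Random()));

    // With the flag off, the original order is preserved.
    System.out.println(orderForLocalization(resources, false, new Random()));
  }
}
```

Copying the list before shuffling keeps the caller's order intact, so the behavior with the flag disabled stays exactly as it is today.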