[https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963398#comment-16963398]

Miklos Szegedi commented on YARN-9863:
--------------------------------------

[~belugabehr], thank you for the feedback. In 2017 I ran some end-to-end tests
replicating files of a few gigabytes. Based on my results, HDFS first copies the
file to a single DataNode. Once the replication factor is set, it streams the
data onward over full-duplex links, so no DataNode needs more than one
connection. The final replication count should be proportional to the node
count so that connections do not become a bottleneck during localization; in
fact, in some cases data-local mapping may help. I do not remember the details
well, but I used an API that reports the current replica count and waited for
it to reach the target. I can look it up if you are interested in the details.
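
The "wait for the replica count" step can be sketched as a simple polling loop. The comment does not name the API that was used, so this sketch assumes a caller-supplied counter (an `IntSupplier`); in HDFS one possible source is the minimum number of hosts across the block locations returned by `FileSystem#getFileBlockLocations`, but that is an assumption, not necessarily what was used in the tests. The class and method names are hypothetical.

```java
import java.util.function.IntSupplier;

public class ReplicaWait {

    // Hypothetical helper: polls a caller-supplied replica counter until it
    // reports at least `target` replicas or the timeout expires.
    // Returns true if the target was reached, false on timeout.
    public static boolean waitForReplicas(IntSupplier currentReplicas, int target,
                                          long timeoutMillis, long pollMillis)
            throws InterruptedException {
        // nanoTime is monotonic, so wall-clock changes do not affect the deadline.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (currentReplicas.getAsInt() >= target) {
                return true; // target replication reached
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up before the target was reached
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated counter: replication grows by one replica on every poll.
        int[] replicas = {1};
        IntSupplier simulated = () -> replicas[0]++;
        System.out.println(waitForReplicas(simulated, 3, 1000, 10)); // prints "true"
    }
}
```

Polling keeps the sketch independent of any particular HDFS client call; a real implementation would also want to handle files whose blocks report different replica counts.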


 [~snemeth], do you think this feature is required?

> Randomize List of Resources to Localize
> ---------------------------------------
>
>                 Key: YARN-9863
>                 URL: https://issues.apache.org/jira/browse/YARN-9863
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes.
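
A minimal sketch of the proposed behaviour, assuming the resources can be treated as a plain list. The helper `orderForLocalization` and its `shuffle` flag are hypothetical names for illustration, not the parameter actually added by the YARN-9863 patches. It shuffles a copy of the list with `java.util.Collections.shuffle`, so each NodeManager (with its own `Random`) would request the files in a different order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffledLocalResources {

    // Hypothetical helper: returns the resources to localize, optionally in a
    // random order so that NodeManagers do not all fetch the same files in
    // the same order from the same DataNodes.
    public static <T> List<T> orderForLocalization(List<T> resources, boolean shuffle, Random rng) {
        // Copy so the caller's list is left untouched (and may be immutable).
        List<T> ordered = new ArrayList<>(resources);
        if (shuffle) {
            Collections.shuffle(ordered, rng);
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("job.jar", "lib/a.jar", "lib/b.jar", "lib/c.jar");
        // Unshuffled: the original order is preserved.
        System.out.println(orderForLocalization(jars, false, new Random()));
        // Shuffled: a different Random per NodeManager gives a different order.
        System.out.println(orderForLocalization(jars, true, new Random()));
    }
}
```

Keeping the shuffle behind a flag preserves the existing deterministic order by default, which matches the issue's framing of this as an opt-in parameter.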




