[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949707#comment-16949707
 ] 

Miklos Szegedi edited comment on YARN-9863 at 10/11/19 6:39 PM:
----------------------------------------------------------------

[~belugabehr], thank you for the patch. Could you explain the motivation for 
this change a bit more?

AFAIK it is better to let HDFS decide the order. Also, since you are using a 
random number, why not use SecureRandom? My third question is whether 
localization runs in parallel, in which case the order does not matter so 
much.

All in all, my experience with YARN and localization suggests that if you have 
a bottleneck on HDFS, you are better off increasing the replica count in HDFS, 
even if only temporarily. HDFS is much better at creating replicas for 
localization, since it can stream them and avoid bottlenecks. Localization 
then goes to the local instance, making it practically painless.


> Randomize List of Resources to Localize
> ---------------------------------------
>
>                 Key: YARN-9863
>                 URL: https://issues.apache.org/jira/browse/YARN-9863
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes.
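
For illustration, the shuffling idea in the description might be sketched as below. This is not the code from the attached patches: the class name, the method name, and the sample file names are hypothetical, and resources are stood in for by plain strings rather than YARN types. The sketch only shows that the standard `Collections.shuffle` accepts any `java.util.Random`, so a plain `Random` or a `SecureRandom` can be passed without other changes. Spreading load presumably does not need unpredictable ordering, so which source to use is a judgment call.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not taken from the YARN-9863 patches.
public class ShuffleResourcesSketch {

    // Returns a shuffled copy so the caller's list keeps its original order.
    static List<String> shuffledCopy(List<String> resources, Random rng) {
        List<String> copy = new ArrayList<>(resources);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        // Placeholder resource names, assumed for the example.
        List<String> resources =
            Arrays.asList("job.jar", "job.xml", "libs.zip", "dict.txt");

        // Plain pseudo-random source.
        System.out.println(shuffledCopy(resources, new Random()));

        // SecureRandom extends Random, so it is a drop-in replacement here.
        System.out.println(shuffledCopy(resources, new SecureRandom()));
    }
}
```

Each NodeManager running such a shuffle independently would tend to request the same files in a different order, which is the load-spreading effect the description is after.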


