[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949769#comment-16949769 ]
David Mollitor commented on YARN-9863:
--------------------------------------

[~szegedim] Thank you for your feedback.

The background here is that I am working with a large cluster that has one job in particular that is crushing it. This one job has to localize many resources, of varying file sizes, in order to complete.

As I understand YARN, when a job is submitted to the cluster, a list of files to localize is sent to each NodeManager involved in the job. In this case, all nodes are involved. All NodeManagers receive a carbon copy of the list of files from the ResourceManager (or maybe it's the 'yarn' client?). That is, they all have the same list, in the same order. Each NodeManager then iterates through the list and requests that each file be localized. So, it would seem to me that all of the NodeManagers request file1, file2, file3, ... from HDFS in lock-step. This has a stampeding effect on the HDFS DataNodes.

I am familiar with {{mapreduce.client.submit.file.replication}}. I understand that it is used to pump up the replication of the submitted files so that they are available on more DataNodes. However, the way it works, as I understand it, is that the file is first written to the HDFS cluster with the default replication (usually 3), and then the client raises the replication to the final value in a separate request (setrep). That replication happens asynchronously. If {{mapreduce.client.submit.file.replication}} is set to 10, for example, the job may be submitted and finished before the file actually reaches a replication of 10.
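For illustration only, here is a minimal sketch of that two-step write-then-setrep pattern using the public {{FileSystem}} API. The paths and the class name are made up for the example; this is not the actual job-submission code, just the shape of the behavior described above:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SubmitReplicationSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    Path local = new Path("file:///tmp/job.jar");                 // hypothetical paths
    Path staging = new Path("/user/submitter/.staging/job.jar");

    // Step 1: the file lands in HDFS with the default replication (usually 3).
    fs.copyFromLocalFile(local, staging);

    // Step 2: a separate request raises the target replication. This only
    // updates metadata; the extra block copies are made in the background,
    // so the job can start (and even finish) before 10 replicas exist.
    fs.setReplication(staging, (short) 10);
  }
}
{code}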
This becomes exacerbated on larger clusters. If a cluster has 1,000 nodes, the recommended value of {{mapreduce.client.submit.file.replication}} is sqrt(1000), or ~32. The default number of handler threads on each DataNode is 10 ({{dfs.datanode.handler.count}}). So, even if the desired replication is achieved, that is 32 x 10 = 320 connections served at once. In a cluster with 1,000 nodes, that is going to stall. By simply randomizing the list, the load can be spread across many different sets of 32 nodes and better support this scenario.

For your questions:

# I'm not sure how HDFS would manage this. The requests are generated by the NodeManagers and the HDFS cluster is simply serving them. HDFS has no way to randomize the requests.
# SecureRandom is not needed. This is not a secure operation; it only requires a fast, pretty-good randomization of the list to spread the load (a minimal sketch of the kind of shuffle I mean is at the end of this message).
# I believe the parallelism of localization is configurable with {{yarn.nodemanager.localizer.fetch.thread-count}} (default 4), but I believe the requests are submitted to a work queue in order, so there will still be some level of trampling, especially when there are more than 4 files to localize (as is the case in the scenario I am reviewing).

> Randomize List of Resources to Localize
> ---------------------------------------
>
>                 Key: YARN-9863
>                 URL: https://issues.apache.org/jira/browse/YARN-9863
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of resources to be shuffled randomly. This will allow the Localizer to spread the load of requests so that not all of the NodeManagers are requesting to localize the same files, in the same order, from the same DataNodes.
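As referenced in answer #2 above, here is a minimal sketch of the kind of shuffle intended. It is illustrative only, not the actual patch or the {{LocalResourceBuilder}} API; the class and resource names are made up:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleSketch {
  public static void main(String[] args) {
    // Stand-in for the list of resources every NodeManager receives in the same order.
    List<String> resources = new ArrayList<>(
        Arrays.asList("file1.jar", "file2.jar", "file3.jar", "file4.jar", "file5.jar"));

    // A plain java.util.Random is sufficient: each NodeManager shuffles its own
    // copy of the list, so the fetch order (and therefore the DataNodes hit
    // first) differs per node. SecureRandom would add cost without adding value.
    Collections.shuffle(resources, new Random());

    System.out.println(resources);
  }
}
{code}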