[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287735#comment-15287735 ]
Daniel Templeton commented on YARN-4958:
----------------------------------------

The test failures are unrelated. I filed MAPREDUCE-6702 to track.

> The file localization process should allow for wildcards to reduce the
> application footprint in the state store
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4958
>                 URL: https://issues.apache.org/jira/browse/YARN-4958
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4958.001.patch, YARN-4958.002.patch,
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local
> resources even though they're all uploaded to the same directory in HDFS.
> When using tools like Crunch without an uber JAR or when trying to take
> advantage of the shared cache, the number of libraries can be quite large.
> We've seen many cases where we had to turn down the max number of
> applications to prevent ZK from running out of heap because of the size of
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the
> NM allow wildcards in the resource localization paths. Specifically, we
> propose to allow a path to have a final component (name) set to "*", which is
> interpreted by the NM as "download the full directory and link to every file
> in it from the job's working directory." This behavior is the same as the
> current behavior when using -libjars, but avoids explicitly listing every
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as
> "\*.jar" or "file\*", as having multiple entries for a single directory
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache. That
> work will be left to a future JIRA. Specifically, this JIRA only applies
> when a full directory is uploaded. Currently the shared cache does not
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of
> the -libjars switch and in paths added through the {{Job}} and
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of
> all file verification and localization. In the final step, the NM will query
> the localized directory to get a list of the files in "dir" such that each
> can be linked from the job's working directory. Since $PWD/\* is always
> included on the classpath, all JAR files in "dir" will be in the classpath.
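
For readers following the thread, the approach described in the last paragraph of the quoted description (treat "dir/*" as "dir", then link every file in the localized directory from the working directory) might look roughly like the sketch below. It is plain java.nio code, not the code in the attached patches and not a Hadoop API: the class name WildcardLinker and its methods are hypothetical, local filesystem paths stand in for the NM's localized resources, and the use of symbolic links is an assumption about what "linked" means here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hadoop or YARN-4958. */
final class WildcardLinker {
    private static final String WILDCARD = "*";

    private WildcardLinker() { }

    /** True only when the final path component is exactly "*" (not "*.jar" or "file*"). */
    static boolean isWildcard(String resourcePath) {
        return resourcePath.equals(WILDCARD) || resourcePath.endsWith("/" + WILDCARD);
    }

    /** Maps "dir/*" to "dir" so verification and localization see a plain directory. */
    static String stripWildcard(String resourcePath) {
        if (!isWildcard(resourcePath)) {
            return resourcePath;
        }
        int slash = resourcePath.lastIndexOf('/');
        if (slash < 0) {
            return ".";                       // a bare "*" means the current directory
        }
        return slash == 0 ? "/" : resourcePath.substring(0, slash);
    }

    /**
     * Final step: list the localized directory and create one link per entry
     * in the working directory. Symbolic links are an assumption of this sketch.
     */
    static List<Path> linkAll(Path localizedDir, Path workDir) throws IOException {
        List<Path> links = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(localizedDir)) {
            for (Path entry : entries) {
                Path link = workDir.resolve(entry.getFileName());
                Files.createSymbolicLink(link, entry.toAbsolutePath());
                links.add(link);
            }
        }
        return links;
    }
}
```

With this shape, only the single "dir/*" entry would need to be recorded for the application, and the per-file list is rebuilt on the node at link time, which is the state store saving the description is after.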