[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258473#comment-15258473 ]
Sangjin Lee commented on YARN-4958:
-----------------------------------

Just to be sure, {{foo.xml}} does appear explicitly in the task's final classpath (i.e. in the container launch script), correct?

bq. What won't work is if I, as a user, add libs/* to the archives and expect it to also put the non-jars into the classpath.

Understood. And that's fine. Thanks!

> The file localization process should allow for wildcards to reduce the
> application footprint in the state store
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4958
>                 URL: https://issues.apache.org/jira/browse/YARN-4958
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local
> resources even though they're all uploaded to the same directory in HDFS.
> When using tools like Crunch without an uber JAR or when trying to take
> advantage of the shared cache, the number of libraries can be quite large.
> We've seen many cases where we had to turn down the max number of
> applications to prevent ZK from running out of heap because of the size of
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the
> NM allow wildcards in the resource localization paths. Specifically, we
> propose to allow a path to have a final component (name) set to "*", which is
> interpreted by the NM as "download the full directory and link to every file
> in it from the job's working directory." This behavior is the same as the
> current behavior when using -libjars, but avoids explicitly listing every
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as
> "\*.jar" or "file\*", as having multiple entries for a single directory
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache. That
> work will be left to a future JIRA. Specifically, this JIRA only applies
> when a full directory is uploaded. Currently the shared cache does not
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of
> the -libjars switch and in paths added through the {{Job}} and
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of
> all file verification and localization. In the final step, the NM will query
> the localized directory to get a list of the files in "dir" such that each
> can be linked from the job's working directory. Since $PWD/\* is always
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
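
Illustrative note: the approach proposed in the description (treat "dir/*" as "dir" for verification and localization, then list the localized directory and link each file from the working directory) might look roughly like the sketch below. This is not the code from YARN-4958.001.patch. The class name WildcardLinkSketch and its methods are hypothetical, and it uses plain java.nio on a local file system rather than the NodeManager's real localization classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these names come from the YARN code base. */
final class WildcardLinkSketch {
    static final String WILDCARD = "*";

    /** True only if the final path component is exactly "*". */
    static boolean isWildcard(String resourcePath) {
        return resourcePath.equals(WILDCARD)
            || resourcePath.endsWith("/" + WILDCARD);
    }

    /** Maps "dir/*" to "dir"; any other path is returned unchanged. */
    static String directoryOf(String resourcePath) {
        if (!isWildcard(resourcePath)) {
            return resourcePath;
        }
        int slash = resourcePath.lastIndexOf('/');
        if (slash < 0) {
            return ".";   // a bare "*" means the current directory
        }
        if (slash == 0) {
            return "/";   // "/*" means the root directory
        }
        return resourcePath.substring(0, slash);
    }

    /**
     * Final step: list the already-localized directory and create one
     * symbolic link per entry in the working directory.
     * Returns the linked file names in sorted order.
     */
    static List<String> linkAll(Path localizedDir, Path workDir) {
        List<String> linked = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(localizedDir)) {
            for (Path entry : entries) {
                Path link = workDir.resolve(entry.getFileName());
                Files.createSymbolicLink(link, entry.toAbsolutePath());
                linked.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(linked);
        return linked;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the localized "dir" and for $PWD.
        Path localized = Files.createTempDirectory("libs");
        Path workDir = Files.createTempDirectory("container");
        Files.createFile(localized.resolve("a.jar"));
        Files.createFile(localized.resolve("foo.xml"));

        System.out.println(directoryOf("hdfs://nn/user/alice/libs/*"));
        System.out.println(linkAll(localized, workDir));
    }
}
```

In this sketch every file in the directory is linked, JAR or not. Only the JARs would then be picked up by a $PWD/* classpath entry, since a Java classpath wildcard matches JAR files only. More general patterns such as "*.jar" or "file*" are deliberately not treated as wildcards, in line with the scope stated in the description.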