[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339962#comment-15339962
 ] 

Hudson commented on YARN-4958:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9986 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9986/])
YARN-4958. The file localization process should allow for wildcards to (sjlee: 
rev 5107a967fa2558deba11c33a326d4d2e5748f452)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java


> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch, YARN-4958.004.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-20 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339917#comment-15339917
 ] 

Daniel Templeton commented on YARN-4958:


I think it's safe for 2.9.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch, YARN-4958.004.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-20 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339913#comment-15339913
 ] 

Daniel Templeton commented on YARN-4958:


I filed HADOOP-13296 to deal with the {{Path}} changes.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch, YARN-4958.004.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339890#comment-15339890
 ] 

Sangjin Lee commented on YARN-4958:
---

Thanks for the update [~templedf]! The latest patch looks good to me. I'll 
commit it shortly and look at MAPREDUCE-6719 after that. Should this go into 
trunk and 2.9.0?

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch, YARN-4958.004.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339669#comment-15339669
 ] 

Hadoop QA commented on YARN-4958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 52s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e2f6409 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12811828/YARN-4958.004.patch |
| JIRA Issue | YARN-4958 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f7999c69fd98 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fc6b50c |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12080/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12080/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12080/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Bu

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325298#comment-15325298
 ] 

Sangjin Lee commented on YARN-4958:
---

{quote}
Strictly speaking this should be two separate JIRAs, but I don't think anyone 
is that fussy about it. I've seen plenty of patches that touch more than one 
project. I've submitted several myself that touched common and HDFS.
{quote}
OK, although it'd be ideal to have two JIRAs (YARN to enable support for 
wildcards in container launch context and MAPREDUCE to take advantage of it), 
it might be good to move it to MAPREDUCE at least. The majority of the changes 
are really in MAPREDUCE. What do you think?

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-10 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325272#comment-15325272
 ] 

Daniel Templeton commented on YARN-4958:


No worries.  Thanks for the review, [~sjlee0].

On point #1, I had the same thought with the same conclusion.  Since it's an 
issue that already exists for directories, I don't see where this patch really 
changes anything.  In the majority of cases, this patch will be used via 
-libjars, in which case there is no issue.  In the remaining cases, the 
behavior is clearly documented in the javadoc, so I should think that's OK.

On point #2, I haven't actually tested it, but it should work as one would 
expect: each subdir should be linked from the working dir.

#3 is a good point.  I think I did, but I don't remember anymore.  I'll give 
that a go and let you know.

I'll post a patch with fixes for #4 and #5.

Strictly speaking this should be two separate JIRAs, but I don't think anyone 
is that fussy about it.  I've seen plenty of patches that touch more than one 
project.  I've submitted several myself that touched common and HDFS.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325157#comment-15325157
 ] 

Sangjin Lee commented on YARN-4958:
---

Sorry [~templedf] it took me a while to get back to this.

I finally got around to trying out the patch with a pseudo-distributed setup. I 
can confirm that the main use cases seem to work correctly, including the case 
of a non-jar resource in the staging directory. That said, I do have some high 
level comments as well as a couple of minor nits.

(1)
Regarding the decision of determining the public-ness solely based on the 
parent directory in the case of the wildcard, I'm wondering whether that would 
have any implications. It's probably not going to be common, but it is possible 
that the directory is public but there may be files that are not readable by 
others. Again, it's hard to imagine why one would do this, but if they did, 
would it cause a security issue on localization or a localization failure? 
Should we chalk that up to an unsupported setting? To be fair, I can see this 
being an issue if a directory was specified (not a wildcard), too. In that 
sense, we could say this is a manifestation of an existing issue... Thoughts?

(2)
With {{ContainerExecutor.java}}, what happens if the wildcarded directory has 
further nested directories? It appears we're symlinking only at the immediate 
child level. I suspect it would work correctly, but wanted to double check.

(3)
Were you able to test it in the local job mode?

(4) {{ClientDistributedCacheManager.java}}
- l.303: change {{System.out.println()}} to a logger logging statement

(5) {{DistributedCache.java}}
- l.295: typo: "it's" -> "its"

Finally, since most of the changes are really in MAPREDUCE, perhaps this JIRA 
should be moved to the MAPREDUCE project. What do you think? If we really want 
to follow the rules to the letter, we would need to create separate JIRAs for 
all projects involved (HADOOP, YARN, and MAPREDUCE). I'd like to hear what you 
think. On a related note, you may want to drop the changes to {{Path.java}} if 
you can help it.


> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

--

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-06-01 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311300#comment-15311300
 ] 

Daniel Templeton commented on YARN-4958:


[~sjlee0], I did test that non-JAR files work, but I did it by proxy.  I 
submitted a regular file through -libjars and confirmed that the regular file 
was in the task's classpath.  This patch does not change the way that the 
classpath is constructed.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304675#comment-15304675
 ] 

Sangjin Lee commented on YARN-4958:
---

Sorry [~templedf], haven't had cycles to get back to this one. I'll definitely 
review the latest patch next week.

Just to get data from you, I assume that you tested for cases such as non-jar 
entries in libjars, etc. Could you kindly confirm?

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-26 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302955#comment-15302955
 ] 

Gera Shegalov commented on YARN-4958:
-

Hi [~templedf], no particular comment other than there is that workaround that 
can achieve  it with the what i was suggesting in  HADOOP-12747, or 
programmatically, but it would be nice if it can be done in a more obvious way.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-17 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287735#comment-15287735
 ] 

Daniel Templeton commented on YARN-4958:


The test failures are unrelated.  I filed MAPREDUCE-6702 to track.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-17 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287486#comment-15287486
 ] 

Daniel Templeton commented on YARN-4958:


Looks like I need to dig into some of the unit test failures.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch, YARN-4958.002.patch, 
> YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287467#comment-15287467
 ] 

Hadoop QA commented on YARN-4958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 48s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 49s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 0s {color} 
| {color:red} root generated 23 new + 671 unchanged - 0 fixed = 694 total (was 
671) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 49s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 46s 
{color} | {color:red} root: patch generated 1 new + 228 unchanged - 18 fixed = 
229 total (was 246) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 1s 
{color} | {color:green} 
hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core 
generated 0 new + 2508 unchanged - 1 fixed = 2508 total (was 2509) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-mapreduce-client-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 32s 
{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 35s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 48s 
{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285958#comment-15285958
 ] 

Hadoop QA commented on YARN-4958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 49s {color} 
| {color:red} root-jdk1.8.0_91 with JDK v1.8.0_91 generated 23 new + 672 
unchanged - 0 fixed = 695 total (was 672) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m 54s 
{color} | {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 23 new + 
681 unchanged - 0 fixed = 704 total (was 681) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
26s {color} | {color:green} root: patch generated 0 new + 229 unchanged - 17 
fixed = 229 total (was 246) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 7s 
{color} | {color:green} 
hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-jdk1.8.0_91
 with JDK v1.8.0_91 generated 0 new + 2508 unchanged - 1 fixed = 2508 total 
(was 2509) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_91. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_91. {color} |
| {color:green}+1{

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282095#comment-15282095
 ] 

Hadoop QA commented on YARN-4958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 7s {color} 
| {color:red} root-jdk1.8.0_91 with JDK v1.8.0_91 generated 24 new + 671 
unchanged - 0 fixed = 695 total (was 671) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 16m 9s {color} 
| {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 24 new + 680 
unchanged - 0 fixed = 704 total (was 680) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 26s 
{color} | {color:red} root: patch generated 4 new + 233 unchanged - 16 fixed = 
237 total (was 249) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 14s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 9s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_91. {colo

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258754#comment-15258754
 ] 

Daniel Templeton commented on YARN-4958:


bq. l.242-243: Would it be simpler to reset current to the parent directory and 
simply invoke ancestorsHaveExecutePermissions() on it instead? Then, 
getFileStatus doesn't need to change, and the stat cache would also have only 
real paths (i.e. no "*" paths). Thoughts?

I can't as easily do the path manipulation above the level of 
{{getFileStatus()}} because we're dealing with a URI above that.  The net 
result of the current code, though, is that the wildcard path is swapped with 
its parent inside {{getFileStatus()}}, and only the parent directory is added 
to the stat cache.  Looks like there was a bug there, though, that caused a 
wildcard path to always have a cache miss.

bq. l.323-329: Would there be a case where there can be multiple attempts for 
the same directory? Is it for the case both "dir" and "dir/*" are included in 
cache.files? I'm not sure if you're addressing a new concern or an existing one.

It's both.  The current behavior is to quietly overwrite a duplicate link.  
That seems like a bad idea.

l.338-341: Why would there be a wildcard for the paths (which come from 
mapreduce.job.classpath.files)?

Good point.  I got a little overzealous. :)  I'll remove that part in the next 
patch.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258526#comment-15258526
 ] 

Daniel Templeton commented on YARN-4958:


Oh, whoops.  That was the intended behavior, but it looks like I clobbered it 
accidentally.  I'll add that to the next patch.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258491#comment-15258491
 ] 

Daniel Templeton commented on YARN-4958:


bq. Just to be sure, foo.xml does appear explicitly in the task's final 
classpath (i.e. in the container launch script), correct?

Yes.  With this patch, I split the definition of the classpath and the cache 
list.  The classpath remains the same as it was before, i.e. all non-archives 
added via -libjar.  Only the cache list gets replaced with tmpjars/*.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258473#comment-15258473
 ] 

Sangjin Lee commented on YARN-4958:
---

Just to be sure, {{foo.xml}} does appear explicitly in the task's final 
classpath (i.e. in the container launch script), correct?

bq. What won't work is if I, as a user, add libs/* to the archives and expect 
it to also put the non-jars into the classpath.
Understood. And that's fine. Thanks!

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258426#comment-15258426
 ] 

Daniel Templeton commented on YARN-4958:


bq. Actually there is more to it.

Got it.  That scenario works.  The dist cache will explicitly add the 
non-archive to the classpath, and the wildcard will make sure it gets linked on 
the NM.

What won't work is if I, as a user, add libs/\* to the archives and expect it 
to also put the non-jars into the classpath.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258346#comment-15258346
 ] 

Sangjin Lee commented on YARN-4958:
---

{quote}
Depends of the definition of correctly.  I defined * similar to your definition 
of * from HADOOP-12747: all JARs go in the classpath. This patch also links all 
non-JARs from the working directory, but they are not added to the classpath. I 
think that behavior is more consistent with HADOOP-12747 than adding everything 
to the classpath.
{quote}

Actually there is more to it. HADOOP-12747 is orthogonal to this.

Suppose the user specified
{noformat}
-libjars lib/foo.xml
{noformat}

The intent is to upload that file to the staging directory and make it part of 
the task classpath. So, (1) that file should be uploaded to the staging/libjars 
directory (and localized of course), and (2) more importantly it should be made 
part of the task classpath. As we noted, {{libjars/*}} will not pick up this 
file. Therefore the task classpath should look something like

{noformat}
libjars/*:libjars/foo.xml:...
{noformat}

In other words, non-jar entries in the libjar directory must be explicitly 
enumerated. I believe this is what the {{addToClassPathIfNonJar()}} is all 
about. This should preserve this behavior. I suspect the current patch does 
that, but was wondering if you could confirm it with a test.

bq. Didn't notice that one. What would be the best behavior? Report the 
aggregate file size?
The ideal behavior would be the aggregate size under the directory, but there 
would be complexity. Also, given that this is an existing issue, I'm fine with 
filing a separate JIRA to discuss and address it.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258150#comment-15258150
 ] 

Daniel Templeton commented on YARN-4958:


Thanks for the review, [~sjlee0]!  I'm working on unit tests and cleaning up 
the code right now.  I hope to have an updated patch by tomorrow.

bq. does this work correctly if we're dealing with a non-jar entry in the 
staging libjars directory?

Depends of the definition of correctly. :)  I defined * similar to your 
definition of * from HADOOP-12747: all JARs go in the classpath.  This patch 
also links all non-JARs from the working directory, but they are not added to 
the classpath.  I think that behavior is more consistent with HADOOP-12747 than 
adding everything to the classpath.

bq. Would there be any cross-platform issues?

Good question.  I was careful to keep the changes agnostic, but who knows.  
Probably worth testing.

bq. This is just noting that the size of a wildcard entry (as in 
mapreduce.job.cache.files.filesizes) would be reported as 0.

Didn't notice that one.  What would be the best behavior?  Report the aggregate 
file size?

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257160#comment-15257160
 ] 

Sangjin Lee commented on YARN-4958:
---

Thanks for your proposal [~templedf]! This is an important improvement. I took 
a look at the patch, and have some early feedback. I haven't had a chance to 
run the patch against a cluster yet, however.

A quick note on the shared cache: I don't think this patch will break the 
shared cache. This patch goes to work mostly after the shared cache has done 
its part. The only thing is that jobs with a set of resources that are heavily 
drawn from the shared cache would not benefit from this patch as there would be 
few classpath files that will be coming from the staging directory. But that's 
for later...

Also, I think HADOOP-12747 is largely orthogonal. It merely gives users a 
shorthand to address a set of jars.

More questions and comments:
- We'll need unit tests for this.
- I suppose a test will quickly confirm this, but does this work correctly if 
we're dealing with a non-jar entry in the staging libjars directory? I just 
wanted to confirm that it gets added to the container-side classpath 
explicitly. The java classpath of "\*" does not include non-jar resources in 
the directory.
- Would there be any cross-platform issues? Have you had a chance to test it on 
Windows specifically? At first glance, there was nothing obvious that might be 
a platform-specific issue, but it would be good to double check.
- This is just noting that the size of a wildcard entry (as in 
{{mapreduce.job.cache.files.filesizes}}) would be reported as 0. This is an 
existing behavior/issue with a directory entry.

(ClientDistributedCacheManager.java)
- l.242-243: Would it be simpler to reset {{current}} to the parent directory 
and simply invoke {{ancestorsHaveExecutePermissions()}} on it instead? Then, 
{{getFileStatus}} doesn't need to change, and the stat cache would also have 
only real paths (i.e. no "*" paths). Thoughts?

(MRApps.java)
- l.323-329: Would there be a case where there can be multiple attempts for the 
same directory? Is it for the case both "dir" and "dir/*" are included in 
cache.files? I'm not sure if you're addressing a new concern or an existing one.
- l.338-341: Why would there be a wildcard for the paths (which come from 
{{mapreduce.job.classpath.files}})?

(ContainerExecutor.java)
- This would apply to *any* entries that have the wildcard, and would effect 
things like {{PWD/*}} too?


> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and local

[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251934#comment-15251934
 ] 

Daniel Templeton commented on YARN-4958:


On closer examination, HADOOP-12747 may be solving a slightly different 
problem.  I believe it's solving the problem of reducing the end user pain in 
specifying a large number of JARs in -libjar.  This JIRA is solving the problem 
of reducing the state store impact of specifying a large number of JARs in 
-libjar.  Based on looking at the implementations, the two JIRAs are related 
but orthogonal.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251925#comment-15251925
 ] 

Daniel Templeton commented on YARN-4958:


[~jira.shegalov], I just noticed your comments on HADOOP-12747.  I'd be curious 
to have your opinion on this one.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the full directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "\*.jar" or "file\*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.  Specifically, this JIRA only applies 
> when a full directory is uploaded.  Currently the shared cache does not 
> handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of 
> all file verification and localization.  In the final step, the NM will query 
> the localized directory to get a list of the files in "dir" such that each 
> can be linked from the job's working directory.  Since $PWD/\* is always 
> included on the classpath, all JAR files in "dir" will be in the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

2016-04-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251911#comment-15251911
 ] 

Daniel Templeton commented on YARN-4958:


This JIRA is in the same space as HADOOP-12747.  It's solving the same problem 
in a completely different way, with different side-effects.  I think there is 
room for and value in both JIRAs.

> The file localization process should allow for wildcards to reduce the 
> application footprint in the state store
> ---
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library 
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local 
> resources even though they're all uploaded to the same directory in HDFS.  
> When using tools like Crunch without an uber JAR or when trying to take 
> advantage of the shared cache, the number of libraries can be quite large.  
> We've seen many cases where we had to turn down the max number of 
> applications to prevent ZK from running out of heap because of the size of 
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the 
> NM allow wildcards in the resource localization paths.  Specifically, we 
> propose to allow a path to have a final component (name) set to "*", which is 
> interpreted by the NM as "download the fell directory and link to every file 
> in it from the job's working directory."  This behavior is the same as the 
> current behavior when using -libjars, but avoids explicitly listing every 
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as 
> "*.jar" or "file*", as having multiple entries for a single directory 
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache.  That 
> work will be left to a future JIRA.
> This JIRA proposes to allow for wildcards both in the internal processing of 
> the -libjars switch and in paths added through the {{Job}} and 
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/*", as "dir" for purposes of 
> all file verification.  In the final step, the NM will query the localized 
> directory to get a list of the files in "dir" such that each can be linked 
> from the job's working directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)