[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319446#comment-16319446
 ] 

Miklos Szegedi commented on YARN-2185:
--------------------------------------

Thank you for the review, [~jlowe] and [~grepas].

bq. This patch adds parallel copying of directories and ability to ignore 
timestamps which IMHO is outside the scope of this patch. I think those should 
be handled in separate JIRAs with appropriate unit tests.
I created YARN-7713 and YARN-7712 for these two ideas respectively.
bq. Do we really want a while loop to pull off extensions? That's not how it 
behaved before, and it will now do different, potentially unexpected, things 
for a ".jar.gz", ".tgz.tgz", ".gz.jar", etc. Granted these are unlikely 
filenames to encounter, but if somehow a user had those before and were 
working, this will change that behavior.
I agree, let's be more conservative and do not change the existing behavior.
bq. Does it make sense to use an external process to unpack .jar and .zip 
files? The original code did this via the JDK directly rather than requiring a 
subprocess, and given there's a ZipInputStream and JarInputStream, it seems we 
could do the same here with the input stream rather than requiring the overhead 
of a subprocess and copying of data between streams.
I was wondering, if it should be better for OS specific archives containing 
symlinks for example. Here is what I suggest. I do not like the idea to use 
different method for tar and zip. On the other hand we do not have enough 
information to know, how many users would actually work better with OS toolset. 
I am not in favor of too many config knobs either but in this case I created 
one. The administrator can decide whether to rely on OS tools altogether or 
java itself. Calling out jar indeed does not make much sense, since it does the 
same what we would do in the child process. I think this solution is much 
cleaner than arbitrarily use tar in Linux but rely on Java for jars.
bq. This now logs an INFO message for all stderr and stdout for the subprocess
I addressed this concern.
bq. Wouldn't "cd destpath && tar -xf" be better than "cd destpath; tar -xf"? 
Otherwise if somehow the change directory fails it will end up unpacking to an 
unexpected place.
I just followed the existing pattern. I agree, let’s be more resilient and I 
will change to && as you suggested.
bq. Is the "rm -rf" necessary?
It does not hurt. I replaced with a delete that just returns, if the directory 
does not exist
bq. Has this been tested on Windows? The zipOrJar path appears to be eligible 
for Windows but uses shell syntax that cmd is not going to understand, like 
semicolons between commands.
Good catch, the latest code uses the Java path now for Windows. However, I have 
limited ability to test there.
bq. I wonder if it would be better to add a method to Shell that takes an input 
stream so we could reuse all the subprocess handling code there.
My usage of ProcessBuilder seems very simple I am hesitant to overload the 
shell executor with this functionality. This helps with backports as well.

I think I addressed most other comments in the next patch that I will submit 
soon.

> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>         Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to