[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324545#comment-16324545
 ] 

Jason Lowe commented on YARN-2185:
----------------------------------

Thanks for updating the patch!

bq. I was wondering, if it should be better for OS specific archives containing 
symlinks for example.

symlinks is a good example of why it is important to use {{tar}} on Linux.  
However if we're going to use OS tools to support ZIP files properly then we 
should not be using the {{jar}} command.  {{jar}} does not properly support 
symlinks in a ZIP archive but {{unzip}} does.  To be honest I don't see the 
appeal of using the jar command for anything since I do not see how it is going 
to do anything different than the JDK builtin support since it is part of the 
JDK.  As with the parallel copying and timestamp ignore support, I think it may 
be best to have this patch focus on using pipes rather than also changing the 
semantics of what tools are used to unpack the archives.  We can discuss what 
should be done with jars and zips in another JIRA.

Speaking of symbolic links, now that all supported releases are on JDK7 or 
later, we can update unTarUsingJava to fix the lacking symlink support.  I 
filed HADOOP-15170 to track that.

The failure on the path containing a single quote seems overly paranoid.  
Single quotes can be escaped in the POSIX shell code path per my previous 
comment, e.g.: 
{code}
  filename.replace("'","'\\''")
{code}

Rather than do the subshell with an extra {{cd}}, could it be simplified to 
just {{ tar -C }}?

It doesn't look like the following comment was addressed as runCommandOnStream 
is still creating an executor sometimes yet does not clean it up in those 
cases.  The Javadoc should also be updated to document the deadlock potential.
{quote}
I think we should shutdown the executor service if we created it, otherwise we 
risk running out of threads before the garbage collector gets around to 
noticing it can free everything up.  Either that or require the caller to 
provide the executor.
{quote}

Speaking of deadlock, if debug logging is not enabled then this can deadlock in 
the manner I described above.  That's one of the reasons why I think leveraging 
Shell isn't a terrible idea.  Wielding ProcessBuilder and Process properly is 
trickier than most people believe.  It also side-steps a lot of the executor 
management stuff.  I'm OK if you still think it's better to roll our own here.

"parallelVerifyAndCopy" is no longer parallel so the name and javadoc is 
misleading.  It also does not throw InterruptedException nor ExecutionException.

It's weird to allow the user to specify a downloader thread count yet we setup 
and teardown the executor for every unpack call and the unpack call is only 
going to use two threads.  Essentially it shouldn't be a config since any value 
but 2 is either wrong or wasteful.

Speaking of executors, would it makes more sense for FSDownload to be able to 
leverage an existing executor rather than creating its own?  All of the 
existing usages of FSDownload are using an executor of one form or another, so 
it'd be nice not to have to keep building and tearing these down for every call.

I'm still confuse about the recursive destination directory delete that was 
added.  It was not there before, and I don't think it's appropriate here.  What 
are we worried is going to be there, and if somehow something is there, are we 
sure it's OK to just blindly remove it?

destionationTmp s/b destinationTmp

FSDownloadException is no longer needed.


> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>         Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch, YARN-2185.003.patch, YARN-2185.004.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to