[ https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302123#comment-16302123 ]
Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:10 AM:
-----------------------------------------------------------------

Attaching my suggestion for how to solve this. The code streams the archive from HDFS as standard input to the tar and gzip commands, and it handles Windows as well. In addition, I create the temporary directory with permissions 700 instead of 755. I do not create any additional temporary directories for extraction; one is enough. One difference is that I use the jar command for zips as well, so that Windows is handled properly. I also added a switch that disables the modification time check when -1 is specified as the timestamp. Finally, I copy files in parallel when localizing directories, to take advantage of the distributed storage in HDFS.

> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>         Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it,
> and then removes it.
> It would be more efficient to stream the data as it's
> being unpacked to avoid both the extra disk space requirements and the
> additional disk activity from storing the archive.
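The core idea in the comment, piping the archive straight into tar's standard input instead of downloading it to a local file first, can be sketched in Java roughly as below. This is a minimal sketch under stated assumptions, not the code in YARN-2185.000.patch: it assumes a POSIX system with a `tar` binary that supports `-z`, Java 9 or later, and any `InputStream` as the source (in the NodeManager this would be the stream opened from HDFS, for example via Hadoop's `FileSystem.open`). The class and method names (`StreamingUnpack`, `createPrivateTempDir`, `untarGzStream`) are hypothetical. It also shows creating the single extraction directory with 700 permissions. Windows handling, the jar-based zip path, the -1 timestamp switch, and the parallel directory copy are not covered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical helper: unpacks a .tar.gz stream without storing the archive locally. */
final class StreamingUnpack {

  /** Creates a temporary extraction directory with owner-only (700) permissions. */
  static Path createPrivateTempDir(Path parent) throws IOException {
    return Files.createTempDirectory(parent, "localize-",
        PosixFilePermissions.asFileAttribute(
            PosixFilePermissions.fromString("rwx------")));
  }

  /**
   * Pipes a gzip-compressed tar stream into "tar -xzf -", so the archive is
   * unpacked as it is read and never written to local disk as a file.
   */
  static void untarGzStream(InputStream archive, Path destDir)
      throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "tar", "-xzf", "-", "-C", destDir.toString());
    // Inherit stderr so tar's diagnostics stay visible and cannot fill a pipe buffer.
    pb.redirectError(ProcessBuilder.Redirect.INHERIT);
    pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
    Process tar = pb.start();

    IOException writeFailure = null;
    try (OutputStream stdin = tar.getOutputStream()) {
      archive.transferTo(stdin); // copy the source stream into tar's stdin
    } catch (IOException e) {
      writeFailure = e; // e.g. a broken pipe if tar exited early
    }
    int exit = tar.waitFor();
    if (exit != 0) {
      throw new IOException("tar exited with status " + exit, writeFailure);
    }
    if (writeFailure != null) {
      throw writeFailure;
    }
  }
}
```

Because tar reads from a pipe, the archive never occupies local disk space. The trade-off is that a failed extraction cannot be retried from a local copy; the source stream has to be read again.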