[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784708#comment-13784708 ]
Omkar Vinit Joshi commented on YARN-1219:
-----------------------------------------

bq. I didn't see anywhere in code to treat the ".tmp" file differently. If you know please let me know. If the original author only used a suffix to make sure the name is different from the original file name, it doesn't seem to be worth it to add unnecessary and error-prone rename operations just to keep the temporary file name suffix.

No, we are not adding new rename operations, just moving the existing ones around, from unpack to here. Ideally that rename code should have been here in the first place. I remember we had a bug about removing that .tmp file. But I think it is fine to go ahead with this patch, as it will not break anything else.

> FSDownload changes file suffix making FileUtil.unTar() throw exception
> ----------------------------------------------------------------------
>
>                 Key: YARN-1219
>                 URL: https://issues.apache.org/jira/browse/YARN-1219
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>             Fix For: 2.1.2-beta
>
>         Attachments: YARN-1219.patch
>
>
> While running a Hive join operation on Yarn, I saw the exception described
> below. This is caused by FSDownload copying the file into a temp file and
> changing the suffix to ".tmp" before unpacking it. In unpack(), it uses
> FileUtil.unTar(), which determines whether the file is "gzipped" by looking at
> the file suffix:
> {code}
> boolean gzipped = inFile.toString().endsWith("gz");
> {code}
> To fix this problem, we can remove the ".tmp" in the temp file name.
> Here is the detailed exception:
> org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
>     at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
>     at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
>     at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)

--
This message was sent by Atlassian JIRA
(v6.1#6144)
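
For reference, the suffix check quoted in the issue description can be reproduced in isolation. The sketch below is only an illustration and is not the content of YARN-1219.patch: the class name GzipCheckSketch and its helper methods are hypothetical, and the magic-byte check is an alternative technique shown for contrast, not something the issue or the patch proposes. It shows why a file renamed with a ".tmp" suffix is no longer recognized as gzipped by name, even though its contents still start with the gzip magic number (0x1f 0x8b).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; none of these names come from Hadoop.
class GzipCheckSketch {

    // Same test as the line quoted in the issue: decide by file name only.
    static boolean gzippedBySuffix(String fileName) {
        return fileName.endsWith("gz");
    }

    // Alternative shown for contrast (an assumption, not the patch):
    // decide by the two-byte gzip magic number at the start of the data.
    static boolean gzippedByMagic(byte[] head) {
        return head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }

    // Compresses a byte array in memory so the demo needs no files on disk.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = gzip("hello".getBytes(StandardCharsets.UTF_8));

        // The original name passes the suffix test.
        System.out.println(gzippedBySuffix("archive.tar.gz"));
        // After a rename that appends ".tmp", the suffix test fails.
        System.out.println(gzippedBySuffix("archive.tar.gz.tmp"));
        // The content itself is unchanged and still carries the gzip header.
        System.out.println(gzippedByMagic(payload));
    }
}
```

Removing the ".tmp" suffix, as the issue description suggests, keeps the name-based check working without touching FileUtil.unTar(); a content-based check would be a larger change and is outside what this issue discusses.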