[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908644#comment-16908644 ]

wangchengwei commented on YARN-9616:

[~wzzdreamer] It seems I can't upload my patch...

> Shared Cache Manager Failed To Upload Unpacked Resources
>
>                 Key: YARN-9616
>                 URL: https://issues.apache.org/jira/browse/YARN-9616
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.3, 2.9.2, 2.8.5
>            Reporter: zhenzhao wang
>            Assignee: zhenzhao wang
>            Priority: Major
>         Attachments: YARN-9616.001-2.9.patch
>
>
> YARN unpacks archive files and some other files, depending on the file type
> and configuration. For example, if I start an MR job with -archive one.zip,
> one.zip is unpacked during download. Say one.zip contains file1 and file2.
> The files kept on local disk will then be
> /disk3/yarn/local/filecache/352/one.zip/file1 and
> /disk3/yarn/local/filecache/352/one.zip/file2. The shared cache uploader
> therefore can't upload one.zip to the shared cache, because the archive
> itself was removed during localization. The following error is thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
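The failure in the quoted report can be reproduced in miniature with plain JDK file APIs. The sketch below is not Hadoop code: the class and method names are hypothetical, and the lookup rule (expect the archive inside the unpacked directory, under the same name) is an assumption inferred from the path in the stack trace, /disk3/yarn/local/filecache/352/one.zip/one.zip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** JDK-only illustration; none of these names come from Hadoop. */
final class UnpackedLayoutSketch {

  /** Builds <tmp>/filecache/352/one.zip/{file1,file2}, the layout from the report. */
  static Path simulateUnpackedLayout() {
    try {
      Path root = Files.createTempDirectory("yarn9616-sketch");
      Path localized = root.resolve("filecache").resolve("352").resolve("one.zip");
      Files.createDirectories(localized);
      Files.createFile(localized.resolve("file1"));
      Files.createFile(localized.resolve("file2"));
      return localized;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Assumed lookup rule: when the localized path is a directory, expect the
   * original archive inside it under the same name.
   */
  static Path expectedArchivePath(Path localized) {
    return Files.isDirectory(localized)
        ? localized.resolve(localized.getFileName())
        : localized;
  }

  public static void main(String[] args) {
    Path localized = simulateUnpackedLayout();
    Path candidate = expectedArchivePath(localized);
    // Only file1 and file2 were created, so the archive itself is not found.
    System.out.println(candidate + " exists: " + Files.exists(candidate));
  }
}
```

Under that assumption the candidate path ends in one.zip/one.zip and does not exist, which is the condition a checksum or upload step would hit when it tries to open the file.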
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900507#comment-16900507 ]

zhenzhao wang commented on YARN-9616:

[~smarthan] Sorry, I missed the message. I have a patch that works well in our internal cluster, but I haven't had a chance to clean it up and contribute it to the public repo. I uploaded [^YARN-9616.001-2.9.patch] for reference. Feel free to share your patch. Thanks.
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899837#comment-16899837 ]

wangchengwei commented on YARN-9616:

Hi [~wzzdreamer], I have figured out a solution to this issue.

The cause is that the NM automatically unpacks packed archive files after localization, so the _SharedCacheUploader_ can't find the original archive files and throws _FileNotFoundException_. As a result, packed archive files are never uploaded to the shared cache, and they are uploaded and localized again and again.

All original resource files are uploaded to an HDFS path (the staging directory or a path specified by the user) before the job is submitted, so every resource file can be found in HDFS. Since the original files of packed archives can't be found on the NM, we can get them from their HDFS path rather than from the NM local path.

So the solution to this issue is:
# *Check whether the resource is a packed archive before upload.*
# *If not, upload it from the NM local path.*
# *If yes, copy the original file in HDFS to the shared cache path.*

This solved the issue in my tests. I am submitting the patch here; please review it if possible.
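The three steps proposed in the comment above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the submitted patch: the `Resource` type, the `Source` enum, the method names, and the example `hdfs://namenode/staging/...` URIs are all hypothetical, and the check "archive type and localized as a directory" is one possible reading of "packed archive".

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** JDK-only illustration of the proposed decision; not Hadoop code. */
final class UploadSourceSketch {

  enum Source { NM_LOCAL_FILE, ORIGINAL_REMOTE_FILE }

  /** Hypothetical stand-in for a localized resource. */
  static final class Resource {
    final Path localPath;      // where the NM localized the resource
    final String originalUri;  // where the client staged it before submission
    final boolean archive;     // true if it was requested as an archive

    Resource(Path localPath, String originalUri, boolean archive) {
      this.localPath = localPath;
      this.originalUri = originalUri;
      this.archive = archive;
    }
  }

  /** Step 1 (assumed check): an archive that now exists locally as a directory. */
  static boolean isUnpackedArchive(Resource r) {
    return r.archive && Files.isDirectory(r.localPath);
  }

  /** Steps 2 and 3: local file for ordinary resources, original copy for unpacked archives. */
  static Source chooseSource(Resource r) {
    return isUnpackedArchive(r) ? Source.ORIGINAL_REMOTE_FILE : Source.NM_LOCAL_FILE;
  }

  /** Location the uploader would read from under this scheme. */
  static String readLocation(Resource r) {
    return chooseSource(r) == Source.ORIGINAL_REMOTE_FILE
        ? r.originalUri
        : r.localPath.toString();
  }

  public static void main(String[] args) {
    // The temp directory stands in for an unpacked archive directory.
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    Resource unpacked = new Resource(tmp, "hdfs://namenode/staging/one.zip", true);
    Resource plain = new Resource(tmp.resolve("job.jar"), "hdfs://namenode/staging/job.jar", false);
    System.out.println(readLocation(unpacked)); // original URI
    System.out.println(readLocation(plain));    // NM local path
  }
}
```

Keeping the decision separate from the copy itself means the existing local upload path stays untouched for ordinary files, and only unpacked archives take the new route.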
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896854#comment-16896854 ]

smarthan commented on YARN-9616:

Hi, is there any solution to this issue now? I merged the YARN Shared Cache patch into 2.6.0-cdh5.10.0 and have seen this issue too.

I think [YARN-6097|https://issues.apache.org/jira/browse/YARN-6097], which tries to support caching directories in the shared cache, can't solve this issue, because it does not seem able to match the original archive file with the unpacked directory in the shared cache by checksum before the job is submitted. The packed archive files would still be uploaded to the job's staging directory again and again.

I have tried uploading the packed archives before job submission, but the shared cache directory is read-only for common users, which caused a new problem.

Please let me know if there is any available solution to this. Thanks!
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860417#comment-16860417 ]

zhenzhao wang commented on YARN-9616:

I have seen this issue in 2.9 and 2.6. More checking is needed to determine whether it also affects the latest version.