[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-15 Thread wangchengwei (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908644#comment-16908644
 ] 

wangchengwei commented on YARN-9616:


[~wzzdreamer] it seems I can't upload my patch...

> Shared Cache Manager Failed To Upload Unpacked Resources
> 
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.3, 2.9.2, 2.8.5
> Reporter: zhenzhao wang
> Assignee: zhenzhao wang
> Priority: Major
> Attachments: YARN-9616.001-2.9.patch
>
>
> YARN unpacks archive files, and some other files, based on the file type
> and configuration. For example, if I start an MR job with -archive one.zip,
> then one.zip is unpacked during download. Say one.zip contains file1 and
> file2. The files kept on local disk are then
> /disk3/yarn/local/filecache/352/one.zip/file1 and
> /disk3/yarn/local/filecache/352/one.zip/file2. The shared cache uploader
> therefore cannot upload one.zip to the shared cache, because the archive
> itself was removed during localization. The following error is thrown:
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
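
The layout described in the report can be imitated with a minimal Java sketch. It uses only java.nio on a temporary directory and does not call any Hadoop code; the class and method names are hypothetical. After unpacking, one.zip is a directory that holds file1 and file2, so opening one.zip/one.zip fails in the same way the uploader does in the trace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical demo class; it only imitates the on-disk layout from the report.
final class UnpackedArchiveLayout {

    // Creates a scratch directory standing in for /disk3/yarn/local/filecache.
    static Path newTempRoot() {
        try {
            return Files.createTempDirectory("filecache");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds <root>/352/one.zip/{file1,file2}: after unpacking, "one.zip" is a directory.
    static Path createUnpackedLayout(Path root) {
        try {
            Path unpackedDir = root.resolve("352").resolve("one.zip");
            Files.createDirectories(unpackedDir);
            Files.write(unpackedDir.resolve("file1"), new byte[] {1});
            Files.write(unpackedDir.resolve("file2"), new byte[] {2});
            return unpackedDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true when <unpackedDir>/<name> cannot be opened because it does not exist.
    static boolean openFails(Path unpackedDir, String name) {
        try (InputStream in = Files.newInputStream(unpackedDir.resolve(name))) {
            return false;
        } catch (NoSuchFileException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path unpackedDir = createUnpackedLayout(newTempRoot());
        System.out.println("one.zip is a directory: " + Files.isDirectory(unpackedDir));
        System.out.println("opening one.zip/one.zip fails: " + openFails(unpackedDir, "one.zip"));
        System.out.println("opening one.zip/file1 fails: " + openFails(unpackedDir, "file1"));
    }
}
```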



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread zhenzhao wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900507#comment-16900507
 ] 

zhenzhao wang commented on YARN-9616:
-

[~smarthan] Sorry, I missed the message. I have a patch that works well in our 
internal cluster, but I haven't had a chance to clean it up and contribute it 
to the public repo. I uploaded [^YARN-9616.001-2.9.patch] for reference. Feel 
free to share your patch. Thanks.




[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread wangchengwei (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899837#comment-16899837
 ] 

wangchengwei commented on YARN-9616:


Hi [~wzzdreamer], I have figured out a solution to this issue.

The cause is that the NM automatically unpacks packed archive files after 
localization, so the _SharedCacheUploader_ cannot find the original archive 
files and throws a _FileNotFoundException_. As a result, the packed archive 
files are never uploaded to the shared cache, and they are uploaded and 
localized again and again. 

All original resource files are uploaded to an HDFS path (the staging directory 
or one specified by the user) before the job is submitted, so every resource 
file can be found in HDFS. Since the original files of packed archives cannot 
be found on the NM, we can get them from their HDFS path rather than from the 
NM local path. So the solution to this issue is:
 # *Check whether the resource is a packed archive before uploading.*
 # *If not, upload it from the NM local path.*
 # *If yes, copy the original file in HDFS to the shared cache path.*

This solution fixed the issue in my tests. I am submitting the patch here; 
please review it if possible. 
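
The three steps above can be sketched roughly as follows. This is not the attached patch and it does not use the Hadoop FileSystem API: as an assumption for illustration, plain java.nio paths stand in for the NM local directory, the HDFS staging path, and the shared cache path. The class name, the method names, and the boolean flag are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: local directories stand in for NM local disk, HDFS, and the shared cache.
final class SharedCacheUploadSketch {

    // Steps 1-3: a packed archive is copied from its original (staging) path,
    // any other resource is copied from the NM local path.
    static Path uploadToSharedCache(boolean packedArchive, Path localFile,
                                    Path originFile, Path sharedCacheDir) {
        Path source = packedArchive ? originFile : localFile;
        try {
            Files.createDirectories(sharedCacheDir);
            Path target = sharedCacheDir.resolve(source.getFileName());
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Describes an uploaded file as "<name>=<size in bytes>".
    static String describe(Path file) {
        try {
            return file.getFileName() + "=" + Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sets up a scratch layout and uploads one plain file and one unpacked archive.
    static List<String> runDemo() {
        try {
            Path root = Files.createTempDirectory("scm-sketch");
            Path staging = Files.createDirectories(root.resolve("staging"));
            Path jarDir = Files.createDirectories(root.resolve("local").resolve("10"));
            // After localization the archive exists locally only as a directory.
            Path unpacked = Files.createDirectories(
                root.resolve("local").resolve("352").resolve("one.zip"));
            Path shared = root.resolve("sharedcache");

            Files.write(staging.resolve("one.zip"), new byte[] {1, 2, 3, 4});
            Files.write(jarDir.resolve("lib.jar"), new byte[] {1, 2, 3});

            Path jar = uploadToSharedCache(false, jarDir.resolve("lib.jar"),
                staging.resolve("lib.jar"), shared);
            Path zip = uploadToSharedCache(true, unpacked,
                staging.resolve("one.zip"), shared);
            return Arrays.asList(describe(jar), describe(zip));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In a real NodeManager the archive check would presumably depend on the resource type recorded at localization time rather than on a caller-supplied flag; the flag here only keeps the sketch self-contained.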

 




[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-07-31 Thread smarthan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896854#comment-16896854
 ] 

smarthan commented on YARN-9616:


Hi, is there any solution to this issue now?

I merged the YARN Shared Cache patch into 2.6.0-cdh5.10.0 and saw this issue 
too. I think [YARN-6097|https://issues.apache.org/jira/browse/YARN-6097], 
which tries to support caching directories in the shared cache, can't solve 
this issue, because it does not seem able to match the original archive file 
with the unpacked directory in the shared cache by checksum before the job is 
submitted. The packed archive files would be uploaded to the job's staging 
directory again and again. 

I have tried uploading the packed archives before the job is submitted, but 
the shared cache directory is read-only for common users, which caused a new 
problem.

Please let me know if there is any available solution. Thanks!




[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-06-10 Thread zhenzhao wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860417#comment-16860417
 ] 

zhenzhao wang commented on YARN-9616:
-

I have seen this issue in 2.9 and 2.6. More checking is needed to determine 
whether the problem exists in the latest version.
