[jira] [Commented] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578769#comment-16578769 ]

Chris Trezzo commented on YARN-5727:

Assigning to [~wzzdreamer]. He will start working on a new draft of the design doc.

> Improve YARN shared cache support for LinuxContainerExecutor
>
> Key: YARN-5727
> URL: https://issues.apache.org/jira/browse/YARN-5727
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: zhenzhao wang
> Priority: Major
> Attachments: YARN-5727-Design-v1.pdf
>
> When running LinuxContainerExecutor in secure mode
> ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set to {{false}}),
> all localized files are owned by the user that owns the container which localized the
> resource. This presents a problem for the shared cache when a YARN application requests
> that a resource with non-public visibility be uploaded to the shared cache. The shared
> cache uploader (running as the node manager user) does not have access to the localized
> files and cannot compute the checksum of the file or upload it to the cache.
>
> The solution should ideally satisfy the following three requirements:
> # Localized files should still be safe/secure. Other users that run containers should
> not be able to modify or delete the publicly localized files of others.
> # The node manager user should be able to access these files for the purpose of
> checksumming and uploading to the shared cache without being a privileged user.
> # The solution should avoid making unnecessary copies of the localized files.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
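Requirement #2 above boils down to the uploader process (running as the node manager user) being able to read and hash the localized file. A minimal sketch of that checksum step, in Python purely for illustration (the function name and the choice of SHA-256 are assumptions here; Hadoop's actual uploader uses its own pluggable checksum implementation):

```python
import hashlib

def checksum_localized_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a localized resource in streaming fashion so that large
    jars do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the file is owned by another user with owner-only permissions, the open() call itself fails with a permission error, which is exactly the failure mode the issue describes.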
[jira] [Assigned] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo reassigned YARN-5727:

Assignee: zhenzhao wang
[jira] [Commented] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578701#comment-16578701 ]

Chris Trezzo commented on YARN-5727:

As stated above, I no longer think the attached v1 design is the right idea. It assumes the permissions issue is a problem at the YARN layer. In fact, I think this is more a problem with the way MapReduce supports the shared cache. Currently the shared cache only supports public resources (i.e., all resources uploaded to the shared cache are world readable). The problem is that MapReduce localizes all of the job resources into the user cache instead of the public one. YARN is then put in the position of essentially changing permissions for some resources. Ideally, MapReduce would localize resources intended for the shared cache into the public cache to begin with. This would allow the shared cache uploader to checksum and upload the resources even in the LinuxContainerExecutor case.
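The distinction this comment draws maps onto the NodeManager's on-disk layout: public resources land under a world-readable filecache directory, while private and application-scoped resources land under a per-user usercache tree that the node manager user cannot read under a secure LinuxContainerExecutor. A simplified sketch of that routing (directory names follow the conventional NodeManager layout; the function itself is illustrative, not Hadoop code):

```python
import os

def localization_dir(nm_local_dir, user, visibility):
    """Return where the NodeManager localizes a resource, by visibility.

    PUBLIC resources are world-readable, so the shared cache uploader
    (the node manager user) can checksum and upload them. PRIVATE and
    APPLICATION resources end up owned by the submitting user.
    """
    if visibility == "PUBLIC":
        return os.path.join(nm_local_dir, "filecache")
    return os.path.join(nm_local_dir, "usercache", user, "filecache")
```

Localizing shared-cache candidates as PUBLIC up front would place them in the first path, sidestepping the permissions problem entirely.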
[jira] [Created] (YARN-7694) Optionally run shared cache manager as part of the resource manager
Chris Trezzo created YARN-7694:

Summary: Optionally run shared cache manager as part of the resource manager
Key: YARN-7694
URL: https://issues.apache.org/jira/browse/YARN-7694
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Chris Trezzo

Currently the shared cache manager is its own stand-alone daemon (a YARN composite service). Ideally, the shared cache manager could optionally run as part of the resource manager, so administrators would have one less daemon to manage.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo resolved YARN-1492.

Resolution: Fixed
Fix Version/s: 2.9.0, 3.0.0

I am resolving this issue as all the parts are committed to trunk, branch-3.0 and branch-2. Thanks to everyone who helped with reviews and design feedback! I am glad to see this feature reach this milestone, and I am looking forward to collaborating on shared cache phase two features in YARN-7282.

> truly shared cache for jars (jobjar/libjar)
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.0.4-alpha
> Reporter: Sangjin Lee
> Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf,
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf,
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf,
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch,
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch,
> YARN-1492-all-trunk-v5.patch
>
> Currently there is the distributed cache, which enables you to cache jars and files so
> that attempts from the same job can reuse them. However, sharing is limited because the
> distributed cache is normally on a per-job basis. On a large cluster, copying of jobjars
> and libjars sometimes becomes so prevalent that it consumes a large portion of the
> network bandwidth, not to mention defeating the purpose of "bringing compute to where
> data is". This is wasteful because in most cases code doesn't change much across many
> jobs. I'd like to propose and discuss the feasibility of introducing a truly shared
> cache so that multiple jobs from multiple users can share and cache jars. This JIRA is
> to open the discussion.
[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193397#comment-16193397 ]

Chris Trezzo commented on YARN-2960:

Committed to trunk, branch-3.0 and branch-2. Thanks!

> Add documentation for the YARN shared cache
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2960-trunk-001.patch, YARN-2960-trunk-002.patch,
> YARN-2960-trunk-003.patch, YARN-2960-trunk-004.patch
>
> Add documentation around the architecture, APIs and administration of the YARN shared cache.
[jira] [Comment Edited] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193371#comment-16193371 ]

Chris Trezzo edited comment on YARN-2960 at 10/5/17 6:19 PM:

I unfortunately made a small mistake while cherry-picking the patch to branch-3.0: I reverted the cherry-pick and then cherry-picked it again. Because of this there are three commits in the log:

{noformat}
commit e5af16cf6cd54c8358af066d1ec677378bc3029d
Author: Chris Trezzo
Date: Thu Oct 5 10:38:41 2017 -0700

    YARN-2960. Add documentation for the YARN shared cache.
    (cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)

commit 4b9b66f921c671b6426d1bc912cca056cb2532c4
Author: Chris Trezzo
Date: Thu Oct 5 11:11:02 2017 -0700

    Revert "YARN-2960. Add documentation for the YARN shared cache."

    This reverts commit 54a01c28cc153872aa7eed68000ab0ddf010054a.

commit 54a01c28cc153872aa7eed68000ab0ddf010054a
Author: Chris Trezzo
Date: Thu Oct 5 10:38:41 2017 -0700

    YARN-2960. Add documentation for the YARN shared cache.
    (cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)
{noformat}

Everything is correct now.
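The three-commit sequence above can be reproduced mechanically: a cherry-pick, a revert of the mistaken pick, and the pick again, leaving the branch with the change applied. A sketch in a throwaway repository (branch names, file names and commit messages here are illustrative, not the actual Hadoop branches):

```shell
set -e
repo=$(mktemp -d)
git init -q -b main "$repo" && cd "$repo"
git config user.email dev@example.com && git config user.name dev

echo base > file.txt && git add file.txt && git commit -qm "base"

# The fix lives on another branch (standing in for trunk here).
git checkout -q -b feature
echo docs > SharedCache.md && git add SharedCache.md
git commit -qm "YARN-2960. Add documentation for the YARN shared cache."
fix=$(git rev-parse HEAD)

# Back on the release branch: pick, revert the mistaken pick, pick again.
git checkout -q main
git cherry-pick -x "$fix"
git revert --no-edit HEAD
git cherry-pick -x "$fix"

# History now holds base + pick + revert + pick = 4 commits,
# and the change is present in the working tree.
git rev-list --count HEAD
```

The `-x` flag records the "(cherry picked from commit ...)" trailer seen in the log above.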
[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193371#comment-16193371 ]

Chris Trezzo commented on YARN-2960:

I unfortunately made a small mistake while cherry-picking the patch to branch-3.0: I reverted the cherry-pick and then cherry-picked it again. Because of this there are three commits in the log (see the commit listing in the edited comment above).
[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192221#comment-16192221 ]

Chris Trezzo commented on YARN-2960:

Thanks [~mingma]! I will commit to trunk, branch-3.0 and branch-2.
[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-2960:

Attachment: YARN-2960-trunk-004.patch

Attached v4 to add italics around parameters in the setup instructions.
[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-2960:

Attachment: YARN-2960-trunk-003.patch

Thanks [~mingma]! Attached is a v3 patch. I put all of the configs in a markdown table. I did leave the config and setup sub-sections separate within the administration section, and added a comment in the setup referencing the configs in the following section. I mainly wanted to keep the setup steps to the minimum amount of setup, whereas the config section is a reference for all available configuration parameters. Let me know if there is anything else. Thanks again!
[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-2960:

Attachment: YARN-2960-trunk-002.patch

Thanks for the review [~mingma]! Attached is a trunk v2 patch addressing your comments. Please let me know if there is anything else.
[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189050#comment-16189050 ]

Chris Trezzo commented on YARN-2960:

The patch should be good to go; please let me know if you see any issues. If I get a +1, I plan to commit this patch to trunk, branch-3.0 and branch-2. Thanks!
[jira] [Resolved] (YARN-1016) Define a HDFS based repository that allows YARN services to share resources
[ https://issues.apache.org/jira/browse/YARN-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo resolved YARN-1016.

Resolution: Duplicate

Resolving this as a duplicate of YARN-1492. Please let me know if you think otherwise.

> Define a HDFS based repository that allows YARN services to share resources
>
> Key: YARN-1016
> URL: https://issues.apache.org/jira/browse/YARN-1016
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Affects Versions: 3.0.0-alpha1
> Reporter: Kam Kasravi
>
> YARN services, both short and long lived, can benefit from a resource repo rather than
> packaging resources within the YARN client to be extracted and used by the Application
> Master and (later) the containers. Standardizing a resource repo will provide
> performance benefits as well. The repo should be similar to Maven or Ivy repos so
> discovery and versioning are built in.
[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-1492:

Release Note: The YARN Shared Cache provides the facility to upload and manage shared application resources to HDFS in a safe and scalable manner. YARN applications can leverage resources uploaded by other applications or previous runs of the same application without having to re-upload and localize identical files multiple times. This saves network resources and reduces YARN application startup time.
[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-2960:

Attachment: YARN-2960-trunk-001.patch

Trunk v1 attached.
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188808#comment-16188808 ]

Chris Trezzo commented on YARN-1492:

Please let me know if you have any concerns about this. Thanks!
[jira] [Comment Edited] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188801#comment-16188801 ]

Chris Trezzo edited comment on YARN-1492 at 10/2/17 8:50 PM:

[~asuresh] [~subru] I have set the target version for this jira back to 2.9.0. The only jiras left for this first phase are the documentation patch (YARN-2960) and the startup script patch (YARN-4858). Both should be able to make 2.9.0. The rest of the feature is already in branch-2. I have split out some of the major features that still need to be finished in the shared cache into a phase 2 jira (YARN-7282). That said, the core parts of this feature are committed and ready to be used in deployments that do not need phase 2 features.
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188801#comment-16188801 ]

Chris Trezzo commented on YARN-1492:

[~asuresh] [~subru] I have set the target version for this jira back to 2.9.0. The only jiras left for this first phase are the documentation patch and YARN-4858. Both should be able to make 2.9.0. The rest of the feature is already in branch-2. I have split out some of the major features that still need to be finished in the shared cache into a phase 2 jira (YARN-7282). That said, the core parts of this feature are committed and ready to be used in deployments that do not need phase 2 features.
[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-1492:

Target Version/s: 2.9.0 (was: 3.1.0)
[jira] [Assigned] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo reassigned YARN-5727: -- Assignee: (was: Chris Trezzo) > Improve YARN shared cache support for LinuxContainerExecutor > > > Key: YARN-5727 > URL: https://issues.apache.org/jira/browse/YARN-5727 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo > Attachments: YARN-5727-Design-v1.pdf > > > When running LinuxContainerExecutor in a secure mode > ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set > to {{false}}), all localized files are owned by the user that owns the > container which localized the resource. This presents a problem for the > shared cache when a YARN application requests a resource to be uploaded to > the shared cache that has a non-public visibility. The shared cache uploader > (running as the node manager user) does not have access to the localized > files and can not compute the checksum of the file or upload it to the cache. > The solution should ideally satisfy the following three requirements: > # Localized files should still be safe/secure. Other users that run > containers should not be able to modify, or delete the publicly localized > files of others. > # The node manager user should be able to access these files for the purpose > of checksumming and uploading to the shared cache without being a privileged > user. > # The solution should avoid making unnecessary copies of the localized files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5727: --- Issue Type: Sub-task (was: Bug) Parent: YARN-7282 > Improve YARN shared cache support for LinuxContainerExecutor > > > Key: YARN-5727 > URL: https://issues.apache.org/jira/browse/YARN-5727 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5727-Design-v1.pdf > > > When running LinuxContainerExecutor in a secure mode > ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set > to {{false}}), all localized files are owned by the user that owns the > container which localized the resource. This presents a problem for the > shared cache when a YARN application requests a resource to be uploaded to > the shared cache that has a non-public visibility. The shared cache uploader > (running as the node manager user) does not have access to the localized > files and can not compute the checksum of the file or upload it to the cache. > The solution should ideally satisfy the following three requirements: > # Localized files should still be safe/secure. Other users that run > containers should not be able to modify, or delete the publicly localized > files of others. > # The node manager user should be able to access these files for the purpose > of checksumming and uploading to the shared cache without being a privileged > user. > # The solution should avoid making unnecessary copies of the localized files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
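The access problem described in YARN-5727 can be illustrated with a minimal stdlib sketch: the uploader must read the whole localized file to checksum it, and the read fails with an AccessDeniedException whenever the file sits under another user's 0700 directory. SHA-256 and the class/method names here are illustrative; the shared cache's actual checksum algorithm is pluggable and this does not reflect the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LocalizedFileChecksum {
    // Compute a SHA-256 checksum of a localized file, as a shared cache
    // uploader would before uploading. If the file is owned by a different
    // user with mode 0700, Files.readAllBytes throws AccessDeniedException,
    // which is the core problem this JIRA describes.
    static String checksum(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // lowercase hex encoding
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("localized", ".jar");
        Files.write(f, "job classes".getBytes());
        System.out.println(checksum(f));
    }
}
```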
[jira] [Updated] (YARN-2781) support more flexible policy for uploading in shared cache
[ https://issues.apache.org/jira/browse/YARN-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2781: --- Issue Type: Sub-task (was: New Feature) Parent: YARN-7282 > support more flexible policy for uploading in shared cache > -- > > Key: YARN-2781 > URL: https://issues.apache.org/jira/browse/YARN-2781 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sangjin Lee > > Today all resources are always uploaded as long as the client wants to upload > it. We may want to implement a feature where the shared cache manager can > instruct the node managers not to upload under some circumstances. > Some examples may be uploading a resource if it is seen more than N number of > times. > This doesn't need to be included in the first version of the shared cache. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
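The "upload only after a resource is seen N times" policy sketched in YARN-2781 could look roughly like the following. This is a hypothetical illustration, not code from any patch; the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a seen-count upload policy: the SCM would instruct
// node managers to upload a resource only once its checksum has been
// referenced at least `threshold` times.
public class SeenCountUploadPolicy {
    private final int threshold;
    private final Map<String, Integer> seenCounts = new HashMap<>();

    public SeenCountUploadPolicy(int threshold) {
        this.threshold = threshold;
    }

    // Called each time a client references a resource; returns true once the
    // resource has been requested often enough to be worth caching.
    public synchronized boolean shouldUpload(String checksum) {
        int count = seenCounts.merge(checksum, 1, Integer::sum);
        return count >= threshold;
    }
}
```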
[jira] [Updated] (YARN-6097) Add support for directories in the Shared Cache
[ https://issues.apache.org/jira/browse/YARN-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6097: --- Issue Type: Sub-task (was: Bug) Parent: YARN-7282 > Add support for directories in the Shared Cache > --- > > Key: YARN-6097 > URL: https://issues.apache.org/jira/browse/YARN-6097 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo > > Add support for directories in the shared cache. > If a LocalResource URL points to a directory, the directory structure is > preserved during localization on the node manager. Currently, the shared > cache does not support directories and will fail to upload the URL to the > cache if shouldBeUploadedToSharedCache is set to true. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2663) Race condition in shared cache CleanerTask during deletion of resource
[ https://issues.apache.org/jira/browse/YARN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2663: --- Parent Issue: YARN-7282 (was: YARN-1492) > Race condition in shared cache CleanerTask during deletion of resource > --- > > Key: YARN-2663 > URL: https://issues.apache.org/jira/browse/YARN-2663 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Priority: Blocker > > In CleanerTask, store.removeResource(key) and > removeResourceFromCacheFileSystem(path) do not happen together in atomic > fashion. > Since resources could be uploaded with different file names, the SCM could > receive a notification to add a resource to the SCM between the two > operations. Thus, we have a scenario where the cleaner service deletes the > entry from the scm, receives a notification from the uploader (adding the > entry back into the scm) and then deletes the file from HDFS. > Cleaner code that deletes resource: > {code} > if (store.isResourceEvictable(key, resource)) { > try { > /* >* TODO: There is a race condition between store.removeResource(key) >* and removeResourceFromCacheFileSystem(path) operations because > they >* do not happen atomically and resources can be uploaded with >* different file names by the node managers. >*/ > // remove the resource from scm (checks for appIds as well) > if (store.removeResource(key)) { > // remove the resource from the file system > boolean deleted = removeResourceFromCacheFileSystem(path); > if (deleted) { > resourceStatus = ResourceStatus.DELETED; > } else { > LOG.error("Failed to remove path from the file system." > + " Skipping this resource: " + path); > resourceStatus = ResourceStatus.ERROR; > } > } else { > // we did not delete the resource because it contained application > // ids > resourceStatus = ResourceStatus.PROCESSED; > } > } catch (IOException e) { > LOG.error("Failed to remove path from the file system. Skipping this resource: " > + path, e); > resourceStatus = ResourceStatus.ERROR; > } > } else { > resourceStatus = ResourceStatus.PROCESSED; > } > {code} > Uploader code that uploads resource: > {code} > // create the temporary file > tempPath = new Path(directoryPath, getTemporaryFileName(actualPath)); > if (!uploadFile(actualPath, tempPath)) { > LOG.warn("Could not copy the file to the shared cache at " + > tempPath); > return false; > } > // set the permission so that it is readable but not writable > // TODO should I create the file with the right permission so I save the > // permission call? > fs.setPermission(tempPath, FILE_PERMISSION); > // rename it to the final filename > Path finalPath = new Path(directoryPath, actualPath.getName()); > if (!fs.rename(tempPath, finalPath)) { > LOG.warn("The file already exists under " + finalPath + > ". Ignoring this attempt."); > deleteTempFile(tempPath); > return false; > } > // notify the SCM > if (!notifySharedCacheManager(checksumVal, actualPath.getName())) { > // the shared cache manager rejected the upload (as it is likely > // uploaded under a different name) > // clean up this file and exit > fs.delete(finalPath, false); > return false; > } > {code} > One solution is to have the UploaderService always rename the resource file > to the checksum of the resource plus the extension. With this fix we will > never receive a notify for the resource before the delete from the FS has > happened because the rename on the node manager will fail. If the node > manager uploads the file after it is deleted from the FS, we are ok and the > resource will simply get added back to the scm once a notification is > received. > The classpath at the MapReduce layer is still usable because we leverage > links to preserve the original client file name. > The downside is that now the shared cache files in HDFS are less readable. > This could be mitigated with an added admin command to the SCM that gives a > list of filenames associated with a checksum or vice versa.
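The rename-to-checksum fix proposed above can be sketched with plain java.nio as a stand-in for Hadoop's FileSystem API. Two uploaders with the same content compute the same target name, so the loser's rename fails fast instead of creating a second file name that races with the cleaner. All names here are illustrative.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChecksumRename {
    // Publish a fully uploaded temp file under a checksum-derived name,
    // keeping the original extension for readability.
    static Path publishToCache(Path tempFile, Path cacheDir, String checksum)
            throws IOException {
        String name = tempFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String extension = dot >= 0 ? name.substring(dot) : "";
        Path finalPath = cacheDir.resolve(checksum + extension);
        try {
            // Without REPLACE_EXISTING, move fails if the target exists,
            // mirroring fs.rename() returning false in the uploader.
            Files.move(tempFile, finalPath);
        } catch (FileAlreadyExistsException e) {
            Files.delete(tempFile); // another uploader won; drop our copy
            return null;
        }
        return finalPath;
    }
}
```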
[jira] [Updated] (YARN-2774) shared cache service should authorize calls properly
[ https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2774: --- Parent Issue: YARN-7282 (was: YARN-1492) > shared cache service should authorize calls properly > > > Key: YARN-2774 > URL: https://issues.apache.org/jira/browse/YARN-2774 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sangjin Lee > > The shared cache manager (SCM) services should authorize calls properly. > Currently, the uploader service (done in YARN-2186) does not authorize calls > to notify the SCM on newly uploaded resource. Proper security/authorization > needs to be done in this RPC call. Also, the use/release calls (done in > YARN-2188) and the scmAdmin commands (done in YARN-2189) are not properly > authorized. The SCM UI done in YARN-2203 as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7282) Shared Cache Phase 2
Chris Trezzo created YARN-7282: -- Summary: Shared Cache Phase 2 Key: YARN-7282 URL: https://issues.apache.org/jira/browse/YARN-7282 Project: Hadoop YARN Issue Type: Improvement Reporter: Chris Trezzo Phase 2 will address more features that need to be built as part of the shared cache project. See YARN-1492 for the first release of the shared cache. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted
[ https://issues.apache.org/jira/browse/YARN-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187584#comment-16187584 ] Chris Trezzo commented on YARN-7001: Also thanks [~Sen Zhao] for the patch! > If shared cache upload is terminated in the middle, the temp file will never > be deleted > --- > > Key: YARN-7001 > URL: https://issues.apache.org/jira/browse/YARN-7001 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Sen Zhao > Attachments: YARN-7001.001.patch, YARN-7001.002.patch, > YARN-7001.003.patch, YARN-7001.004.patch > > > There is a missing deleteTempFile(tempPath); > {code} > tempPath = new Path(directoryPath, getTemporaryFileName(actualPath)); > if (!uploadFile(actualPath, tempPath)) { > LOG.warn("Could not copy the file to the shared cache at " + > tempPath); > return false; > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted
[ https://issues.apache.org/jira/browse/YARN-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187583#comment-16187583 ] Chris Trezzo commented on YARN-7001: Looking at this patch, I am not entirely sure if this fixes the issue. I am thinking about these two scenarios: # If {{uploadFile}} returns false: {{FileUtil.copy}} has returned false. If we look into that method, I think the only way it will return false is if the file has not been created yet, since we pass in deleteSource as false. In this case, we do not need a deleteTempFile call. # If {{uploadFile}} throws an IOException: Here we might have an issue. If copy throws an IOException after it created the tmp file, but before it finished writing it, we may be stranding tmp files. It seems like we would want a try/catch around the uploadFile. If we get an IOException, we would want to delete the tmp file if it exists. In reality, we could be stranding tmp files if the node manager fails at any point between the file creation in uploadFile and the file rename later in the method. In practice, this doesn't seem to be an issue because the time between those points is small. Maybe we could add a try/finally around those two points where we attempt to delete the tmp file in the finally? That at least covers the case where there is an unexpected exception. Let me know if you think I have missed something. Thanks! 
> If shared cache upload is terminated in the middle, the temp file will never > be deleted > --- > > Key: YARN-7001 > URL: https://issues.apache.org/jira/browse/YARN-7001 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Sen Zhao > Attachments: YARN-7001.001.patch, YARN-7001.002.patch, > YARN-7001.003.patch, YARN-7001.004.patch > > > There is a missing deleteTempFile(tempPath); > {code} > tempPath = new Path(directoryPath, getTemporaryFileName(actualPath)); > if (!uploadFile(actualPath, tempPath)) { > LOG.warn("Could not copy the file to the shared cache at " + > tempPath); > return false; > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
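The try/finally suggested in the comment above can be sketched as follows, with java.nio standing in for the uploader's uploadFile()/fs.rename() calls. Whatever fails between creating the temp file and the final rename, the failure path removes the temp file so none is stranded. This is an illustration of the suggested approach, not the actual patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeUpload {
    // Copy source to a temp path, then rename into place; on any failure,
    // delete the partially written temp file before returning.
    static boolean upload(Path source, Path tempPath, Path finalPath) {
        boolean published = false;
        try {
            Files.copy(source, tempPath, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tempPath, finalPath);
            published = true;
        } catch (IOException e) {
            // copy or rename failed partway; fall through to cleanup
        } finally {
            if (!published) {
                try {
                    Files.deleteIfExists(tempPath); // never strand a temp file
                } catch (IOException ignored) {
                }
            }
        }
        return published;
    }
}
```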
[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs
[ https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183463#comment-16183463 ] Chris Trezzo commented on YARN-7250: Thank you [~vrushalic] for the review! I will wait until tomorrow to commit, just in case there are any other comments. Otherwise, I plan to commit to trunk, branch-3.0 and branch-2. > Update Shared cache client api to use URLs > -- > > Key: YARN-7250 > URL: https://issues.apache.org/jira/browse/YARN-7250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: YARN-7250-trunk-001.patch > > > We should make the SharedCacheClient use api more consistent with other YARN > api methods. We can do this by doing two things: > # Update the SharedCacheClient#use api so that it returns a URL instead of a > Path. Currently yarn developers have to convert the path to a URL when > creating a LocalResources. It would be much smoother if they could just use a > URL passed to them by the shared cache client. > # Remove the portion of the client that deals with fragments as this is not > consistent with the rest of YARN. This functionality is bleeding in from the > MapReduce layer, which uses fragments to keep track of destination file > names. YARN's api does not use fragments. Instead the ContainerLaunchContext > expects a Map localResources, where the strings are > the destination file names. We should let the YARN application handle > destination file names however it wants instead of pushing this into the > shared cache api. Additionally, fragments are a clunky way to handle this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs
[ https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183443#comment-16183443 ] Chris Trezzo commented on YARN-7250: Also to clarify, this is a client-side-only change. The protobuf/rpc between the client and the SCM is staying the same. > Update Shared cache client api to use URLs > -- > > Key: YARN-7250 > URL: https://issues.apache.org/jira/browse/YARN-7250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: YARN-7250-trunk-001.patch > > > We should make the SharedCacheClient use api more consistent with other YARN > api methods. We can do this by doing two things: > # Update the SharedCacheClient#use api so that it returns a URL instead of a > Path. Currently yarn developers have to convert the path to a URL when > creating a LocalResource. It would be much smoother if they could just use a > URL passed to them by the shared cache client. > # Remove the portion of the client that deals with fragments as this is not > consistent with the rest of YARN. This functionality is bleeding in from the > MapReduce layer, which uses fragments to keep track of destination file > names. YARN's api does not use fragments. Instead the ContainerLaunchContext > expects a Map<String, LocalResource> localResources, where the strings are > the destination file names. We should let the YARN application handle > destination file names however it wants instead of pushing this into the > shared cache api. Additionally, fragments are a clunky way to handle this.
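The two styles contrasted in this JIRA can be sketched in miniature: the MapReduce-era style smuggles the destination file name into a URI fragment, while the YARN style makes it an explicit map key, as ContainerLaunchContext's localResources map does. Plain URI and String types stand in for YARN's URL and LocalResource records; this is illustrative only.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class DestinationNames {
    // Fragment style: parse "#dest.jar" off the resource URI, falling back
    // to the last path component when no fragment is present.
    static String destinationFromFragment(URI resource) {
        String fragment = resource.getFragment();
        if (fragment != null) {
            return fragment;
        }
        String path = resource.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI cached = URI.create("hdfs://nn:8020/sharedcache/abc123/lib.jar#dest.jar");
        System.out.println(destinationFromFragment(cached));

        // Map style: the application chooses the destination name directly,
        // so the URI needs no fragment at all.
        Map<String, URI> localResources = new HashMap<>();
        localResources.put("dest.jar",
                URI.create("hdfs://nn:8020/sharedcache/abc123/lib.jar"));
    }
}
```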
[jira] [Commented] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181276#comment-16181276 ] Chris Trezzo commented on YARN-7253: Thanks [~vrushalic]! I will commit later today to trunk and branch-3.0. > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-beta1, 3.0.0-alpha4 >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Attachments: YARN-7253-trunk-001.patch > > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7253: --- Affects Version/s: 3.0.0-alpha4 > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-beta1, 3.0.0-alpha4 >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Attachments: YARN-7253-trunk-001.patch > > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7253: --- Affects Version/s: 3.0.0-beta1 > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Attachments: YARN-7253-trunk-001.patch > > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo reassigned YARN-7253: -- Assignee: Chris Trezzo > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Attachments: YARN-7253-trunk-001.patch > > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7253: --- Attachment: YARN-7253-trunk-001.patch Trunk v1 patch attached. > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chris Trezzo >Priority: Trivial > Attachments: YARN-7253-trunk-001.patch > > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7253: --- Description: Currently the command to start the shared cache manager daemon is listed as an admin command in the yarn script usage: {noformat} SUBCOMMAND is one of: Admin Commands: daemonlogget/set the log level for each daemon node prints node report(s) rmadmin admin tools scmadmin SharedCacheManager admin tools sharedcachemanager run the SharedCacheManager daemon {noformat} It should be a daemon command. was: Currently the command to start the shared cache manager daemon is listed as an admin command in the yarn script: {noformat} SUBCOMMAND is one of: Admin Commands: daemonlogget/set the log level for each daemon node prints node report(s) rmadmin admin tools scmadmin SharedCacheManager admin tools sharedcachemanager run the SharedCacheManager daemon {noformat} It should be a daemon command. > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chris Trezzo >Priority: Trivial > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script usage: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
[ https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7253: --- Priority: Trivial (was: Minor) > Shared Cache Manager daemon command listed as admin subcmd in yarn script > - > > Key: YARN-7253 > URL: https://issues.apache.org/jira/browse/YARN-7253 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chris Trezzo >Priority: Trivial > > Currently the command to start the shared cache manager daemon is listed as > an admin command in the yarn script: > {noformat} > SUBCOMMAND is one of: > Admin Commands: > daemonlogget/set the log level for each daemon > node prints node report(s) > rmadmin admin tools > scmadmin SharedCacheManager admin tools > sharedcachemanager run the SharedCacheManager daemon > {noformat} > It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script
Chris Trezzo created YARN-7253: -- Summary: Shared Cache Manager daemon command listed as admin subcmd in yarn script Key: YARN-7253 URL: https://issues.apache.org/jira/browse/YARN-7253 Project: Hadoop YARN Issue Type: Bug Reporter: Chris Trezzo Priority: Minor Currently the command to start the shared cache manager daemon is listed as an admin command in the yarn script: {noformat} SUBCOMMAND is one of: Admin Commands: daemonlogget/set the log level for each daemon node prints node report(s) rmadmin admin tools scmadmin SharedCacheManager admin tools sharedcachemanager run the SharedCacheManager daemon {noformat} It should be a daemon command. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs
[ https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179819#comment-16179819 ] Chris Trezzo commented on YARN-7250: The TestNMClient failure is happening on trunk as well, and is unrelated to this patch. The same goes for the TestAMRMClient timeout. The patch should be good to go. The patch is technically an incompatible change, but this api is marked unstable and this feature is still in an alpha state, so there should be no issue. My intention is to, pending review, check this into trunk, branch-3.0 and branch-2. > Update Shared cache client api to use URLs > -- > > Key: YARN-7250 > URL: https://issues.apache.org/jira/browse/YARN-7250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: YARN-7250-trunk-001.patch > > > We should make the SharedCacheClient use api more consistent with other YARN > api methods. We can do this by doing two things: > # Update the SharedCacheClient#use api so that it returns a URL instead of a > Path. Currently yarn developers have to convert the path to a URL when > creating a LocalResource. It would be much smoother if they could just use a > URL passed to them by the shared cache client. > # Remove the portion of the client that deals with fragments as this is not > consistent with the rest of YARN. This functionality is bleeding in from the > MapReduce layer, which uses fragments to keep track of destination file > names. YARN's api does not use fragments. Instead the ContainerLaunchContext > expects a Map<String, LocalResource> localResources, where the strings are > the destination file names. We should let the YARN application handle > destination file names however it wants instead of pushing this into the > shared cache api. Additionally, fragments are a clunky way to handle this.
[jira] [Updated] (YARN-7250) Update Shared cache client api to use URLs
[ https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7250: --- Description: We should make the SharedCacheClient use api more consistent with other YARN api methods. We can do this by doing two things: # Update the SharedCacheClient#use api so that it returns a URL instead of a Path. Currently yarn developers have to convert the path to a URL when creating a LocalResources. It would be much smoother if they could just use a URL passed to them by the shared cache client. # Remove the portion of the client that deals with fragments as this is not consistent with the rest of YARN. This functionality is bleeding in from the MapReduce layer, which uses fragments to keep track of destination file names. YARN's api does not use fragments. Instead the ContainerLaunchContext expects a Map localResources, where the strings are the destination file names. We should let the YARN application handle destination file names however it wants instead of pushing this into the shared cache api. Additionally, fragments are a clunky way to handle this. was: We should make the SharedCacheClient use api more consistent with other YARN api methods. We can do this by doing two things: # Update the SharedCacheClient#use api so that it returns a URL instead of a Path. Currently yarn developers have to convert the path to a URL when creating a LocalResources. It would be much smoother if they could just use a URL passed to them by the shared cache client. # Remove the portion of the client that deals with fragments as this is not consistent with the rest of YARN. This functionality is bleeding in from the MapReduce layer, which uses fragments to keep track of destination file names. YARN's api does not use fragments. Instead the ContainerLaunchContext expects a Map localResources, where the strings are the destination file names. 
We should let the YARN application handle destination file names however it wants instead of pushing this into the shared cache API. Additionally, fragments are a clunky way to handle this. > Update Shared cache client api to use URLs > -- > > Key: YARN-7250 > URL: https://issues.apache.org/jira/browse/YARN-7250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: YARN-7250-trunk-001.patch > > > We should make the SharedCacheClient#use API more consistent with other YARN > API methods. We can do this by doing two things: > # Update the SharedCacheClient#use API so that it returns a URL instead of a > Path. Currently YARN developers have to convert the path to a URL when > creating a LocalResource. It would be much smoother if they could just use a > URL passed to them by the shared cache client. > # Remove the portion of the client that deals with fragments, as this is not > consistent with the rest of YARN. This functionality is bleeding in from the > MapReduce layer, which uses fragments to keep track of destination file > names. YARN's API does not use fragments. Instead the ContainerLaunchContext > expects a Map<String, LocalResource> localResources, where the strings are > the destination file names. We should let the YARN application handle > destination file names however it wants instead of pushing this into the > shared cache API. Additionally, fragments are a clunky way to handle this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
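The fragment convention being removed here is plain URI syntax, so it can be sketched with the JDK alone. A minimal, hypothetical example (the class and method names below are illustrative, not part of the YARN or MapReduce API) contrasting fragment-carried destination names with explicit map keys:

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not YARN code.
public class FragmentDemo {

    // MapReduce-style: the destination file name rides along as a URI fragment.
    public static String destNameFromFragment(String resource) {
        return URI.create(resource).getFragment();
    }

    // YARN-style: destination names are explicit map keys, no fragment parsing needed.
    public static Map<String, URI> asLocalResources(String destName, URI resource) {
        Map<String, URI> localResources = new LinkedHashMap<>();
        localResources.put(destName, resource);
        return localResources;
    }

    public static void main(String[] args) {
        // Fragment convention: destination name is smuggled inside the URI.
        String withFragment = "hdfs://nn/sharedcache/checksum1/job.jar#foo.jar";
        System.out.println(destNameFromFragment(withFragment)); // foo.jar

        // Explicit-map convention: destination name is a first-class key.
        Map<String, URI> lr = asLocalResources("bar.jar",
                URI.create("hdfs://nn/sharedcache/checksum2/job.jar"));
        System.out.println(lr.keySet()); // [bar.jar]
    }
}
```

With explicit keys, two cached jars that are both named job.jar upstream can still be given distinct destination names without any fragment parsing by the client.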
[jira] [Updated] (YARN-7250) Update Shared cache client api to use URLs
[ https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-7250: --- Attachment: YARN-7250-trunk-001.patch Attached v1 patch. > Update Shared cache client api to use URLs > -- > > Key: YARN-7250 > URL: https://issues.apache.org/jira/browse/YARN-7250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: YARN-7250-trunk-001.patch > > > We should make the SharedCacheClient#use API more consistent with other YARN > API methods. We can do this by doing two things: > # Update the SharedCacheClient#use API so that it returns a URL instead of a > Path. Currently YARN developers have to convert the path to a URL when > creating a LocalResource. It would be much smoother if they could just use a > URL passed to them by the shared cache client. > # Remove the portion of the client that deals with fragments, as this is not > consistent with the rest of YARN. This functionality is bleeding in from the > MapReduce layer, which uses fragments to keep track of destination file > names. YARN's API does not use fragments. Instead the ContainerLaunchContext > expects a Map<String, LocalResource> localResources, where the strings are > the destination file names. We should let the YARN application handle > destination file names however it wants instead of pushing this into the > shared cache API. Additionally, fragments are a clunky way to handle this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7250) Update Shared cache client api to use URLs
Chris Trezzo created YARN-7250: -- Summary: Update Shared cache client api to use URLs Key: YARN-7250 URL: https://issues.apache.org/jira/browse/YARN-7250 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor We should make the SharedCacheClient#use API more consistent with other YARN API methods. We can do this by doing two things: # Update the SharedCacheClient#use API so that it returns a URL instead of a Path. Currently YARN developers have to convert the path to a URL when creating a LocalResource. It would be much smoother if they could just use a URL passed to them by the shared cache client. # Remove the portion of the client that deals with fragments, as this is not consistent with the rest of YARN. This functionality is bleeding in from the MapReduce layer, which uses fragments to keep track of destination file names. YARN's API does not use fragments. Instead the ContainerLaunchContext expects a Map<String, LocalResource> localResources, where the strings are the destination file names. We should let the YARN application handle destination file names however it wants instead of pushing this into the shared cache API. Additionally, fragments are a clunky way to handle this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081151#comment-16081151 ] Chris Trezzo commented on YARN-1492: bq. Could you explain how the shared cache leverage the node manager local cache in detail? The shared cache leverages the local cache via the normal LocalResource API. The YARN application specifies a shared cache path that it received from the shared cache as the LocalResource URI. bq. Are those shared jars marked as PUBLIC? Yes, currently all resources in the shared cache are world readable, so they are in that sense public. However, at the node manager level you could set the visibilities to PRIVATE or APPLICATION. bq. Could you point me the source code that handle this? The shared cache uses the normal localization code path (see ResourceLocalizationService). For shared cache specific parts to upload a resource to the cache see SharedCacheUploader. If you want to see an example of how a YARN application can implement support for the shared cache, see MAPREDUCE-5951 for how map reduce does it. > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin Lee >Assignee: Chris Trezzo > Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, > YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, > YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, > YARN-1492-all-trunk-v5.patch > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961163#comment-15961163 ] Chris Trezzo commented on YARN-5797: Thanks everyone for the reviews and [~mingma] for the commit! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, > YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957929#comment-15957929 ] Chris Trezzo commented on YARN-5797: Branch-2 test failures are unrelated. From my perspective the patch is good to go for trunk and branch-2. Thanks! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, > YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957794#comment-15957794 ] Chris Trezzo edited comment on YARN-5797 at 4/5/17 9:43 PM: Also attached is a v1 branch-2 patch. The only conflict was in TestResourceLocalizationService.java and it was due to a difference in formatting between the branches on one line. was (Author: ctrezzo): Also attached is a v1 branch-2 patch. The only conflict was in TestResourceLocalization and it was due to a difference in formatting between the branches on one line. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, > YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5797: --- Attachment: YARN-5797-branch-2.001.patch Also attached is a v1 branch-2 patch. The only conflict was in TestResourceLocalization and it was due to a difference in formatting between the branches on one line. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, > YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5797: --- Attachment: YARN-5797-trunk.003.patch Attached is a v3 patch that is rebased on trunk. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk.003.patch, > YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957380#comment-15957380 ] Chris Trezzo commented on YARN-6004: Thanks [~mingma]! > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, > YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956103#comment-15956103 ] Chris Trezzo commented on YARN-6004: These unit tests fail locally on branch-2 without the patch. As far as I can tell these failures are unrelated. The patches should be good to go for trunk and branch-2. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, > YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956049#comment-15956049 ] Chris Trezzo commented on YARN-5797: As soon as the patch for YARN-6004 is in, I will rebase for trunk and branch-2. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6004: --- Attachment: YARN-6004-branch-2.001.patch Thanks [~mingma]! Attached is a branch-2 v1 patch. Note that the reason why this does not blow up in trunk is because: "starting in Java SE 8, a local class can access local variables and parameters of the enclosing block that are final or effectively final. A variable or parameter whose value is never changed after it is initialized is effectively final." Because of this, the final modifier is unnecessary. See java documentation: http://docs.oracle.com/javase/tutorial/java/javaOO/localclasses.html > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, > YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
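The "effectively final" rule quoted from the Java tutorial is easy to demonstrate with a local class. A standalone sketch (unrelated to the patch itself; names are illustrative) that compiles on Java 8+ but would require a final modifier on Java 7:

```java
// Demonstrates Java 8 "effectively final" capture: a local class may read a
// local variable that has no final modifier, as long as it is never reassigned.
public class EffectivelyFinalDemo {

    public static String capture() {
        String name = "job.jar"; // no final modifier; never reassigned => effectively final
        class Resource {
            String describe() {
                return "localizing " + name; // legal in Java 8+; needed 'final name' in Java 7
            }
        }
        // Reassigning 'name' anywhere in this method would turn the capture
        // above into a compile error, because 'name' would no longer be
        // effectively final.
        return new Resource().describe();
    }

    public static void main(String[] args) {
        System.out.println(capture()); // localizing job.jar
    }
}
```

This is why the explicit final modifiers could be dropped in the trunk patch without changing behavior.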
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955930#comment-15955930 ] Chris Trezzo commented on YARN-1492: Hi [~jhung], thanks for the question! bq. will this feature save jars from being relocalized across different jobs on a node? The short answer is yes. YARN applications can leverage this feature to prevent relocalizing the same resources over and over again, both from the client to HDFS and from HDFS to the node managers. The shared cache leverages checksumming and the node manager local cache to ensure applications can reuse resources that are already localized on node managers. See MAPREDUCE-5951 for MapReduce-level support for the shared cache (which will hopefully be committed shortly to trunk and branch-2). Please let me know if you have any more questions. Here is also a slide deck explaining the feature at a high level: https://www.slideshare.net/ctrezzo/a-secure-public-cache-for-yarn-application-resources-61688793 > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin Lee >Assignee: Chris Trezzo > Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, > YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, > YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, > YARN-1492-all-trunk-v5.patch > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
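The checksum-based reuse described above can be illustrated with the JDK alone. This is a hypothetical sketch of a content-addressed cache key, not the actual SharedCacheUploader logic: byte-identical jars from different jobs resolve to the same entry, so the second upload is unnecessary.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of content-addressed cache keys; not YARN's implementation.
public class CacheKeyDemo {

    // Derive a cache key of the form "<hex-digest>/<file-name>" from resource bytes.
    public static String cacheKey(byte[] jarBytes, String fileName) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(jarBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex + "/" + fileName;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        byte[] jobA = "same jar contents".getBytes(StandardCharsets.UTF_8);
        byte[] jobB = "same jar contents".getBytes(StandardCharsets.UTF_8);
        byte[] jobC = "different jar contents".getBytes(StandardCharsets.UTF_8);

        // Two jobs shipping byte-identical jars hit the same cache entry...
        System.out.println(cacheKey(jobA, "job.jar").equals(cacheKey(jobB, "job.jar"))); // true
        // ...while a changed jar gets a new entry instead of clobbering the old one.
        System.out.println(cacheKey(jobA, "job.jar").equals(cacheKey(jobC, "job.jar"))); // false
    }
}
```

Keying on content rather than file name is what lets many users share one cache safely: a recompiled jar never overwrites the entry other jobs are still using.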
[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1597#comment-1597 ] Chris Trezzo commented on YARN-2960: Circling back and taking a look at this now. > Add documentation for the YARN shared cache > --- > > Key: YARN-2960 > URL: https://issues.apache.org/jira/browse/YARN-2960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > > Add documentation around the architecture, APIs, and administration of the > YARN shared cache. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948062#comment-15948062 ] Chris Trezzo commented on YARN-5797: Patch is now available for YARN-6004 as well. Thanks! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6004: --- Attachment: YARN-6004-trunk.002.patch Attached is v2 for trunk to address checkstyle issues. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-trunk.001.patch, YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6004: --- Attachment: YARN-6004-trunk.001.patch Attached is a v1 patch for trunk. This patch breaks the following out into separate methods: # Creation methods for various objects in the test (i.e. Dispatcher, ApplicationBus, ContainerBus...). # App initialization # Localizer initialization # Localizer status mocking # Localization > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-trunk.001.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo reassigned YARN-6004: -- Assignee: Chris Trezzo (was: luhuichun) > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-2960) Add documentation for the YARN shared cache
[ https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo reassigned YARN-2960: -- Assignee: Chris Trezzo > Add documentation for the YARN shared cache > --- > > Key: YARN-2960 > URL: https://issues.apache.org/jira/browse/YARN-2960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > > Add documentation around the architecture, APIs, and administration of the > YARN shared cache. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6117) SharedCacheManager does not start up
[ https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838853#comment-15838853 ] Chris Trezzo commented on YARN-6117: Thanks [~sjlee0] for the review and commit! > SharedCacheManager does not start up > > > Key: YARN-6117 > URL: https://issues.apache.org/jira/browse/YARN-6117 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha2 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-6117-trunk.001.patch > > > The webapp directory for the SharedCacheManager is missing and the SCM fails > to start up with the following: > {noformat} > 2017-01-22 00:14:25,162 INFO org.apache.hadoop.service.AbstractService: > Service SharedCacheManager failed in state STARTED; cause: > org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server > org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:330) > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:377) > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:373) > at > org.apache.hadoop.yarn.server.sharedcachemanager.webapp.SCMWebServer.serviceStart(SCMWebServer.java:65) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:157) > Caused by: java.io.FileNotFoundException: webapps/sharedcache not found in > CLASSPATH > at > org.apache.hadoop.http.HttpServer2.getWebAppsPath(HttpServer2.java:972) > at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:478) > at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:117) > at > 
org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:392) > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:291) > ... 7 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838851#comment-15838851 ] Chris Trezzo commented on YARN-3637: Thanks [~sjlee0] for the commit/review and thanks [~templedf] for the review!

> Handle localization sym-linking correctly at the YARN level
> -----------------------------------------------------------
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0-alpha3
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch, YARN-3637-trunk.003.patch
>
> The shared cache needs to handle resource sym-linking at the YARN layer. Currently, we let the application layer (i.e. mapreduce) handle this, but it is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name job.jar. They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers to them by different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are named the same (i.e. job.jar). Because of this, when the resources are localized, one of them clobbers the other: both symlinks in the container_id directory have the same name (i.e. job.jar) even though they point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment portion of the resource url. This, however, seems like something that should be solved at the YARN layer.
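The clobbering scenario above hinges on how YARN localization picks the symlink name: the URI fragment wins if present, otherwise the last path component is used. A minimal, dependency-free sketch of that naming rule (plain java.net.URI only; the class and method names here are illustrative, not YARN's actual API):

```java
import java.net.URI;

public class FragmentNaming {
    // Derive the symlink name the way YARN localization does: prefer the
    // URI fragment, otherwise fall back to the last path component.
    static String symlinkName(String resource) {
        URI uri = URI.create(resource);
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Two distinct cached jars share the file name job.jar; fragments
        // give them unique symlink names in the container directory.
        System.out.println(symlinkName("hdfs:///sharedcache/checksum1/job.jar#foo.jar")); // foo.jar
        System.out.println(symlinkName("hdfs:///sharedcache/checksum2/job.jar#bar.jar")); // bar.jar
        // Without fragments, both would localize as job.jar and clobber each other.
        System.out.println(symlinkName("hdfs:///sharedcache/checksum1/job.jar")); // job.jar
    }
}
```

This is why the MapReduce client could work around the issue purely by setting fragments on the resource URLs.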
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838708#comment-15838708 ] Chris Trezzo commented on YARN-3637: I would say trunk and branch-2 are sufficient. Thanks [~sjlee0]!
[jira] [Commented] (YARN-6117) SharedCacheManager does not start up
[ https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836777#comment-15836777 ] Chris Trezzo commented on YARN-6117: branch-2.7 might make sense as well, since this is in the 2.7.x line.
[jira] [Commented] (YARN-6117) SharedCacheManager does not start up
[ https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836482#comment-15836482 ] Chris Trezzo commented on YARN-6117: Thanks [~sjlee0]! I think it would be good to backport to branch-2 and hopefully branch-2.8 so that we have a working shared cache in future 2.x releases. Let me know what you think.
[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-3637:

Attachment: YARN-3637-trunk.003.patch

Thanks [~sjlee0]! Attached is a v3 patch that updates the use method so that it only appends a fragment when the resourceName differs from the path name provided by the shared cache.
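The v3 behavior described above — append a fragment only when the requested resourceName differs from the file name of the path the shared cache returned — can be sketched as follows. The class and method names are hypothetical (this is not the actual SharedCacheClient API), and the sketch assumes the cache path carries no fragment of its own:

```java
import java.net.URI;

public class SharedCacheNaming {
    // Append a fragment only when the caller's preferred resource name
    // differs from the file name of the path the shared cache returned;
    // otherwise the default symlink name already matches and no fragment
    // is needed.
    static URI withResourceName(URI cachePath, String resourceName) {
        String path = cachePath.getPath();
        String baseName = path.substring(path.lastIndexOf('/') + 1);
        if (baseName.equals(resourceName)) {
            return cachePath;
        }
        return URI.create(cachePath + "#" + resourceName);
    }
}
```

Keeping the fragment off in the matching case avoids cluttering URLs whose localized name would be unchanged anyway.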
[jira] [Comment Edited] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580 ] Chris Trezzo edited comment on YARN-3637 at 1/21/17 12:08 AM:

-It might be better to overload the use() method instead of replacing it.-

-[~templedf] Thinking about your previous comment some more, I may have missed your point the first time. I now realize that the overridden use method can simply honor the fragment portion of the url. If there is no fragment, then we can just use the original path's name as a new fragment to preserve the resource name. This can provide the same functionality without the extra parameter. I will fix the patch and post a new version. Let me know if you had something different in mind. Thanks again!-

I spoke too soon again... I am back to my original thought [above|https://issues.apache.org/jira/browse/YARN-3637?focusedCommentId=15829214&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15829214]. The use method takes a checksum and an appId, so we still need some way of passing in the suggested name. I will leave the patch as is for now.

was (Author: ctrezzo):
-It might be better to overload the use() method instead of replacing it.-
-[~templedf] Thinking about your previous comment some more, I may have missed your point the first time. I now realize that the overridden use method can simply honor the fragment portion of the url. If there is no fragment, then we can just use the original path's name as a new fragment to preserve the resource name. This can provide the same functionality without the extra parameter. I will fix the patch and post a new version. Let me know if you had something different in mind. Thanks again!-
I spoke too soon again... I am back to my original thought above. The use method takes a checksum and an appId, so we still need some way of passing in the suggested name. I will leave the patch as is for now.
[jira] [Comment Edited] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580 ] Chris Trezzo edited comment on YARN-3637 at 1/21/17 12:07 AM:

-It might be better to overload the use() method instead of replacing it.-

-[~templedf] Thinking about your previous comment some more, I may have missed your point the first time. I now realize that the overridden use method can simply honor the fragment portion of the url. If there is no fragment, then we can just use the original path's name as a new fragment to preserve the resource name. This can provide the same functionality without the extra parameter. I will fix the patch and post a new version. Let me know if you had something different in mind. Thanks again!-

I spoke too soon again... I am back to my original thought above. The use method takes a checksum and an appId, so we still need some way of passing in the suggested name. I will leave the patch as is for now.

was (Author: ctrezzo):
bq. It might be better to overload the use() method instead of replacing it.
[~templedf] Thinking about your previous comment some more, I may have missed your point the first time. I now realize that the overridden use method can simply honor the fragment portion of the url. If there is no fragment, then we can just use the original path's name as a new fragment to preserve the resource name. This can provide the same functionality without the extra parameter. I will fix the patch and post a new version. Let me know if you had something different in mind. Thanks again!
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580 ] Chris Trezzo commented on YARN-3637:

bq. It might be better to overload the use() method instead of replacing it.

[~templedf] Thinking about your previous comment some more, I may have missed your point the first time. I now realize that the overridden use method can simply honor the fragment portion of the url. If there is no fragment, then we can just use the original path's name as a new fragment to preserve the resource name. This can provide the same functionality without the extra parameter. I will fix the patch and post a new version. Let me know if you had something different in mind. Thanks again!
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830957#comment-15830957 ] Chris Trezzo commented on YARN-5797: Bump for a review/commit. Thanks!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --------------------------------------------------------------------------
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch
>
> Add new metrics to the node manager around the local cache sizes and how much is being cleaned from them on a regular basis. For example, we can expose information contained in the {{LocalCacheCleanerStats}} class.
[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-3637:

Attachment: YARN-3637-trunk.002.patch

Thanks [~templedf] for the review! Attached is a v2 patch addressing your comments. The one thing that deviates from your suggestions is overriding the use method. I kept it the way it was, but adjusted the comments to make clear that providing a resourceName is required.

From an API design standpoint, I tried to come up with a use case where someone would not want to specify a resource name. By not specifying a resource name, the user is essentially saying: "I don't care what the resource is named, and regardless of the name it will not conflict with any other resources the container localizes." The only situation I can come up with where that is true is if it is the only resource they are using in the container (i.e. the only symlink that gets created). Outside of this case, the path provided by the use method without a resourceName might create unintended behavior due to naming conflicts when YARN localization creates the container resource symlinks. I could add a null check for resourceName as well if we want to make this stronger.

As for compatibility, this is an unstable API for a feature that is not GA yet, so hopefully it is OK to change the API. Let me know your thoughts around this. Thanks!
[jira] [Created] (YARN-6097) Add support for directories in the Shared Cache
Chris Trezzo created YARN-6097:

Summary: Add support for directories in the Shared Cache
Key: YARN-6097
URL: https://issues.apache.org/jira/browse/YARN-6097
Project: Hadoop YARN
Issue Type: Bug
Reporter: Chris Trezzo

Add support for directories in the shared cache. If a LocalResource URL points to a directory, the directory structure is preserved during localization on the node manager. Currently, the shared cache does not support directories and will fail to upload the URL to the cache if shouldBeUploadedToSharedCache is set to true.
[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-3637:

Attachment: YARN-3637-trunk.001.patch

Attached is a v01 patch for handling symlink names and fragments as part of the shared cache YARN API. The major part of the patch adds a new parameter to the use API call. This allows a user to specify a preferred name for a resource even if the name of the resource in the shared cache is different. With this additional parameter, the user can avoid naming conflicts that happen when using resources from the shared cache.

Note that this patch does not solve the existing problem in YARN where resource symlinks get clobbered if two resources are specified with the same name. Furthermore, this approach assumes the path returned is going to be used to create a LocalResource and is leveraging the way YARN localization uses the fragment portion of a URI. I think this makes it slightly easier for developers to implement shared cache support in their YARN application by abstracting away symlink/fragment management. Thoughts [~sjlee0] or anyone else?
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749064#comment-15749064 ] Chris Trezzo commented on YARN-5797: Jira filed: YARN-6004
[jira] [Created] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
Chris Trezzo created YARN-6004:

Summary: Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
Key: YARN-6004
URL: https://issues.apache.org/jira/browse/YARN-6004
Project: Hadoop YARN
Issue Type: Bug
Components: test
Reporter: Chris Trezzo
Priority: Trivial

The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill method is over 150 lines:
bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
This method needs to be refactored and broken up into smaller methods.
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749056#comment-15749056 ] Chris Trezzo commented on YARN-5797: The checkstyle warning is because the {{TestResourceLocalizationService#testDownloadingResourcesOnContainerKill}} method is over 150 lines:

bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).

This patch barely touches the method (i.e. changes 1 line), so I think it would be wrong to refactor this method as part of this patch. I will file a jira to address the method refactor. Thanks!
[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5797: --- Attachment: YARN-5797-trunk.002.patch Attaching v2 to fix checkstyle issues. Once I get a +1 for the trunk patch I will create another version for branch-2. Thanks! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk-v1.patch, YARN-5797-trunk.002.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630595#comment-15630595 ] Chris Trezzo commented on YARN-5797: Note that the patch exposes the following metrics about the cache cleanup: # cacheSizeBeforeClean - The local cache size (public and private) before clean in Bytes # totalBytesDeleted - # of total bytes deleted from the public and private local cache # publicBytesDeleted - # of bytes deleted from the public local cache # privateBytesDeleted - # of bytes deleted from the private local cache {{LocalCacheCleanerStats}} also exposes the individual amounts deleted (in bytes) from each user private cache. I wasn't quite sure of a good way to expose this via metrics, so I left it out of the current patch. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
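[Editor's note] The four counters listed above can be modeled with a small stats holder in the spirit of {{LocalCacheCleanerStats}}. This is a hedged sketch in plain Java: the class and method names here are hypothetical illustrations, not the actual patch, which wires the counters into Hadoop's {{NodeManagerMetrics}}.

```java
// Illustrative stats holder for a cache-cleanup run. Hypothetical names;
// mirrors the four counters described in the comment above.
public class CacheCleanupStats {
    private long cacheSizeBeforeClean; // bytes, public + private, pre-clean
    private long publicBytesDeleted;   // bytes removed from the public cache
    private long privateBytesDeleted;  // bytes removed from all private caches

    public void setCacheSizeBeforeClean(long bytes) { cacheSizeBeforeClean = bytes; }
    public void addPublicDeletion(long bytes) { publicBytesDeleted += bytes; }
    public void addPrivateDeletion(long bytes) { privateBytesDeleted += bytes; }

    public long getCacheSizeBeforeClean() { return cacheSizeBeforeClean; }
    public long getPublicBytesDeleted() { return publicBytesDeleted; }
    public long getPrivateBytesDeleted() { return privateBytesDeleted; }

    // totalBytesDeleted is derived from the public and private counters,
    // so the three reported deletion metrics stay mutually consistent.
    public long getTotalBytesDeleted() {
        return publicBytesDeleted + privateBytesDeleted;
    }
}
```

Deriving the total from the two per-cache counters (rather than tracking it separately) is one way to guarantee the invariant totalBytesDeleted = publicBytesDeleted + privateBytesDeleted.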
[jira] [Comment Edited] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624305#comment-15624305 ] Chris Trezzo edited comment on YARN-5797 at 11/1/16 4:27 AM: - Attaching v1 patch to get a qa run. Summary: # Added metrics to {{NodeManagerMetrics}} that will expose stats from {{LocalCacheCleanerStats}}. # Adjusted {{ResourceLocalizationService}} constructor to take a {{NodeManagerMetrics}} param. Also adjusted {{handleCacheCleanup}} to update the new metrics. # Adjusted {{TestLocalCacheCleanup}} to cover metrics as well. # Refactored other unit tests to adjust for change in {{ResourceLocalizationService}} constructor. was (Author: ctrezzo): Attaching v1 patch to get a qa run. Summary: # Added metrics that expose stats from {{LocalCacheCleanerStats}}. # Adjusted {{TestLocalCacheCleanup}} to cover metrics as well. # Refactored other unit tests to adjust for change in {{ResourceLocalizationService}} constructor to pass in {{NodeManagerMetrics}}. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5797: --- Attachment: YARN-5797-trunk-v1.patch Attaching v1 patch to get a qa run. Summary: # Added metrics that expose stats from {{LocalCacheCleanerStats}}. # Adjusted {{TestLocalCacheCleanup}} to cover metrics as well. # Refactored other unit tests to adjust for change in {{ResourceLocalizationService}} constructor to pass in {{NodeManagerMetrics}}. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular basis. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5767: --- Release Note: This issue fixes a bug in how resources are evicted from the PUBLIC and PRIVATE yarn local caches used by the node manager for resource localization. In summary, the caches are now properly cleaned based on an LRU policy across both the public and private caches. (was: This issue fixes a bug in how resources were evicted from the PUBLIC and PRIVATE yarn local caches used by the node manager for resource localization. In summary, the caches are now properly cleaned based on an LRU policy across both the public and private caches.) > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: oct16-easy > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
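[Editor's note] The fix described in the release note above pools every tracker into a single retention set before any eviction happens, so removal order follows recency across the public and all private caches. The following is a minimal sketch of that cross-cache LRU policy in plain Java. It is an illustrative model only: the resource names, sizes, and timestamps are hypothetical and this is not the actual {{ResourceRetentionSet}} code, though it keeps the real rule that in-use resources are always retained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LruRetention {

    /** Minimal model of a localized resource; fields are illustrative. */
    public static class CachedResource {
        final String name;
        final long size;        // bytes
        final long lastAccess;  // millis since epoch; lower = older
        final int refCount;     // in-use resources are never evicted

        public CachedResource(String name, long size, long lastAccess, int refCount) {
            this.name = name;
            this.size = size;
            this.lastAccess = lastAccess;
            this.refCount = refCount;
        }
    }

    /**
     * Evict least-recently-used resources across ALL caches (public and
     * private pooled together) until the total size is at or below target.
     * Returns the names of the evicted resources, oldest first.
     */
    public static List<String> evict(List<CachedResource> allCaches, long targetSize) {
        long currentSize = 0;
        List<CachedResource> candidates = new ArrayList<>();
        for (CachedResource r : allCaches) {
            currentSize += r.size;
            if (r.refCount == 0) {      // always retain resources in use
                candidates.add(r);
            }
        }
        // Oldest access first: this single sort is the cross-cache LRU ordering.
        candidates.sort(Comparator.comparingLong(r -> r.lastAccess));
        List<String> evicted = new ArrayList<>();
        for (CachedResource r : candidates) {
            if (currentSize <= targetSize) {
                break;
            }
            currentSize -= r.size;
            evicted.add(r.name);
        }
        return evicted;
    }
}
```

Before the fix, eviction effectively proceeded tracker by tracker (public first, then each user's private cache in iteration order), so deletion order depended on which tracker a resource lived in rather than on how recently it was used.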
[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615991#comment-15615991 ] Chris Trezzo commented on YARN-5767: I have filed a followup jira for the metrics: YARN-5797 > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: oct16-easy > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
Chris Trezzo created YARN-5797: -- Summary: Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches Key: YARN-5797 URL: https://issues.apache.org/jira/browse/YARN-5797 Project: Hadoop YARN Issue Type: Improvement Reporter: Chris Trezzo Assignee: Chris Trezzo Add new metrics to the node manager around the local cache sizes and how much is being cleaned from them on a regular basis. For example, we can expose information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5767: --- Release Note: This issue fixes a bug in how resources were evicted from the PUBLIC and PRIVATE yarn local caches used by the node manager for resource localization. In summary, the caches are now properly cleaned based on an LRU policy across both the public and private caches. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: oct16-easy > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615974#comment-15615974 ] Chris Trezzo commented on YARN-5767: Thanks [~jlowe]! I will add release notes to the issue as well. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: oct16-easy > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4950: --- Labels: oct16-hard (was: oct16-easy) > configure parallel-tests for yarn-client and yarn-server-resourcemanager > > > Key: YARN-4950 > URL: https://issues.apache.org/jira/browse/YARN-4950 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Labels: oct16-hard > Attachments: YARN-4950.00.patch > > > Unit tests for yarn-client and yarn-server-resourcemanager take over an hour > each. The parallel-tests profile should be configured to reduce the > execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5027) NM should clean up app log dirs after NM restart
[ https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613303#comment-15613303 ] Chris Trezzo commented on YARN-5027: bq. The exists check is on the root log dir, not the app log dirs. Ah. Gotcha. +1 from me as well. > NM should clean up app log dirs after NM restart > - > > Key: YARN-5027 > URL: https://issues.apache.org/jira/browse/YARN-5027 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Attachments: YARN-5027.01.patch > > > If the NM starts without recovery enabled, there may be many stale app log > dirs in the NM log dirs; the NM should clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5027) NM should clean up app log dirs after NM restart
[ https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251 ] Chris Trezzo edited comment on YARN-5027 at 10/27/16 9:28 PM: -- Does the current patch have the potential to leak {{application\_\*\_\*\_DEL\_\*}} directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to check if the renamed \_DEL\_ directory exists in the case where the original log directory doesn't exist and delete if necessary? was (Author: ctrezzo): Does the current patch have the potential to leak {{application\_\*\_\*\_DEL\_\*}} directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to check if the renamed _DEL_ directory exists in the case where the original log directory doesn't exist and delete if necessary? > NM should clean up app log dirs after NM restart > - > > Key: YARN-5027 > URL: https://issues.apache.org/jira/browse/YARN-5027 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Attachments: YARN-5027.01.patch > > > If the NM starts without recovery enabled, there may be many stale app log > dirs in the NM log dirs; the NM should clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5027) NM should clean up app log dirs after NM restart
[ https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251 ] Chris Trezzo commented on YARN-5027: Does the current patch have the potential to leak {{application_*_*_DEL_*}} directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to check if the renamed _DEL_ directory exists in the case where the original log directory doesn't exist and delete if necessary? > NM should clean up app log dirs after NM restart > - > > Key: YARN-5027 > URL: https://issues.apache.org/jira/browse/YARN-5027 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Attachments: YARN-5027.01.patch > > > If the NM starts without recovery enabled, there may be many stale app log > dirs in the NM log dirs; the NM should clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
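[Editor's note] The concern above is about a rename-then-delete scheme: a log directory is first renamed to an {{application_*_*_DEL_*}} path and then deleted asynchronously, so a crash between the two steps can leak the renamed directory. Below is a hedged sketch of a cleanup pass that also sweeps leftover {{_DEL_}} directories. The directory layout, naming scheme, and method names are illustrative assumptions for this sketch, not the node manager's actual on-disk format or code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LogDirCleaner {
    /**
     * Delete an app log dir via rename-then-delete, and also sweep any
     * previously renamed "_DEL_" directories that an earlier interrupted
     * cleanup left behind. Returns the number of directories removed.
     */
    public static int cleanupAppLogDirs(Path logRoot, String appId) throws IOException {
        int removed = 0;
        Path appDir = logRoot.resolve(appId);
        if (Files.isDirectory(appDir)) {
            // Rename first so new writers cannot race with the delete.
            Path del = logRoot.resolve(appId + "_DEL_" + System.currentTimeMillis());
            Files.move(appDir, del);
            deleteRecursively(del);
            removed++;
        }
        // Sweep orphaned *_DEL_* directories from crashed cleanups.
        try (Stream<Path> entries = Files.list(logRoot)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isDirectory(p) && p.getFileName().toString().contains("_DEL_")) {
                    deleteRecursively(p);
                    removed++;
                }
            }
        }
        return removed;
    }

    private static void deleteRecursively(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (Stream<Path> children = Files.list(p)) {
                for (Path c : (Iterable<Path>) children::iterator) {
                    deleteRecursively(c);
                }
            }
        }
        Files.deleteIfExists(p);
    }
}
```

Sweeping by suffix on every pass is what answers the leak question: even if the original log directory no longer exists, any renamed {{_DEL_}} remnant is still found and removed.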
[jira] [Comment Edited] (YARN-5027) NM should clean up app log dirs after NM restart
[ https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251 ] Chris Trezzo edited comment on YARN-5027 at 10/27/16 9:20 PM: -- Does the current patch have the potential to leak {{application\_\*\_\*\_DEL\_\*}} directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to check if the renamed _DEL_ directory exists in the case where the original log directory doesn't exist and delete if necessary? was (Author: ctrezzo): Does the current patch have the potential to leak {{application_*_*_DEL_*}} directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to check if the renamed _DEL_ directory exists in the case where the original log directory doesn't exist and delete if necessary? > NM should clean up app log dirs after NM restart > - > > Key: YARN-5027 > URL: https://issues.apache.org/jira/browse/YARN-5027 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Attachments: YARN-5027.01.patch > > > If the NM starts without recovery enabled, there may be many stale app log > dirs in the NM log dirs; the NM should clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5258) Document Use of Docker with LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5258: --- Labels: oct-16-easy (was: ) > Document Use of Docker with LinuxContainerExecutor > -- > > Key: YARN-5258 > URL: https://issues.apache.org/jira/browse/YARN-5258 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct-16-easy > Attachments: YARN-5258.001.patch, YARN-5258.002.patch > > > There aren't currently any docs that explain how to configure Docker and all > of its various options aside from reading all of the JIRAs. We need to > document the configuration, use, and troubleshooting, along with helpful > examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4950: --- Labels: oct-16-easy (was: ) > configure parallel-tests for yarn-client and yarn-server-resourcemanager > > > Key: YARN-4950 > URL: https://issues.apache.org/jira/browse/YARN-4950 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Labels: oct-16-easy > Attachments: YARN-4950.00.patch > > > Unit tests for yarn-client and yarn-server-resourcemanager take over an hour > each. The parallel-tests profile should be configured to reduce the > execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4948: --- Labels: oct16-hard (was: ) > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: jialei weng >Assignee: jialei weng > Labels: oct16-hard > Attachments: YARN-4948.001.patch, YARN-4948.002.patch, > YARN-4948.003.patch, YARN-4948.006.patch, YARN-4948.007.patch > > > Support storing node labels in ZooKeeper. The main scenario for this is to provide > a way to decouple YARN from HDFS. Since node label data is very important > to YARN, YARN will also fail to start up if HDFS is down. So it is worthwhile > to make YARN more independent for users who run both YARN and HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4907) Make all MockRM#waitForState consistent.
[ https://issues.apache.org/jira/browse/YARN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4907: --- Component/s: resourcemanager > Make all MockRM#waitForState consistent. > - > > Key: YARN-4907 > URL: https://issues.apache.org/jira/browse/YARN-4907 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Labels: oct16-medium > Attachments: YARN-4907.001.patch > > > There are some inconsistencies among the {{waitForState}} methods in {{MockRM}}: > 1. Some {{waitForState}} return a boolean while others don't. > 2. Some {{waitForState}} don't have a timeout, so they can wait forever. > 3. Some {{waitForState}} use LOG.info and others use {{System.out.println}} > to print messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
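[Editor's note] The three inconsistencies listed in the YARN-4907 description can all be addressed by funneling every wait through one helper: always time-bounded, always returning a boolean so callers can fail the test. A generic sketch follows; this is not MockRM's actual signature, and the {{Supplier}}-based condition (with logging omitted for brevity) is an assumption for illustration.

```java
import java.util.function.Supplier;

public class WaitUtil {
    /**
     * Poll the condition until it becomes true or the timeout elapses.
     * Always bounded, and always returns a boolean so the caller can
     * assert on the result instead of hanging forever.
     */
    public static boolean waitForState(Supplier<Boolean> condition,
                                       long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.get()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        // One final check so a state reached exactly at the deadline
        // is still observed.
        return condition.get();
    }
}
```

A test would call it as `assertTrue(waitForState(() -> app.getState() == RUNNING, 10_000, 100))`, making a hung wait show up as a timed-out assertion rather than a stuck test run.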
[jira] [Updated] (YARN-4907) Make all MockRM#waitForState consistent.
[ https://issues.apache.org/jira/browse/YARN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4907: --- Labels: oct16-medium (was: ) > Make all MockRM#waitForState consistent. > - > > Key: YARN-4907 > URL: https://issues.apache.org/jira/browse/YARN-4907 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Labels: oct16-medium > Attachments: YARN-4907.001.patch > > > There are some inconsistencies among the {{waitForState}} methods in {{MockRM}}: > 1. Some {{waitForState}} methods return a boolean while others don't. > 2. Some {{waitForState}} methods don't have a timeout, so they can wait forever. > 3. Some {{waitForState}} methods use LOG.info and others use {{System.out.println}} > to print messages.
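The three inconsistencies above suggest consolidating on a single contract: always return a boolean, always bound the wait with a timeout, and always log through one channel. A minimal standalone sketch of such a contract (hypothetical names and signature, not the actual MockRM API):

```java
import java.util.function.Supplier;

public class WaitForState {
  /**
   * Polls {@code current} until it equals {@code desired} or the timeout
   * elapses. Always returns a boolean and never waits forever.
   */
  public static <T> boolean waitForState(Supplier<T> current, T desired,
      long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (desired.equals(current.get())) {
        return true;        // reached the desired state in time
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;       // bounded wait: report failure instead of hanging
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;       // treat interruption as a failed wait
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Toy "app" that transitions NEW -> RUNNING after ~50ms.
    Supplier<String> state =
        () -> System.currentTimeMillis() - start > 50 ? "RUNNING" : "NEW";
    System.out.println(waitForState(state, "RUNNING", 2000, 10)); // true
    System.out.println(waitForState(state, "KILLED", 100, 10));   // false
  }
}
```

With one helper like this behind every public {{waitForState}} overload, the boolean-vs-void, timeout-vs-unbounded, and logging differences disappear at a single choke point.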
[jira] [Updated] (YARN-4900) SLS MRAMSimulator should include scheduledMappers/Reducers when re-request failed tasks
[ https://issues.apache.org/jira/browse/YARN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4900: --- Labels: oct16-medium (was: ) > SLS MRAMSimulator should include scheduledMappers/Reducers when re-request > failed tasks > --- > > Key: YARN-4900 > URL: https://issues.apache.org/jira/browse/YARN-4900 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Wangda Tan >Assignee: Wangda Tan > Labels: oct16-medium > Attachments: YARN-4900.1.patch > >
[jira] [Updated] (YARN-4900) SLS MRAMSimulator should include scheduledMappers/Reducers when re-request failed tasks
[ https://issues.apache.org/jira/browse/YARN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4900: --- Component/s: scheduler-load-simulator > SLS MRAMSimulator should include scheduledMappers/Reducers when re-request > failed tasks > --- > > Key: YARN-4900 > URL: https://issues.apache.org/jira/browse/YARN-4900 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4900.1.patch > >
[jira] [Updated] (YARN-4899) Queue metrics of SLS capacity scheduler only activated after app submit to the queue
[ https://issues.apache.org/jira/browse/YARN-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-4899: --- Labels: oct16-medium (was: ) Component/s: capacity scheduler > Queue metrics of SLS capacity scheduler only activated after app submit to > the queue > > > Key: YARN-4899 > URL: https://issues.apache.org/jira/browse/YARN-4899 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Labels: oct16-medium > Attachments: YARN-4899.1.patch > > > We should start recording queue metrics from cluster start.
[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610147#comment-15610147 ] Chris Trezzo commented on YARN-5767: Two notes: # Instead of removing {{LocalCacheCleanerStats.getUserDelSizes()}} I made it return an unmodifiable map. That way, users of the class still have access to the data (outside of the toString method) and it is still protected. # I wound up removing both {{LRUComparator.equals}} and {{LRUComparator.hashCode}}. I figure we don't need to override them since the methods were just using the default implementation anyways. My intention is to file a followup jira that adds metrics that expose the statistics from {{LocalCacheCleanerStats}}. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
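One direction for the fix is to stop evicting tracker-by-tracker and instead order eviction candidates from all trackers (public plus every private one) by last access time, evicting the globally least-recently-used entries first. A simplified, self-contained sketch of that idea (hypothetical types and method names, not the actual NodeManager classes):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class GlobalLru {
  /** Simplified stand-in for a localized resource entry. */
  static final class Resource {
    final String name;
    final long size;
    final long lastAccess;
    Resource(String name, long size, long lastAccess) {
      this.name = name;
      this.size = size;
      this.lastAccess = lastAccess;
    }
  }

  /**
   * Returns the names of resources to evict, oldest access first across ALL
   * caches, until the total size fits under targetSize.
   */
  static List<String> evictionOrder(Map<String, List<Resource>> caches,
      long targetSize) {
    // Merge resources from the public cache and every per-user cache.
    List<Resource> all = new ArrayList<>();
    long total = 0;
    for (List<Resource> tracker : caches.values()) {
      for (Resource r : tracker) {
        all.add(r);
        total += r.size;
      }
    }
    // Global LRU ordering: oldest access time first, regardless of owner.
    all.sort(Comparator.comparingLong(r -> r.lastAccess));
    List<String> evicted = new ArrayList<>();
    for (Resource r : all) {
      if (total <= targetSize) {
        break;                 // cache now fits the target; stop evicting
      }
      evicted.add(r.name);
      total -= r.size;
    }
    return evicted;
  }

  public static void main(String[] args) {
    Map<String, List<Resource>> caches = Map.of(
        "public", List.of(new Resource("pub-hot", 40, 300)),
        "user1", List.of(new Resource("u1-old", 10, 100)),
        "user2", List.of(new Resource("u2-mid", 100, 200)));
    // Target 50: the recently used public resource survives while the
    // older private entries are evicted first, whoever owns them.
    System.out.println(evictionOrder(caches, 50)); // [u1-old, u2-mid]
  }
}
```

In this ordering, the user1/user2 scenario from the description no longer depends on iteration order over privateRsrc.values(): the 10MB and 100MB caches compete on recency, not on which tracker happened to be added to the retention set first.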
[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5767: --- Attachment: YARN-5767-trunk-v4.patch Attached is a v4 patch for trunk addressing all comments from the reviews. Thanks! > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 