[jira] [Commented] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2018-08-13 Thread Chris Trezzo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578769#comment-16578769
 ] 

Chris Trezzo commented on YARN-5727:


Assigning to [~wzzdreamer]. He will start working on a new draft for the design 
doc.

> Improve YARN shared cache support for LinuxContainerExecutor
> 
>
> Key: YARN-5727
> URL: https://issues.apache.org/jira/browse/YARN-5727
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: zhenzhao wang
> Priority: Major
> Attachments: YARN-5727-Design-v1.pdf
>
>
> When running LinuxContainerExecutor in a secure mode 
> ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set 
> to {{false}}), all localized files are owned by the user that owns the 
> container which localized the resource. This presents a problem for the 
> shared cache when a YARN application requests that a resource with non-public 
> visibility be uploaded to the shared cache. The shared cache uploader 
> (running as the node manager user) does not have access to the localized 
> files and cannot compute the checksum of the file or upload it to the cache. 
> The solution should ideally satisfy the following three requirements:
> # Localized files should still be safe/secure. Other users that run 
> containers should not be able to modify or delete the publicly localized 
> files of others.
> # The node manager user should be able to access these files for the purpose 
> of checksumming and uploading to the shared cache without being a privileged 
> user.
> # The solution should avoid making unnecessary copies of the localized files.
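
The failure mode in the description can be sketched as follows. This is only an
illustration in Python, not the uploader's actual code: the function name
checksum_if_readable is hypothetical, and SHA-256 is assumed as the checksum
algorithm. It streams a file through a digest and returns None when the calling
process is not allowed to read the file, which is the position the node manager
user is in when the file is owned by the container user.

```python
import hashlib
from pathlib import Path
from typing import Optional


def checksum_if_readable(path: Path, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return the SHA-256 hex digest of path, or None if it cannot be read.

    Hypothetical helper for illustration; SHA-256 is an assumption here.
    """
    digest = hashlib.sha256()
    try:
        with path.open("rb") as stream:
            # Read in chunks so a large jar is never held in memory at once.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
    except PermissionError:
        # The case described above: the file belongs to another user and the
        # calling process has no read access, so no checksum can be computed.
        return None
    return digest.hexdigest()
```

Without a checksum there is no key under which to store the file, so the upload
cannot proceed either.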



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2018-08-13 Thread Chris Trezzo (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo reassigned YARN-5727:
--

Assignee: zhenzhao wang




[jira] [Commented] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2018-08-13 Thread Chris Trezzo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578701#comment-16578701
 ] 

Chris Trezzo commented on YARN-5727:


As stated above, I no longer think that the attached v1 design is the right 
approach. It assumes that the permissions issue is a problem at the YARN 
layer. In fact, I think it is more a problem with the way MapReduce supports 
the shared cache. Currently the shared cache only supports public resources 
(i.e. all resources uploaded to the shared cache will be world readable). The 
problem is that MapReduce localizes all of the job resources into the user 
cache instead of the public cache. YARN is then put in the position of 
essentially changing permissions for some resources. Ideally, MapReduce would 
localize resources intended for the shared cache into the public cache to 
begin with. This would allow the shared cache uploader to checksum and upload 
the resources even when the LinuxContainerExecutor is in use.
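
To make the distinction concrete, here is a minimal sketch in Python of how a
resource's visibility could determine which local cache it lands in. The names
Visibility, local_cache_dir and uploader_can_read are hypothetical, and the
directory names (filecache, usercache, appcache) are an assumption about the
usual node manager local-dir layout rather than something stated in this
thread.

```python
from enum import Enum
from pathlib import PurePosixPath


class Visibility(Enum):
    """Illustrative stand-in for the visibility of a localized resource."""
    PUBLIC = "public"
    PRIVATE = "private"
    APPLICATION = "application"


def local_cache_dir(local_dir: str, visibility: Visibility,
                    user: str, app_id: str) -> PurePosixPath:
    """Pick the cache directory for a resource (assumed layout)."""
    root = PurePosixPath(local_dir)
    if visibility is Visibility.PUBLIC:
        # Shared by all users on the node and world readable.
        return root / "filecache"
    if visibility is Visibility.PRIVATE:
        # Per-user cache, owned by the container user under
        # LinuxContainerExecutor.
        return root / "usercache" / user / "filecache"
    # Per-application cache inside the user's directory.
    return root / "usercache" / user / "appcache" / app_id / "filecache"


def uploader_can_read(visibility: Visibility) -> bool:
    """Only the public cache is readable by an unprivileged uploader."""
    return visibility is Visibility.PUBLIC
```

Under this model, a resource that MapReduce marks as public from the start ends
up in a directory the uploader can already read, so no permission change is
needed afterwards.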

> Improve YARN shared cache support for LinuxContainerExecutor
> 
>
> Key: YARN-5727
> URL: https://issues.apache.org/jira/browse/YARN-5727
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Priority: Major
> Attachments: YARN-5727-Design-v1.pdf



[jira] [Created] (YARN-7694) Optionally run shared cache manager as part of the resource manager

2018-01-02 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-7694:
--

 Summary: Optionally run shared cache manager as part of the 
resource manager
 Key: YARN-7694
 URL: https://issues.apache.org/jira/browse/YARN-7694
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo


Currently the shared cache manager is its own stand-alone daemon. It is a YARN 
composite service. Ideally, the shared cache manager could optionally be run as 
part of the resource manager. This way administrators would have one fewer 
daemon to manage.
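
A rough sketch of the idea, in Python for brevity: a composite service starts
its children in order and stops them in reverse, so an optional child can be
added behind a configuration switch. Everything here is hypothetical, including
the class names, the child service names and the "sharedcache.run-in-rm" key;
none of it is an existing YARN class or property.

```python
class Service:
    """Minimal service with a started flag."""

    def __init__(self, name):
        self.name = name
        self.started = False

    def start(self):
        self.started = True

    def stop(self):
        self.started = False


class CompositeService(Service):
    """A service that owns child services and manages their lifecycle."""

    def __init__(self, name):
        super().__init__(name)
        self.children = []

    def add_service(self, service):
        self.children.append(service)

    def start(self):
        # Children start in the order they were added.
        for child in self.children:
            child.start()
        super().start()

    def stop(self):
        # Children stop in reverse order of startup.
        for child in reversed(self.children):
            child.stop()
        super().stop()


def build_resource_manager(conf):
    """Assemble a resource manager, embedding the cache manager if enabled."""
    rm = CompositeService("ResourceManager")
    rm.add_service(Service("Scheduler"))
    # Hypothetical switch: when set, the shared cache manager runs inside
    # the resource manager instead of as its own daemon.
    if conf.get("sharedcache.run-in-rm", False):
        scm = CompositeService("SharedCacheManager")
        scm.add_service(Service("Store"))
        scm.add_service(Service("Cleaner"))
        rm.add_service(scm)
    return rm
```

Because the shared cache manager is already a composite service, embedding it
would mostly be a matter of adding it as a child rather than rewriting it.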






[jira] [Resolved] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-05 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo resolved YARN-1492.

   Resolution: Fixed
Fix Version/s: 3.0.0, 2.9.0

I am resolving this issue as all the parts are committed to trunk, branch-3.0 
and branch-2. Thanks to everyone who helped with reviews and design feedback! I 
am glad to see this feature reach this milestone. I am also looking forward to 
collaborating on shared cache phase two features in YARN-7282.

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.0.4-alpha
> Reporter: Sangjin Lee
> Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0
>
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache

2017-10-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193397#comment-16193397
 ] 

Chris Trezzo commented on YARN-2960:


Committed to trunk, branch-3.0, branch-2. Thanks!

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2960-trunk-001.patch, YARN-2960-trunk-002.patch, 
> YARN-2960-trunk-003.patch, YARN-2960-trunk-004.patch
>
>
> Add documentation around the architecture, APIs and administration of the 
> YARN shared cache.






[jira] [Comment Edited] (YARN-2960) Add documentation for the YARN shared cache

2017-10-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193371#comment-16193371
 ] 

Chris Trezzo edited comment on YARN-2960 at 10/5/17 6:19 PM:
-

I unfortunately made a small mistake while cherry-picking the patch to 
branch-3.0. I reverted the cherry-pick and then cherry-picked it again. Because 
of this there are three commits in the log:
{noformat}
commit e5af16cf6cd54c8358af066d1ec677378bc3029d
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)

commit 4b9b66f921c671b6426d1bc912cca056cb2532c4
Author: Chris Trezzo 
Date:   Thu Oct 5 11:11:02 2017 -0700

Revert "YARN-2960. Add documentation for the YARN shared cache."

This reverts commit 54a01c28cc153872aa7eed68000ab0ddf010054a.

commit 54a01c28cc153872aa7eed68000ab0ddf010054a
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)
{noformat}

Everything is correct now.


was (Author: ctrezzo):
I unfortunately made a small mistake while cherry-picking the patch to 
branch-3.0. I reverted the cherry-pick and then cherry-picked it again. Because 
of this there are three commits in the log:
{noformat}
commit e5af16cf6cd54c8358af066d1ec677378bc3029d
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)

commit 4b9b66f921c671b6426d1bc912cca056cb2532c4
Author: Chris Trezzo 
Date:   Thu Oct 5 11:11:02 2017 -0700

Revert "YARN-2960. Add documentation for the YARN shared cache."

This reverts commit 54a01c28cc153872aa7eed68000ab0ddf010054a.

commit 54a01c28cc153872aa7eed68000ab0ddf010054a
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)
{noformat}




[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache

2017-10-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193371#comment-16193371
 ] 

Chris Trezzo commented on YARN-2960:


I unfortunately made a small mistake while cherry-picking the patch to 
branch-3.0. I reverted the cherry-pick and then cherry-picked it again. Because 
of this there are three commits in the log:
{noformat}
commit e5af16cf6cd54c8358af066d1ec677378bc3029d
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)

commit 4b9b66f921c671b6426d1bc912cca056cb2532c4
Author: Chris Trezzo 
Date:   Thu Oct 5 11:11:02 2017 -0700

Revert "YARN-2960. Add documentation for the YARN shared cache."

This reverts commit 54a01c28cc153872aa7eed68000ab0ddf010054a.

commit 54a01c28cc153872aa7eed68000ab0ddf010054a
Author: Chris Trezzo 
Date:   Thu Oct 5 10:38:41 2017 -0700

YARN-2960. Add documentation for the YARN shared cache.

(cherry picked from commit 7e76f85bc68166b01b51fcf6ba4b3fd9281d4a03)
{noformat}




[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache

2017-10-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192221#comment-16192221
 ] 

Chris Trezzo commented on YARN-2960:


Thanks [~mingma]! I will commit to trunk, branch-3.0 and branch-2.




[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache

2017-10-04 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2960:
---
Attachment: YARN-2960-trunk-004.patch

Attached v4 to add italics around parameters in the setup instructions.




[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache

2017-10-04 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2960:
---
Attachment: YARN-2960-trunk-003.patch

Thanks [~mingma]! Attached is a v3 patch. I put all of the configs in a 
markdown table. I did leave the config and setup sub-sections separate within 
the administration section. I added a comment in the setup to reference the 
configs in the following section. I mainly wanted to keep the setup steps to 
the minimum required, whereas the config section is a reference for all of the 
configuration parameters that are available. Let me know if there is anything 
else. Thanks again!

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2960-trunk-001.patch, YARN-2960-trunk-002.patch, 
> YARN-2960-trunk-003.patch



[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache

2017-10-03 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2960:
---
Attachment: YARN-2960-trunk-002.patch

Thanks for the review [~mingma]! Attached is a trunk v2 patch to address your 
comments. Please let me know if there is anything else.

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2960-trunk-001.patch, YARN-2960-trunk-002.patch



[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache

2017-10-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189050#comment-16189050
 ] 

Chris Trezzo commented on YARN-2960:


Patch should be good to go, please let me know if you see any issues. If I get 
a +1, I plan to commit this patch to trunk, branch-3.0 and branch-2. Thanks!

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2960-trunk-001.patch



[jira] [Resolved] (YARN-1016) Define a HDFS based repository that allows YARN services to share resources

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo resolved YARN-1016.

Resolution: Duplicate

Resolving this as a duplicate of YARN-1492. Please let me know if you think 
otherwise.

> Define a HDFS based repository that allows YARN services to share resources
> ---
>
> Key: YARN-1016
> URL: https://issues.apache.org/jira/browse/YARN-1016
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Affects Versions: 3.0.0-alpha1
> Reporter: Kam Kasravi
>
> YARN services, both short and long lived, can benefit from a resource repo 
> rather than packaging resources within the YARN client to be extracted and 
> used by the Application Master and (later) the containers. Standardizing a 
> resource repo will provide performance benefits as well. The repo should be 
> similar to Maven or Ivy repos so that discovery and versioning are built in.






[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-1492:
---
Release Note: The YARN Shared Cache provides the facility to upload and 
manage shared application resources to HDFS in a safe and scalable manner. YARN 
applications can leverage resources uploaded by other applications or previous 
runs of the same application without having to re-upload and localize 
identical files multiple times. This will save network resources and reduce 
YARN application startup time.
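
As a small illustration of why identical files need not be uploaded twice, the
following Python sketch keys stored content by its checksum. The class
SharedCacheSketch and its use method are hypothetical names, an in-memory dict
stands in for HDFS, and SHA-256 is assumed as the checksum; this is not the
shared cache client API.

```python
import hashlib


class SharedCacheSketch:
    """Toy content-addressed store: one entry per distinct file content."""

    def __init__(self):
        self._entries = {}  # checksum -> stored bytes (stand-in for HDFS)
        self.uploads = 0    # number of times content was actually stored

    def use(self, content: bytes) -> str:
        """Return the cache key for content, uploading only if it is new."""
        key = hashlib.sha256(content).hexdigest()
        if key not in self._entries:
            # First time this exact content is seen: store it once.
            self._entries[key] = content
            self.uploads += 1
        # Later callers with identical content reuse the existing entry.
        return key
```

Two applications that ship the same jar get the same key, so the second one
reuses the stored copy instead of uploading again.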

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.0.4-alpha
> Reporter: Sangjin Lee
> Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch



[jira] [Updated] (YARN-2960) Add documentation for the YARN shared cache

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2960:
---
Attachment: YARN-2960-trunk-001.patch

Trunk v1 attached.




[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188808#comment-16188808
 ] 

Chris Trezzo commented on YARN-1492:


Please let me know if you have any concerns about this. Thanks!




[jira] [Comment Edited] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188801#comment-16188801
 ] 

Chris Trezzo edited comment on YARN-1492 at 10/2/17 8:50 PM:
-

[~asuresh] [~subru] I have set the target version for this jira back to 2.9.0. 
The only jiras left for this first phase are the documentation patch 
(YARN-2960) and the startup script patch (YARN-4858). Both should be able to 
make 2.9.0. The rest of the feature is already in branch-2. I have split out 
some of the major features that still need to be finished in the shared cache 
into a phase 2 jira (YARN-7282). That being said, the core parts of this 
feature are committed and ready to be used in deployments that do not need 
phase 2 features.


was (Author: ctrezzo):
[~asuresh] [~subru] I have set the target version for this jira back to 2.9.0. 
The only jira that is left for this first phase is the documentation patch and 
YARN-4858. Both should be able to make 2.9.0. The rest of the feature is 
already in branch-2. I have split out some of the major features that still 
need to be finished in the shared cache into a phase 2 jira (YARN-7282). That 
being said, the core parts of this feature are committed and ready to be used 
in deployments that do not need phase 2 features.




[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188801#comment-16188801
 ] 

Chris Trezzo commented on YARN-1492:


[~asuresh] [~subru] I have set the target version for this jira back to 2.9.0. 
The only jiras left for this first phase are the documentation patch and 
YARN-4858. Both should be able to make 2.9.0. The rest of the feature is 
already in branch-2. I have split out some of the major features that still 
need to be finished in the shared cache into a phase 2 jira (YARN-7282). That 
being said, the core parts of this feature are committed and ready to be used 
in deployments that do not need phase 2 features.

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-1492:
---
Target Version/s: 2.9.0  (was: 3.1.0)

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Assigned] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo reassigned YARN-5727:
--

Assignee: (was: Chris Trezzo)

> Improve YARN shared cache support for LinuxContainerExecutor
> 
>
> Key: YARN-5727
> URL: https://issues.apache.org/jira/browse/YARN-5727
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
> Attachments: YARN-5727-Design-v1.pdf
>
>
> When running LinuxContainerExecutor in a secure mode 
> ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set 
> to {{false}}), all localized files are owned by the user that owns the 
> container which localized the resource. This presents a problem for the 
> shared cache when a YARN application requests a resource to be uploaded to 
> the shared cache that has a non-public visibility. The shared cache uploader 
> (running as the node manager user) does not have access to the localized 
> files and can not compute the checksum of the file or upload it to the cache. 
> The solution should ideally satisfy the following three requirements:
> # Localized files should still be safe/secure. Other users that run 
> containers should not be able to modify, or delete the publicly localized 
> files of others.
> # The node manager user should be able to access these files for the purpose 
> of checksumming and uploading to the shared cache without being a privileged 
> user.
> # The solution should avoid making unnecessary copies of the localized files.






[jira] [Updated] (YARN-5727) Improve YARN shared cache support for LinuxContainerExecutor

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5727:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-7282

> Improve YARN shared cache support for LinuxContainerExecutor
> 
>
> Key: YARN-5727
> URL: https://issues.apache.org/jira/browse/YARN-5727
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5727-Design-v1.pdf
>
>
> When running LinuxContainerExecutor in a secure mode 
> ({{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} set 
> to {{false}}), all localized files are owned by the user that owns the 
> container which localized the resource. This presents a problem for the 
> shared cache when a YARN application requests a resource to be uploaded to 
> the shared cache that has a non-public visibility. The shared cache uploader 
> (running as the node manager user) does not have access to the localized 
> files and can not compute the checksum of the file or upload it to the cache. 
> The solution should ideally satisfy the following three requirements:
> # Localized files should still be safe/secure. Other users that run 
> containers should not be able to modify, or delete the publicly localized 
> files of others.
> # The node manager user should be able to access these files for the purpose 
> of checksumming and uploading to the shared cache without being a privileged 
> user.
> # The solution should avoid making unnecessary copies of the localized files.






[jira] [Updated] (YARN-2781) support more flexible policy for uploading in shared cache

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2781:
---
Issue Type: Sub-task  (was: New Feature)
Parent: YARN-7282

> support more flexible policy for uploading in shared cache
> --
>
> Key: YARN-2781
> URL: https://issues.apache.org/jira/browse/YARN-2781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sangjin Lee
>
> Today all resources are always uploaded as long as the client wants to upload 
> them. We may want to implement a feature where the shared cache manager can 
> instruct the node managers not to upload under some circumstances.
> One example would be uploading a resource only if it has been seen more than 
> N times.
> This doesn't need to be included in the first version of the shared cache.
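
One way to picture the policy suggested above is a per-checksum counter that lets an upload proceed only once the resource has been seen more than N times. The following is a minimal sketch under that assumption; `UploadThresholdPolicy`, `shouldUpload`, and the `threshold` parameter are hypothetical names and are not part of the shared cache manager.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical policy object; not an existing SCM class.
final class UploadThresholdPolicy {
  private final int threshold;
  private final ConcurrentMap<String, AtomicInteger> seen = new ConcurrentHashMap<>();

  UploadThresholdPolicy(int threshold) {
    if (threshold < 0) {
      throw new IllegalArgumentException("threshold must be >= 0");
    }
    this.threshold = threshold;
  }

  // Records one sighting of the checksum and reports whether the resource
  // has now been seen more than `threshold` times.
  boolean shouldUpload(String checksum) {
    int count = seen.computeIfAbsent(checksum, k -> new AtomicInteger())
        .incrementAndGet();
    return count > threshold;
  }
}
```

With a threshold of 2, the first two requests for a given checksum would be declined and the third would be allowed to upload.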






[jira] [Updated] (YARN-6097) Add support for directories in the Shared Cache

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-6097:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-7282

> Add support for directories in the Shared Cache
> ---
>
> Key: YARN-6097
> URL: https://issues.apache.org/jira/browse/YARN-6097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>
> Add support for directories in the shared cache.
> If a LocalResource URL points to a directory, the directory structure is 
> preserved during localization on the node manager. Currently, the shared 
> cache does not support directories and will fail to upload the URL to the 
> cache if shouldBeUploadedToSharedCache is set to true.






[jira] [Updated] (YARN-2663) Race condintion in shared cache CleanerTask during deletion of resource

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2663:
---
Parent Issue: YARN-7282  (was: YARN-1492)

> Race condintion in shared cache CleanerTask during deletion of resource
> ---
>
> Key: YARN-2663
> URL: https://issues.apache.org/jira/browse/YARN-2663
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Priority: Blocker
>
> In CleanerTask, store.removeResource(key) and 
> removeResourceFromCacheFileSystem(path) do not happen together in atomic 
> fashion.
> Since resources could be uploaded with different file names, the SCM could 
> receive a notification to add a resource to the SCM between the two 
> operations. Thus, we have a scenario where the cleaner service deletes the 
> entry from the scm, receives a notification from the uploader (adding the 
> entry back into the scm) and then deletes the file from HDFS.
> Cleaner code that deletes resource:
> {code}
>   if (store.isResourceEvictable(key, resource)) {
>     try {
>       /*
>        * TODO: There is a race condition between store.removeResource(key)
>        * and removeResourceFromCacheFileSystem(path) operations because they
>        * do not happen atomically and resources can be uploaded with
>        * different file names by the node managers.
>        */
>       // remove the resource from scm (checks for appIds as well)
>       if (store.removeResource(key)) {
>         // remove the resource from the file system
>         boolean deleted = removeResourceFromCacheFileSystem(path);
>         if (deleted) {
>           resourceStatus = ResourceStatus.DELETED;
>         } else {
>           LOG.error("Failed to remove path from the file system."
>               + " Skipping this resource: " + path);
>           resourceStatus = ResourceStatus.ERROR;
>         }
>       } else {
>         // we did not delete the resource because it contained application
>         // ids
>         resourceStatus = ResourceStatus.PROCESSED;
>       }
>     } catch (IOException e) {
>       LOG.error(
>           "Failed to remove path from the file system. Skipping this resource: "
>               + path, e);
>       resourceStatus = ResourceStatus.ERROR;
>     }
>   } else {
>     resourceStatus = ResourceStatus.PROCESSED;
>   }
> {code}
> Uploader code that uploads resource:
> {code}
>   // create the temporary file
>   tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
>   if (!uploadFile(actualPath, tempPath)) {
>     LOG.warn("Could not copy the file to the shared cache at " + tempPath);
>     return false;
>   }
>   // set the permission so that it is readable but not writable
>   // TODO should I create the file with the right permission so I save the
>   // permission call?
>   fs.setPermission(tempPath, FILE_PERMISSION);
>   // rename it to the final filename
>   Path finalPath = new Path(directoryPath, actualPath.getName());
>   if (!fs.rename(tempPath, finalPath)) {
>     LOG.warn("The file already exists under " + finalPath +
>         ". Ignoring this attempt.");
>     deleteTempFile(tempPath);
>     return false;
>   }
>   // notify the SCM
>   if (!notifySharedCacheManager(checksumVal, actualPath.getName())) {
>     // the shared cache manager rejected the upload (as it is likely
>     // uploaded under a different name)
>     // clean up this file and exit
>     fs.delete(finalPath, false);
>     return false;
>   }
> {code}
> One solution is to have the UploaderService always rename the resource file 
> to the checksum of the resource plus the extension. With this fix we will 
> never receive a notify for the resource before the delete from the FS has 
> happened because the rename on the node manager will fail. If the node 
> manager uploads the file after it is deleted from the FS, we are ok and the 
> resource will simply get added back to the scm once a notification is 
> received.
> The classpath at the MapReduce layer is still usable because we leverage 
> links to preserve the original client file name.
> The downside is that now the shared cache files in HDFS are less readable. 
> This could be mitigated with an added admin command to the SCM that gives a 
> list of filenames associated with a checksum or vice versa.
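
The renaming scheme proposed above (checksum of the resource plus the original extension) could look roughly like the sketch below. It is illustrative only: `ChecksumFileName` is a hypothetical helper, it works on local files through `java.nio.file` rather than the Hadoop `FileSystem` API, and SHA-256 is assumed as the checksum algorithm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of the shared cache uploader.
final class ChecksumFileName {
  // Returns "<hex checksum><extension>", so two files with identical
  // contents and the same extension map to the same final name.
  static String of(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    try (InputStream in =
        new DigestInputStream(Files.newInputStream(file), digest)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // reading through the stream feeds the digest
      }
    }
    StringBuilder name = new StringBuilder();
    for (byte b : digest.digest()) {
      name.append(String.format("%02x", b));
    }
    String original = file.getFileName().toString();
    int dot = original.lastIndexOf('.');
    if (dot > 0) {
      name.append(original.substring(dot)); // keep the extension, e.g. ".jar"
    }
    return name.toString();
  }
}
```

Because the final name depends only on the contents and extension, two node managers uploading the same jar under different client names would target the same path, which is what lets the rename fail instead of producing a second copy.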




[jira] [Updated] (YARN-2774) shared cache service should authorize calls properly

2017-10-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2774:
---
Parent Issue: YARN-7282  (was: YARN-1492)

> shared cache service should authorize calls properly
> 
>
> Key: YARN-2774
> URL: https://issues.apache.org/jira/browse/YARN-2774
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sangjin Lee
>
> The shared cache manager (SCM) services should authorize calls properly.
> Currently, the uploader service (done in YARN-2186) does not authorize calls 
> that notify the SCM of newly uploaded resources. Proper security/authorization 
> needs to be done in this RPC call. Also, the use/release calls (done in 
> YARN-2188) and the scmAdmin commands (done in YARN-2189) are not properly 
> authorized. The same applies to the SCM UI (done in YARN-2203).






[jira] [Created] (YARN-7282) Shared Cache Phase 2

2017-10-02 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-7282:
--

 Summary: Shared Cache Phase 2
 Key: YARN-7282
 URL: https://issues.apache.org/jira/browse/YARN-7282
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Chris Trezzo


Phase 2 will address more features that need to be built as part of the shared 
cache project. See YARN-1492 for the first release of the shared cache.






[jira] [Commented] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted

2017-10-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187584#comment-16187584
 ] 

Chris Trezzo commented on YARN-7001:


Also thanks [~Sen Zhao] for the patch!

> If shared cache upload is terminated in the middle, the temp file will never 
> be deleted
> ---
>
> Key: YARN-7001
> URL: https://issues.apache.org/jira/browse/YARN-7001
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Sen Zhao
> Attachments: YARN-7001.001.patch, YARN-7001.002.patch, 
> YARN-7001.003.patch, YARN-7001.004.patch
>
>
> There is a missing deleteTempFile(tempPath);
> {code}
>   tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
>   if (!uploadFile(actualPath, tempPath)) {
>     LOG.warn("Could not copy the file to the shared cache at " + tempPath);
>     return false;
>   }
> {code}






[jira] [Commented] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted

2017-10-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187583#comment-16187583
 ] 

Chris Trezzo commented on YARN-7001:


Looking at this patch, I am not entirely sure if this fixes the issue. I am 
thinking about these two scenarios:
# If {{uploadFile}} returns false: {{FileUtil.copy}} has returned false. If we 
look into that method, I think the only way it will return false is if the file 
has not been created yet, since we pass in deleteSource as false. In this case, 
we do not need a deleteTempFile call.
# If {{uploadFile}} throws an IOException: Here we might have an issue. If copy 
throws an IOException after it created the tmp file, but before it finished 
writing it, we may be stranding tmp files. It seems like we would want a 
try/catch around the uploadFile. If we get an IOException, we would want to 
delete the tmp file if it exists.

In reality, we could be stranding tmp files if the node manager fails at any 
point between the file creation in uploadFile and the file rename later in the 
method. In practice, this doesn't seem to be an issue because the time between 
those points is small. Maybe we could add a try/finally around those two points 
where we attempt to delete the tmp file in the finally? That at least covers 
the case where there is an unexpected exception.

Let me know if you think I have missed something. Thanks!
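
The try/finally idea above can be sketched as follows. This is not the actual uploader code: `TempFileUpload` and its `upload` method are hypothetical, and the sketch uses local `java.nio.file` operations in place of the Hadoop `FileSystem` calls, purely to show the cleanup pattern between temp file creation and rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the upload path; local files only.
final class TempFileUpload {
  // Copies source to a temp file next to finalPath, then renames it into
  // place. The finally block deletes the temp file whenever the rename did
  // not happen, including when the copy throws part way through.
  static boolean upload(Path source, Path finalPath) throws IOException {
    Path tempPath = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
    boolean renamed = false;
    try {
      Files.copy(source, tempPath, StandardCopyOption.REPLACE_EXISTING);
      if (Files.exists(finalPath)) {
        return false; // already uploaded; the finally block cleans up
      }
      Files.move(tempPath, finalPath);
      renamed = true;
      return true;
    } finally {
      if (!renamed) {
        Files.deleteIfExists(tempPath);
      }
    }
  }
}
```

This covers exceptions and early returns within the process. It does not help if the process itself dies between the two points, which is the residual case mentioned above.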

> If shared cache upload is terminated in the middle, the temp file will never 
> be deleted
> ---
>
> Key: YARN-7001
> URL: https://issues.apache.org/jira/browse/YARN-7001
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Sen Zhao
> Attachments: YARN-7001.001.patch, YARN-7001.002.patch, 
> YARN-7001.003.patch, YARN-7001.004.patch
>
>
> There is a missing deleteTempFile(tempPath);
> {code}
>   tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
>   if (!uploadFile(actualPath, tempPath)) {
>     LOG.warn("Could not copy the file to the shared cache at " + tempPath);
>     return false;
>   }
> {code}






[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs

2017-09-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183463#comment-16183463
 ] 

Chris Trezzo commented on YARN-7250:


Thank you [~vrushalic] for the review! I will wait until tomorrow to commit, 
just in case there are any other comments. Otherwise, I plan to commit to 
trunk, branch-3.0 and branch-2.

> Update Shared cache client api to use URLs
> --
>
> Key: YARN-7250
> URL: https://issues.apache.org/jira/browse/YARN-7250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-7250-trunk-001.patch
>
>
> We should make the SharedCacheClient use api more consistent with other YARN 
> api methods. We can do this by doing two things:
> # Update the SharedCacheClient#use api so that it returns a URL instead of a 
> Path. Currently yarn developers have to convert the path to a URL when 
> creating a LocalResources. It would be much smoother if they could just use a 
> URL passed to them by the shared cache client.
> # Remove the portion of the client that deals with fragments as this is not 
> consistent with the rest of YARN. This functionality is bleeding in from the 
> MapReduce layer, which uses fragments to keep track of destination file 
> names. YARN's api does not use fragments. Instead, the ContainerLaunchContext 
> expects a Map<String, LocalResource> localResources, where the strings are 
> the destination file names. We should let the YARN application handle 
> destination file names however it wants instead of pushing this into the 
> shared cache api. Additionally, fragments are a clunky way to handle this.
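
To make the contrast between fragments and map keys concrete, the sketch below moves a MapReduce-style `#name` fragment into the key of a map, which is where a YARN application would keep the destination file name. It is an illustration only: `DestinationNames` is a hypothetical helper, the URIs are made up, and a plain `Map<String, URI>` stands in for the real map of local resources.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Hypothetical helper; uses plain URIs instead of YARN resource types.
final class DestinationNames {
  // Stores the resource under its destination file name. The name is the
  // URI fragment when present, otherwise the last path segment. The stored
  // URI has the fragment removed.
  static void put(Map<String, URI> resources, URI resource)
      throws URISyntaxException {
    String name = resource.getFragment();
    if (name == null) {
      String path = resource.getPath();
      name = path.substring(path.lastIndexOf('/') + 1);
    }
    URI withoutFragment =
        new URI(resource.getScheme(), resource.getSchemeSpecificPart(), null);
    resources.put(name, withoutFragment);
  }
}
```

For example, a hypothetical `hdfs://nn:8020/sharedcache/a/lib.jar#job-lib.jar` would be stored under the key `job-lib.jar` with the fragment stripped from the value.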






[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs

2017-09-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183443#comment-16183443
 ] 

Chris Trezzo commented on YARN-7250:


Also to clarify, this is a client side only change. The protobuf/rpc between 
the client and the SCM is staying the same.

> Update Shared cache client api to use URLs
> --
>
> Key: YARN-7250
> URL: https://issues.apache.org/jira/browse/YARN-7250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-7250-trunk-001.patch
>
>
> We should make the SharedCacheClient use api more consistent with other YARN 
> api methods. We can do this by doing two things:
> # Update the SharedCacheClient#use api so that it returns a URL instead of a 
> Path. Currently yarn developers have to convert the path to a URL when 
> creating a LocalResources. It would be much smoother if they could just use a 
> URL passed to them by the shared cache client.
> # Remove the portion of the client that deals with fragments as this is not 
> consistent with the rest of YARN. This functionality is bleeding in from the 
> MapReduce layer, which uses fragments to keep track of destination file 
> names. YARN's api does not use fragments. Instead, the ContainerLaunchContext 
> expects a Map<String, LocalResource> localResources, where the strings are 
> the destination file names. We should let the YARN application handle 
> destination file names however it wants instead of pushing this into the 
> shared cache api. Additionally, fragments are a clunky way to handle this.






[jira] [Commented] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-26 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181276#comment-16181276
 ] 

Chris Trezzo commented on YARN-7253:


Thanks [~vrushalic]! I will commit later today to trunk and branch-3.0.

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1, 3.0.0-alpha4
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
> Attachments: YARN-7253-trunk-001.patch
>
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlog            get/set the log level for each daemon
> node                 prints node report(s)
> rmadmin              admin tools
> scmadmin             SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7253:
---
Affects Version/s: 3.0.0-alpha4

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1, 3.0.0-alpha4
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
> Attachments: YARN-7253-trunk-001.patch
>
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlog            get/set the log level for each daemon
> node                 prints node report(s)
> rmadmin              admin tools
> scmadmin             SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7253:
---
Affects Version/s: 3.0.0-beta1

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
> Attachments: YARN-7253-trunk-001.patch
>
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlog            get/set the log level for each daemon
> node                 prints node report(s)
> rmadmin              admin tools
> scmadmin             SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Assigned] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo reassigned YARN-7253:
--

Assignee: Chris Trezzo

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
> Attachments: YARN-7253-trunk-001.patch
>
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlogget/set the log level for each daemon
> node prints node report(s)
> rmadmin  admin tools
> scmadmin SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7253:
---
Attachment: YARN-7253-trunk-001.patch

Trunk v1 patch attached.

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chris Trezzo
>Priority: Trivial
> Attachments: YARN-7253-trunk-001.patch
>
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlogget/set the log level for each daemon
> node prints node report(s)
> rmadmin  admin tools
> scmadmin SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7253:
---
Description: 
Currently the command to start the shared cache manager daemon is listed as an 
admin command in the yarn script usage:
{noformat}
  SUBCOMMAND is one of:


Admin Commands:

daemonlog            get/set the log level for each daemon
node                 prints node report(s)
rmadmin              admin tools
scmadmin             SharedCacheManager admin tools
sharedcachemanager   run the SharedCacheManager daemon
{noformat}

It should be a daemon command.

  was:
Currently the command to start the shared cache manager daemon is listed as an 
admin command in the yarn script:
{noformat}
  SUBCOMMAND is one of:


Admin Commands:

daemonlog            get/set the log level for each daemon
node                 prints node report(s)
rmadmin              admin tools
scmadmin             SharedCacheManager admin tools
sharedcachemanager   run the SharedCacheManager daemon
{noformat}

It should be a daemon command.


> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chris Trezzo
>Priority: Trivial
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script usage:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlog            get/set the log level for each daemon
> node                 prints node report(s)
> rmadmin              admin tools
> scmadmin             SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Updated] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7253:
---
Priority: Trivial  (was: Minor)

> Shared Cache Manager daemon command listed as admin subcmd in yarn script
> -
>
> Key: YARN-7253
> URL: https://issues.apache.org/jira/browse/YARN-7253
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chris Trezzo
>Priority: Trivial
>
> Currently the command to start the shared cache manager daemon is listed as 
> an admin command in the yarn script:
> {noformat}
>   SUBCOMMAND is one of:
> Admin Commands:
> daemonlog            get/set the log level for each daemon
> node                 prints node report(s)
> rmadmin              admin tools
> scmadmin             SharedCacheManager admin tools
> sharedcachemanager   run the SharedCacheManager daemon
> {noformat}
> It should be a daemon command.






[jira] [Created] (YARN-7253) Shared Cache Manager daemon command listed as admin subcmd in yarn script

2017-09-25 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-7253:
--

 Summary: Shared Cache Manager daemon command listed as admin 
subcmd in yarn script
 Key: YARN-7253
 URL: https://issues.apache.org/jira/browse/YARN-7253
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chris Trezzo
Priority: Minor


Currently the command to start the shared cache manager daemon is listed as an 
admin command in the yarn script:
{noformat}
  SUBCOMMAND is one of:


Admin Commands:

daemonlog            get/set the log level for each daemon
node                 prints node report(s)
rmadmin              admin tools
scmadmin             SharedCacheManager admin tools
sharedcachemanager   run the SharedCacheManager daemon
{noformat}

It should be a daemon command.






[jira] [Commented] (YARN-7250) Update Shared cache client api to use URLs

2017-09-25 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179819#comment-16179819
 ] 

Chris Trezzo commented on YARN-7250:


The TestNMClient failure is happening on trunk as well, and is unrelated to 
this patch. The same goes for the TestAMRMClient timeout.

The patch should be good to go. The patch is technically an incompatible 
change, but this api is marked unstable and this feature is still in an alpha 
state, so there should be no issue. My intention is to, pending review, check 
this into trunk, branch-3.0 and branch-2.

> Update Shared cache client api to use URLs
> --
>
> Key: YARN-7250
> URL: https://issues.apache.org/jira/browse/YARN-7250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-7250-trunk-001.patch
>
>
> We should make the SharedCacheClient use api more consistent with other YARN 
> api methods. We can do this by doing two things:
> # Update the SharedCacheClient#use api so that it returns a URL instead of a 
> Path. Currently yarn developers have to convert the path to a URL when 
> creating a LocalResources. It would be much smoother if they could just use a 
> URL passed to them by the shared cache client.
> # Remove the portion of the client that deals with fragments as this is not 
> consistent with the rest of YARN. This functionality is bleeding in from the 
> MapReduce layer, which uses fragments to keep track of destination file 
> names. YARN's api does not use fragments. Instead  the ContainerLaunchContext 
> expects a Map<String, LocalResource> localResources, where the strings are 
> the destination file names. We should let the YARN application handle 
> destination file names however it wants instead of pushing this into the 
> shared cache api. Additionally, fragments are a clunky way to handle this.






[jira] [Updated] (YARN-7250) Update Shared cache client api to use URLs

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7250:
---
Description: 
We should make the SharedCacheClient use api more consistent with other YARN 
api methods. We can do this by doing two things:
# Update the SharedCacheClient#use api so that it returns a URL instead of a 
Path. Currently yarn developers have to convert the path to a URL when creating 
a LocalResources. It would be much smoother if they could just use a URL passed 
to them by the shared cache client.
# Remove the portion of the client that deals with fragments as this is not 
consistent with the rest of YARN. This functionality is bleeding in from the 
MapReduce layer, which uses fragments to keep track of destination file names. 
YARN's api does not use fragments. Instead  the ContainerLaunchContext expects 
a Map<String, LocalResource> localResources, where the strings are the 
destination file names. We should let the YARN application handle destination 
file names however it wants instead of pushing this into the shared cache api. 
Additionally, fragments are a clunky way to handle this.

  was:
We should make the SharedCacheClient use api more consistent with other YARN 
api methods. We can do this by doing two things:
# Update the SharedCacheClient#use api so that it returns a URL instead of a 
Path. Currently yarn developers have to convert the path to a URL when creating 
a LocalResources. It would be much smoother if they could just use a URL passed 
to them by the shared cache client.
# Remove the portion of the client that deals with fragments as this is not 
consistent with the rest of YARN. This functionality is bleeding in from the 
MapReduce layer, which uses fragments to keep track of destination file names. 
YARN's api does not use fragments. Instead  the ContainerLaunchContext expects 
a Map<String, LocalResource> localResources, where the strings are the 
destination file names. We should let the YARN application handle destination 
file names however it wants instead of pushing this into the shared cache api. 
Additionally, fragments is a clunky way to handle this.


> Update Shared cache client api to use URLs
> --
>
> Key: YARN-7250
> URL: https://issues.apache.org/jira/browse/YARN-7250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-7250-trunk-001.patch
>
>
> We should make the SharedCacheClient use api more consistent with other YARN 
> api methods. We can do this by doing two things:
> # Update the SharedCacheClient#use api so that it returns a URL instead of a 
> Path. Currently yarn developers have to convert the path to a URL when 
> creating a LocalResources. It would be much smoother if they could just use a 
> URL passed to them by the shared cache client.
> # Remove the portion of the client that deals with fragments as this is not 
> consistent with the rest of YARN. This functionality is bleeding in from the 
> MapReduce layer, which uses fragments to keep track of destination file 
> names. YARN's api does not use fragments. Instead  the ContainerLaunchContext 
> expects a Map<String, LocalResource> localResources, where the strings are 
> the destination file names. We should let the YARN application handle 
> destination file names however it wants instead of pushing this into the 
> shared cache api. Additionally, fragments are a clunky way to handle this.






[jira] [Updated] (YARN-7250) Update Shared cache client api to use URLs

2017-09-25 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-7250:
---
Attachment: YARN-7250-trunk-001.patch

Attached v1 patch.

> Update Shared cache client api to use URLs
> --
>
> Key: YARN-7250
> URL: https://issues.apache.org/jira/browse/YARN-7250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-7250-trunk-001.patch
>
>
> We should make the SharedCacheClient use api more consistent with other YARN 
> api methods. We can do this by doing two things:
> # Update the SharedCacheClient#use api so that it returns a URL instead of a 
> Path. Currently yarn developers have to convert the path to a URL when 
> creating a LocalResources. It would be much smoother if they could just use a 
> URL passed to them by the shared cache client.
> # Remove the portion of the client that deals with fragments as this is not 
> consistent with the rest of YARN. This functionality is bleeding in from the 
> MapReduce layer, which uses fragments to keep track of destination file 
> names. YARN's api does not use fragments. Instead  the ContainerLaunchContext 
> expects a Map<String, LocalResource> localResources, where the strings are 
> the destination file names. We should let the YARN application handle 
> destination file names however it wants instead of pushing this into the 
> shared cache api. Additionally, fragments is a clunky way to handle this.






[jira] [Created] (YARN-7250) Update Shared cache client api to use URLs

2017-09-25 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-7250:
--

 Summary: Update Shared cache client api to use URLs
 Key: YARN-7250
 URL: https://issues.apache.org/jira/browse/YARN-7250
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor


We should make the SharedCacheClient use api more consistent with other YARN 
api methods. We can do this by doing two things:
# Update the SharedCacheClient#use api so that it returns a URL instead of a 
Path. Currently yarn developers have to convert the path to a URL when creating 
a LocalResources. It would be much smoother if they could just use a URL passed 
to them by the shared cache client.
# Remove the portion of the client that deals with fragments as this is not 
consistent with the rest of YARN. This functionality is bleeding in from the 
MapReduce layer, which uses fragments to keep track of destination file names. 
YARN's api does not use fragments. Instead  the ContainerLaunchContext expects 
a Map<String, LocalResource> localResources, where the strings are the 
destination file names. We should let the YARN application handle destination 
file names however it wants instead of pushing this into the shared cache api. 
Additionally, fragments is a clunky way to handle this.
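
To illustrate the second point, here is a minimal sketch of the two naming conventions. It is not the SharedCacheClient code: plain java.net.URI stands in for the YARN URL record, the map value stands in for a LocalResource, and the class name, method names and the hdfs path are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: java.net.URI stands in for the YARN URL record,
// and the map value stands in for a LocalResource.
class DestinationNameSketch {

  // MapReduce-style convention: the destination file name rides along as a
  // URI fragment; fall back to the last path component when there is none.
  static String nameFromFragment(URI resource) {
    String fragment = resource.getFragment();
    if (fragment != null) {
      return fragment;
    }
    String path = resource.getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // YARN-style convention: the application chooses the destination file name
  // and uses it as the map key, so the resource location needs no fragment.
  static Map<String, URI> localResources(String destinationName, URI resource) {
    try {
      // Rebuild the URI without its fragment.
      URI withoutFragment = new URI(resource.getScheme(), resource.getAuthority(),
          resource.getPath(), resource.getQuery(), null);
      Map<String, URI> resources = new LinkedHashMap<>();
      resources.put(destinationName, withoutFragment);
      return resources;
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // Made-up shared cache location, for demonstration only.
    URI cached = URI.create("hdfs://nn:8020/sharedcache/abc123/job.jar#myjob.jar");
    System.out.println(nameFromFragment(cached));
    System.out.println(localResources("renamed.jar", cached));
  }
}
```

With the map-based form the application is free to pick any destination name, and the shared cache api only has to hand back a location.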






[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-07-10 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081151#comment-16081151
 ] 

Chris Trezzo commented on YARN-1492:


bq. Could you explain how the shared cache leverage the node manager local 
cache in detail?

The shared cache leverages the local cache via the normal LocalResource API. 
The YARN application specifies a shared cache path that it received from the 
shared cache as the LocalResource URI.

bq. Are those shared jars marked as PUBLIC?

Yes, currently all resources in the shared cache are world readable, so they 
are in that sense public. However, at the node manager level you could set the 
visibilities to PRIVATE or APPLICATION.

bq. Could you point me the source code that handle this?

The shared cache uses the normal localization code path (see 
ResourceLocalizationService). For the shared-cache-specific code that uploads a 
resource to the cache, see SharedCacheUploader. If you want to see an example of 
how a YARN application can implement support for the shared cache, see 
MAPREDUCE-5951 for how MapReduce does it.

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-07 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961163#comment-15961163
 ] 

Chris Trezzo commented on YARN-5797:


Thanks everyone for the reviews and [~mingma] for the commit!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, 
> YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957929#comment-15957929
 ] 

Chris Trezzo commented on YARN-5797:


Branch-2 test failures are unrelated. From my perspective the patch is good to 
go for trunk and branch-2. Thanks!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, 
> YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Comment Edited] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957794#comment-15957794
 ] 

Chris Trezzo edited comment on YARN-5797 at 4/5/17 9:43 PM:


Also attached is a v1 branch-2 patch. The only conflict was in 
TestResourceLocalizationService.java and it was due to a difference in 
formatting between the branches on one line.


was (Author: ctrezzo):
Also attached is a v1 branch-2 patch. The only conflict was in 
TestResourceLocalization and it was due to a difference in formatting between 
the branches on one line.

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, 
> YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-05 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5797:
---
Attachment: YARN-5797-branch-2.001.patch

Also attached is a v1 branch-2 patch. The only conflict was in 
TestResourceLocalization and it was due to a difference in formatting between 
the branches on one line.

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, 
> YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-05 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5797:
---
Attachment: YARN-5797-trunk.003.patch

Attached is a v3 patch that is rebased on trunk.

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk.003.patch, 
> YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-04-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957380#comment-15957380
 ] 

Chris Trezzo commented on YARN-6004:


Thanks [~mingma]!

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, 
> YARN-6004-trunk.002.patch
>
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956103#comment-15956103
 ] 

Chris Trezzo commented on YARN-6004:


These unit tests fail locally on branch-2 without the patch. As far as I can 
tell these failures are unrelated. The patches should be good to go for trunk 
and branch-2.

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, 
> YARN-6004-trunk.002.patch
>
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956049#comment-15956049
 ] 

Chris Trezzo commented on YARN-5797:


As soon as the patch for YARN-6004 is in, I will rebase for trunk and branch-2.

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-04-04 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-6004:
---
Attachment: YARN-6004-branch-2.001.patch

Thanks [~mingma]! Attached is a branch-2 v1 patch.

Note that this does not blow up in trunk because "starting 
in Java SE 8, a local class can access local variables and parameters of the 
enclosing block that are final or effectively final. A variable or parameter 
whose value is never changed after it is initialized is effectively final." 
Because of this, the final modifier is unnecessary.

See the Java documentation: 
http://docs.oracle.com/javase/tutorial/java/javaOO/localclasses.html
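
As a small illustration of the rule quoted above (a sketch only; the class, interface and variable names are made up and are not taken from TestResourceLocalizationService):

```java
// Hypothetical names throughout; this only demonstrates the language rule.
class EffectivelyFinalSketch {

  // Tiny callback interface so the example has no external dependencies.
  interface Named {
    String name();
  }

  static String describe(String resourceName) {
    // Assigned once and never changed, so Java SE 8 and later treat it as
    // effectively final even though it is not declared 'final'.
    String prefix = "localized:";

    Named named = new Named() {
      @Override
      public String name() {
        // Captures both the local variable and the method parameter.
        // A Java 7 compiler rejects this unless both are declared 'final'.
        return prefix + resourceName;
      }
    };
    return named.name();
  }

  public static void main(String[] args) {
    System.out.println(describe("job.jar"));
  }
}
```

Reassigning prefix anywhere in describe would make it no longer effectively final, and the anonymous class would then fail to compile on Java 8 as well.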

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6004-branch-2.001.patch, YARN-6004-trunk.001.patch, 
> YARN-6004-trunk.002.patch
>
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955930#comment-15955930
 ] 

Chris Trezzo commented on YARN-1492:


Hi [~jhung], thanks for the question!

bq. will this feature save jars from being relocalized across different jobs on 
a node?

The short answer is yes. YARN applications can leverage this feature to prevent 
relocalizing the same resources over and over again from both the client to 
hdfs as well as from hdfs to the node managers. The shared cache leverages 
checksumming and the node manager local cache to ensure applications can reuse 
resources that are already localized on node managers. See MAPREDUCE-5951 for 
mapreduce level support for the shared cache (which will hopefully be committed 
shortly to trunk and branch-2).

Please let me know if you have any more questions. Here is also a slide deck 
explaining the feature at a high-level: 
https://www.slideshare.net/ctrezzo/a-secure-public-cache-for-yarn-application-resources-61688793
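
To make the checksumming idea concrete, here is a minimal sketch of deriving a content-based key for a resource. It assumes SHA-256 as the digest and uses only the JDK; the class and method names are hypothetical and this is not the shared cache's own checksum code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: identical bytes always map to the same key, so a
// resource that is already cached can be recognized without comparing names.
class ChecksumKeySketch {

  static String sha256Hex(InputStream in) {
    try {
      // SHA-256 is an assumption made for this sketch.
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] buffer = new byte[8192];
      int read;
      // Stream the content so large jars are never held fully in memory.
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (IOException | NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] jarV1 = "same bytes".getBytes(StandardCharsets.UTF_8);
    byte[] jarV1Copy = "same bytes".getBytes(StandardCharsets.UTF_8);
    byte[] jarV2 = "different bytes".getBytes(StandardCharsets.UTF_8);

    String keyA = sha256Hex(new ByteArrayInputStream(jarV1));
    String keyB = sha256Hex(new ByteArrayInputStream(jarV1Copy));
    String keyC = sha256Hex(new ByteArrayInputStream(jarV2));

    // Same content gives the same key; different content gives a different key.
    System.out.println(keyA.equals(keyB));
    System.out.println(keyA.equals(keyC));
  }
}
```

Because the key depends only on the file contents, two users submitting the same jar under different names would resolve to the same cache entry.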

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Commented] (YARN-2960) Add documentation for the YARN shared cache

2017-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1597#comment-1597
 ] 

Chris Trezzo commented on YARN-2960:


Circling back and taking a look at this now.

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>
> Add documentation around the architecture, api's and administration of the 
> YARN shared cache.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-03-29 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948062#comment-15948062
 ] 

Chris Trezzo commented on YARN-5797:


Patch is now available for YARN-6004 as well. Thanks!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-03-29 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-6004:
---
Attachment: YARN-6004-trunk.002.patch

Attached is v2 for trunk to address checkstyle issues.

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6004-trunk.001.patch, YARN-6004-trunk.002.patch
>
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-03-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-6004:
---
Attachment: YARN-6004-trunk.001.patch

Attached is a v1 patch for trunk. This patch breaks the following out into 
separate methods:
# Creation methods for various objects in the test (i.e. Dispatcher, 
ApplicationBus, ContainerBus...).
# App initialization
# Localizer initialization
# Localizer status mocking
# Localization

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6004-trunk.001.patch
>
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Assigned] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2017-03-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo reassigned YARN-6004:
--

Assignee: Chris Trezzo  (was: luhuichun)

> Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer 
> so that it is less than 150 lines
> --
>
> Key: YARN-6004
> URL: https://issues.apache.org/jira/browse/YARN-6004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
>  Labels: newbie
>
> The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
> method is over 150 lines:
> bq. 
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
>  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).
> This method needs to be refactored and broken up into smaller methods.






[jira] [Assigned] (YARN-2960) Add documentation for the YARN shared cache

2017-03-13 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo reassigned YARN-2960:
--

Assignee: Chris Trezzo

> Add documentation for the YARN shared cache
> ---
>
> Key: YARN-2960
> URL: https://issues.apache.org/jira/browse/YARN-2960
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>
> Add documentation around the architecture, APIs and administration of the 
> YARN shared cache.






[jira] [Commented] (YARN-6117) SharedCacheManager does not start up

2017-01-25 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838853#comment-15838853
 ] 

Chris Trezzo commented on YARN-6117:


Thanks [~sjlee0] for the review and commit!

> SharedCacheManager does not start up
> 
>
> Key: YARN-6117
> URL: https://issues.apache.org/jira/browse/YARN-6117
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha2
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: YARN-6117-trunk.001.patch
>
>
> The webapp directory for the SharedCacheManager is missing and the SCM fails 
> to start up with the following:
> {noformat}
> 2017-01-22 00:14:25,162 INFO org.apache.hadoop.service.AbstractService: 
> Service SharedCacheManager failed in state STARTED; cause: 
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:330)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:377)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:373)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.webapp.SCMWebServer.serviceStart(SCMWebServer.java:65)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:157)
> Caused by: java.io.FileNotFoundException: webapps/sharedcache not found in 
> CLASSPATH
> at 
> org.apache.hadoop.http.HttpServer2.getWebAppsPath(HttpServer2.java:972)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:478)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:117)
> at 
> org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:392)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:291)
> ... 7 more
> {noformat}
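The failure above comes down to "webapps/sharedcache" not being resolvable on the classpath. A minimal sketch of that kind of check follows, assuming the lookup is a plain class loader resource lookup. The class WebAppClasspathCheck and its method are hypothetical and are not part of Hadoop; this is not the HttpServer2 code itself.

```java
import java.net.URL;

/** Hypothetical helper (not part of Hadoop) for checking a webapp directory on the classpath. */
final class WebAppClasspathCheck {

  /** Returns true if "webapps/<appName>" resolves through the given class loader. */
  static boolean webAppPresent(ClassLoader loader, String appName) {
    URL url = loader.getResource("webapps/" + appName);
    return url != null;
  }

  public static void main(String[] args) {
    // Defaults to the directory named in the stack trace above.
    String appName = args.length > 0 ? args[0] : "sharedcache";
    ClassLoader loader = WebAppClasspathCheck.class.getClassLoader();
    String status = webAppPresent(loader, appName) ? "found" : "not found in CLASSPATH";
    System.out.println("webapps/" + appName + " " + status);
  }
}
```

Running this with the same classpath as the SCM daemon should give a rough indication of whether the webapp directory was packaged.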



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-25 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838851#comment-15838851
 ] 

Chris Trezzo commented on YARN-3637:


Thanks [~sjlee0] for the commit/review and thanks [~templedf] for the review!

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch, 
> YARN-3637-trunk.003.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.
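The clobbering scenario described above can be sketched with a small simulation. This is only an illustration under stated assumptions: the container directory is modeled as a map from link name to target, and SymlinkCollisionSketch and its methods are hypothetical names, not YARN code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical simulation (not YARN code) of symlink naming in a container directory. */
final class SymlinkCollisionSketch {

  /** Returns the last component of a path, e.g. "job.jar" for "checksum1/job.jar". */
  static String baseName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  /** Names each link after the cache path's file name; a later link replaces an earlier one. */
  static Map<String, String> linkByPathName(String... cachePaths) {
    Map<String, String> links = new LinkedHashMap<>();
    for (String cachePath : cachePaths) {
      links.put(baseName(cachePath), cachePath);
    }
    return links;
  }

  /** Names each link after the name the application asked for (requested name -> cache path). */
  static Map<String, String> linkByRequestedName(Map<String, String> requested) {
    return new LinkedHashMap<>(requested);
  }

  public static void main(String[] args) {
    // Both resources are called job.jar, so only one link survives.
    System.out.println(linkByPathName("checksum1/job.jar", "checksum2/job.jar"));

    // With the application's own names, both links are kept.
    Map<String, String> requested = new LinkedHashMap<>();
    requested.put("foo.jar", "checksum1/job.jar");
    requested.put("bar.jar", "checksum2/job.jar");
    System.out.println(linkByRequestedName(requested));
  }
}
```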






[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-25 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838708#comment-15838708
 ] 

Chris Trezzo commented on YARN-3637:


I would say trunk and branch-2 are sufficient. Thanks [~sjlee0]!

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch, 
> YARN-3637-trunk.003.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.






[jira] [Commented] (YARN-6117) SharedCacheManager does not start up

2017-01-24 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836777#comment-15836777
 ] 

Chris Trezzo commented on YARN-6117:


branch-2.7 might make sense as well, since this is in the 2.7.x line.

> SharedCacheManager does not start up
> 
>
> Key: YARN-6117
> URL: https://issues.apache.org/jira/browse/YARN-6117
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha2
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6117-trunk.001.patch
>
>
> The webapp directory for the SharedCacheManager is missing and the SCM fails 
> to start up with the following:
> {noformat}
> 2017-01-22 00:14:25,162 INFO org.apache.hadoop.service.AbstractService: 
> Service SharedCacheManager failed in state STARTED; cause: 
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:330)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:377)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:373)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.webapp.SCMWebServer.serviceStart(SCMWebServer.java:65)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:157)
> Caused by: java.io.FileNotFoundException: webapps/sharedcache not found in 
> CLASSPATH
> at 
> org.apache.hadoop.http.HttpServer2.getWebAppsPath(HttpServer2.java:972)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:478)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:117)
> at 
> org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:392)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:291)
> ... 7 more
> {noformat}






[jira] [Commented] (YARN-6117) SharedCacheManager does not start up

2017-01-24 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836482#comment-15836482
 ] 

Chris Trezzo commented on YARN-6117:


Thanks [~sjlee0]! I think it would be good to backport to branch-2 and 
hopefully branch-2.8 so that we have a working shared cache in future 2.x 
releases. Let me know what you think.

> SharedCacheManager does not start up
> 
>
> Key: YARN-6117
> URL: https://issues.apache.org/jira/browse/YARN-6117
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha2
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6117-trunk.001.patch
>
>
> The webapp directory for the SharedCacheManager is missing and the SCM fails 
> to start up with the following:
> {noformat}
> 2017-01-22 00:14:25,162 INFO org.apache.hadoop.service.AbstractService: 
> Service SharedCacheManager failed in state STARTED; cause: 
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:330)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:377)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:373)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.webapp.SCMWebServer.serviceStart(SCMWebServer.java:65)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:157)
> Caused by: java.io.FileNotFoundException: webapps/sharedcache not found in 
> CLASSPATH
> at 
> org.apache.hadoop.http.HttpServer2.getWebAppsPath(HttpServer2.java:972)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:478)
> at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:117)
> at 
> org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:392)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:291)
> ... 7 more
> {noformat}






[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-21 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-3637:
---
Attachment: YARN-3637-trunk.003.patch

Thanks [~sjlee0]! Attached is a v3 patch that updates the use method so that it 
only appends a fragment when the resourceName is different from the path name 
provided by the shared cache.
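
A rough sketch of the fragment rule, for illustration only. It uses plain java.net.URI rather than the Hadoop Path type, and FragmentSketch, withResourceName and the example URI are all hypothetical; the actual patch may differ. Rejecting a null resourceName is an assumption here, not something the patch is known to do.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

/** Hypothetical illustration (not the patch itself) of appending a fragment only when needed. */
final class FragmentSketch {

  /**
   * Returns cachePath unchanged when its file name already equals resourceName;
   * otherwise returns the same URI with resourceName as the fragment.
   */
  static URI withResourceName(URI cachePath, String resourceName) {
    // Assumption: a resource name is always required.
    Objects.requireNonNull(resourceName, "resourceName");
    String path = cachePath.getPath();
    String pathName = path.substring(path.lastIndexOf('/') + 1);
    if (resourceName.equals(pathName)) {
      return cachePath;
    }
    try {
      return new URI(cachePath.getScheme(), cachePath.getAuthority(),
          cachePath.getPath(), cachePath.getQuery(), resourceName);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot append fragment to " + cachePath, e);
    }
  }

  public static void main(String[] args) {
    // Made-up cache location, used only for the example.
    URI cached = URI.create("hdfs://nn:8020/sharedcache/checksum1/job.jar");
    System.out.println(withResourceName(cached, "job.jar"));
    System.out.println(withResourceName(cached, "foo.jar"));
  }
}
```

With this rule a resource requested under its cached name keeps a clean path, and only renamed resources carry a fragment for localization to use as the symlink name.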

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch, 
> YARN-3637-trunk.003.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.






[jira] [Comment Edited] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-20 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580
 ] 

Chris Trezzo edited comment on YARN-3637 at 1/21/17 12:08 AM:
--

-It might be better to overload the use() method instead of replacing it.-

-[~templedf] Thinking about your previous comment some more, I may have missed 
your point the first time. I now realize that the overridden use method can 
simply honor the fragment portion of the url. If there is no fragment, then we 
can just use the original path's name as a new fragment to preserve the 
resource name. This can provide the same functionality without the extra 
parameter. I will fix the patch and post a new version. Let me know if you had 
something different in mind. Thanks again!-

I spoke too soon again... I am back to my original thought 
[above|https://issues.apache.org/jira/browse/YARN-3637?focusedCommentId=15829214&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15829214].
 The use method takes a checksum and an appId, so we still need some way of 
passing in the suggested name. I will leave the patch as is for now.


was (Author: ctrezzo):
-It might be better to overload the use() method instead of replacing it.-

-[~templedf] Thinking about your previous comment some more, I may have missed 
your point the first time. I now realize that the overridden use method can 
simply honor the fragment portion of the url. If there is no fragment, then we 
can just use the original path's name as a new fragment to preserve the 
resource name. This can provide the same functionality without the extra 
parameter. I will fix the patch and post a new version. Let me know if you had 
something different in mind. Thanks again!-

I spoke too soon again... I am back to my original thought above. The use 
method takes a checksum and an appId, so we still need some way of passing in 
the suggested name. I will leave the patch as is for now.

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.






[jira] [Comment Edited] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-20 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580
 ] 

Chris Trezzo edited comment on YARN-3637 at 1/21/17 12:07 AM:
--

-It might be better to overload the use() method instead of replacing it.-

-[~templedf] Thinking about your previous comment some more, I may have missed 
your point the first time. I now realize that the overridden use method can 
simply honor the fragment portion of the url. If there is no fragment, then we 
can just use the original path's name as a new fragment to preserve the 
resource name. This can provide the same functionality without the extra 
parameter. I will fix the patch and post a new version. Let me know if you had 
something different in mind. Thanks again!-

I spoke too soon again... I am back to my original thought above. The use 
method takes a checksum and an appId, so we still need some way of passing in 
the suggested name. I will leave the patch as is for now.


was (Author: ctrezzo):
bq. It might be better to overload the use() method instead of replacing it.

[~templedf] Thinking about your previous comment some more, I may have missed 
your point the first time. I now realize that the overridden use method can 
simply honor the fragment portion of the url. If there is no fragment, then we 
can just use the original path's name as a new fragment to preserve the 
resource name. This can provide the same functionality without the extra 
parameter. I will fix the patch and post a new version. Let me know if you had 
something different in mind. Thanks again!

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.






[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-20 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832580#comment-15832580
 ] 

Chris Trezzo commented on YARN-3637:


bq. It might be better to overload the use() method instead of replacing it.

[~templedf] Thinking about your previous comment some more, I may have missed 
your point the first time. I now realize that the overridden use method can 
simply honor the fragment portion of the url. If there is no fragment, then we 
can just use the original path's name as a new fragment to preserve the 
resource name. This can provide the same functionality without the extra 
parameter. I will fix the patch and post a new version. Let me know if you had 
something different in mind. Thanks again!

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-01-19 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830957#comment-15830957
 ] 

Chris Trezzo commented on YARN-5797:


Bump for a review/commit. Thanks!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-18 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-3637:
---
Attachment: YARN-3637-trunk.002.patch

Thanks [~templedf] for the review! Attached is a v2 patch addressing your 
comments.

The one thing that deviates from your suggestions is overriding the use method. 
I kept it the way it was, but adjusted the comments to make it seem required to 
provide a resourceName. From an API design standpoint, I tried to come up with 
a use case where someone would not want to specify a resource name. By not 
specifying a resource name the user is essentially saying: "I don't care what 
the resource is named and regardless of the name it will not conflict with any 
other resources the container localizes." The only situation I can come up with 
where that is true is if it is the only resource they are using in the 
container (i.e. the only symlink that gets created). Outside of this case, the 
path provided by the use method without a resourceName might create unintended 
behavior due to naming conflicts when YARN localization creates the container 
resource symlinks. I could add a null check for resourceName as well if we want 
to make this stronger.

As for compatibility, this is an unstable api for a feature that is not GA yet, 
so hopefully it is OK to change the API.

Let me know your thoughts around this. Thanks!

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-3637-trunk.001.patch, YARN-3637-trunk.002.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized one of them clobbers the other. This is because both symlinks in 
> the container_id directory are the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6097) Add support for directories in the Shared Cache

2017-01-13 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-6097:
--

 Summary: Add support for directories in the Shared Cache
 Key: YARN-6097
 URL: https://issues.apache.org/jira/browse/YARN-6097
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chris Trezzo


Add support for directories in the shared cache.

If a LocalResource URL points to a directory, the directory structure is 
preserved during localization on the node manager. Currently, the shared cache 
does not support directories and will fail to upload the URL to the cache if 
shouldBeUploadedToSharedCache is set to true.






[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2017-01-11 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-3637:
---
Attachment: YARN-3637-trunk.001.patch

Attached is a v01 patch for handling symlink names and fragments as part of the 
shared cache YARN API. The major part of the patch adds a new parameter to the 
use API call. This allows a user to specify a preferred name for a resource, 
even if the name of the resource in the shared cache is different. With this 
additional parameter, the user can avoid naming conflicts that happen when 
using resources from the shared cache. Note that this patch does not solve the 
existing problem in YARN where resource symlinks get clobbered if two resources 
are specified with the same name. Furthermore, this approach assumes the path 
returned is going to be used to create a LocalResource and is leveraging the 
way YARN localization uses the fragment portion of a URI.
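
As a rough sketch of the fragment approach, the snippet below attaches a preferred name to a shared cache path using only java.net.URI. The class {{FragmentNaming}}, the method {{withLinkName}}, the non-empty check, and the hdfs://nn:8020 address are assumptions for illustration. The actual patch may be structured differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; the real patch may differ.
final class FragmentNaming {

  // Returns cachePath with its fragment set to resourceName, so that a
  // consumer that honors fragments can name the link resourceName.
  static URI withLinkName(URI cachePath, String resourceName) {
    // Assumed validation; the v01 patch is not known to enforce this.
    if (resourceName == null || resourceName.isEmpty()) {
      throw new IllegalArgumentException("resourceName must be non-empty");
    }
    try {
      return new URI(cachePath.getScheme(), cachePath.getAuthority(),
          cachePath.getPath(), cachePath.getQuery(), resourceName);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    URI cached = URI.create("hdfs://nn:8020/sharedcache/checksum1/job.jar");
    URI named = withLinkName(cached, "foo.jar");
    System.out.println(named.getPath());     // /sharedcache/checksum1/job.jar
    System.out.println(named.getFragment()); // foo.jar
  }
}
```

The path in the shared cache is unchanged. Only the fragment carries the name the application wants to see in its container directory.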

I think this makes it slightly easier for developers to implement shared cache 
support in their YARN application by abstracting away symlink/fragment 
management. Thoughts [~sjlee0] or anyone else?




[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-12-14 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749064#comment-15749064
 ] 

Chris Trezzo commented on YARN-5797:


Jira filed: YARN-6004

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5797-trunk-v1.patch, YARN-5797-trunk.002.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular basis. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Created] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines

2016-12-14 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-6004:
--

 Summary: Refactor 
TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it 
is less than 150 lines
 Key: YARN-6004
 URL: https://issues.apache.org/jira/browse/YARN-6004
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Chris Trezzo
Priority: Trivial


The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill 
method is over 150 lines:
bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
 @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).

This method needs to be refactored and broken up into smaller methods.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-12-14 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749056#comment-15749056
 ] 

Chris Trezzo commented on YARN-5797:


The checkstyle warning is because the 
{{TestResourceLocalizationService#testDownloadingResourcesOnContainerKill}}  
method is over 150 lines:
bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128:
  @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150).

This patch barely touches the method (i.e. changes 1 line) so I think it would 
be wrong to refactor this method as part of this patch. I will file a jira to 
address the method refactor.

Thanks!




[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-12-13 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5797:
---
Attachment: YARN-5797-trunk.002.patch

Attaching v2 to fix checkstyle issues. Once I get a +1 for the trunk patch I 
will create another version for branch-2. Thanks!




[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-11-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630595#comment-15630595
 ] 

Chris Trezzo commented on YARN-5797:


Note that the patch exposes the following metrics about the cache cleanup:
# cacheSizeBeforeClean - The local cache size (public and private) before clean 
in Bytes
# totalBytesDeleted - # of total bytes deleted from the public and private 
local cache
# publicBytesDeleted - # of bytes deleted from the public local cache
# privateBytesDeleted - # of bytes deleted from the private local cache

{{LocalCacheCleanerStats}} also exposes the individual amounts deleted (in 
bytes) from each user private cache. I wasn't quite sure of a good way to 
expose this via metrics, so I left it out of the current patch.
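
For reference, the relationship between these counters can be sketched as follows. This is a simplified stand-in written for illustration, not the real {{LocalCacheCleanerStats}} class. The class name {{CleanupStatsSketch}} and its record methods are hypothetical. It assumes that the total is the sum of the public and private amounts, and that the private amount is the sum over users.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration; not the real LocalCacheCleanerStats.
final class CleanupStatsSketch {
  private final long cacheSizeBeforeClean;
  private long publicBytesDeleted;
  private final Map<String, Long> privateBytesDeletedByUser = new TreeMap<>();

  CleanupStatsSketch(long cacheSizeBeforeClean) {
    this.cacheSizeBeforeClean = cacheSizeBeforeClean;
  }

  void recordPublicDelete(long bytes) {
    publicBytesDeleted += bytes;
  }

  void recordPrivateDelete(String user, long bytes) {
    privateBytesDeletedByUser.merge(user, bytes, Long::sum);
  }

  long getCacheSizeBeforeClean() {
    return cacheSizeBeforeClean;
  }

  long getPublicBytesDeleted() {
    return publicBytesDeleted;
  }

  // Sum of the per-user amounts.
  long getPrivateBytesDeleted() {
    long sum = 0;
    for (long bytes : privateBytesDeletedByUser.values()) {
      sum += bytes;
    }
    return sum;
  }

  // Assumed definition: public plus private.
  long getTotalBytesDeleted() {
    return publicBytesDeleted + getPrivateBytesDeleted();
  }

  // Per-user breakdown, which the patch leaves out of the metrics.
  Map<String, Long> getPrivateBytesDeletedByUser() {
    return Collections.unmodifiableMap(privateBytesDeletedByUser);
  }

  public static void main(String[] args) {
    CleanupStatsSketch stats = new CleanupStatsSketch(1000);
    stats.recordPublicDelete(100);
    stats.recordPrivateDelete("user1", 50);
    stats.recordPrivateDelete("user2", 25);
    System.out.println(stats.getTotalBytesDeleted());         // 175
    System.out.println(stats.getPrivateBytesDeletedByUser()); // {user1=50, user2=25}
  }
}
```

The per-user map has one key per user, so exposing it as metrics would need one gauge per user name. That may be one reason it is awkward to publish.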




[jira] [Comment Edited] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-10-31 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624305#comment-15624305
 ] 

Chris Trezzo edited comment on YARN-5797 at 11/1/16 4:27 AM:
-

Attaching v1 patch to get a qa run. Summary:

# Added metrics to {{NodeManagerMetrics}} that will expose stats from 
{{LocalCacheCleanerStats}}.
# Adjusted {{ResourceLocalizationService}} constructor to take a 
{{NodeManagerMetrics}} param. Also adjusted {{handleCacheCleanup}} to update 
the new metrics.
# Adjusted {{TestLocalCacheCleanup}} to cover metrics as well.
# Refactored other unit tests to adjust for change in 
{{ResourceLocalizationService}} constructor.


was (Author: ctrezzo):
Attaching v1 patch to get a qa run. Summary:

# Added metrics that expose stats from {{LocalCacheCleanerStats}}.
# Adjusted {{TestLocalCacheCleanup}} to cover metrics as well.
# Refactored other unit tests to adjust for change in 
{{ResourceLocalizationService}} constructor to pass in {{NodeManagerMetrics}}.




[jira] [Updated] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-10-31 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5797:
---
Attachment: YARN-5797-trunk-v1.patch

Attaching v1 patch to get a qa run. Summary:

# Added metrics that expose stats from {{LocalCacheCleanerStats}}.
# Adjusted {{TestLocalCacheCleanup}} to cover metrics as well.
# Refactored other unit tests to adjust for change in 
{{ResourceLocalizationService}} constructor to pass in {{NodeManagerMetrics}}.




[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-28 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5767:
---
Release Note: This issue fixes a bug in how resources are evicted from the 
PUBLIC and PRIVATE yarn local caches used by the node manager for resource 
localization. In summary, the caches are now properly cleaned based on an LRU 
policy across both the public and private caches.  (was: This issue fixes a bug 
in how resources were evicted from the PUBLIC and PRIVATE yarn local caches 
used by the node manager for resource localization. In summary, the caches are 
now properly cleaned based on an LRU policy across both the public and private 
caches.)
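
As a rough illustration of what an LRU policy across both caches means, the sketch below pools the resources from every cache, skips the ones still in use, and deletes the least recently used until the total size is at or below the target. It is not the code from the patch. The names {{LruCleanupSketch}}, {{Resource}}, and {{selectForDeletion}} are hypothetical, as are the sizes and timestamps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch for illustration; not the code from the patch.
final class LruCleanupSketch {

  static final class Resource {
    final String owner;   // "public" or a user name
    final String name;
    final long sizeBytes;
    final long lastUsed;  // larger means more recently used
    final int refCount;   // greater than zero means in use by a container

    Resource(String owner, String name, long sizeBytes, long lastUsed,
        int refCount) {
      this.owner = owner;
      this.name = name;
      this.sizeBytes = sizeBytes;
      this.lastUsed = lastUsed;
      this.refCount = refCount;
    }
  }

  // Returns "owner/name" for each resource to delete, oldest first.
  static List<String> selectForDeletion(List<Resource> all, long targetSize) {
    long currentSize = 0;
    List<Resource> candidates = new ArrayList<>();
    for (Resource r : all) {
      currentSize += r.sizeBytes;   // in-use resources still count
      if (r.refCount == 0) {
        candidates.add(r);          // only idle resources may be deleted
      }
    }
    // One ordering for all caches: least recently used first.
    candidates.sort(Comparator.comparingLong((Resource r) -> r.lastUsed));
    List<String> deleted = new ArrayList<>();
    for (Resource r : candidates) {
      if (currentSize <= targetSize) {
        break;
      }
      currentSize -= r.sizeBytes;
      deleted.add(r.owner + "/" + r.name);
    }
    return deleted;
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    List<Resource> all = Arrays.asList(
        new Resource("public", "hot.jar", 30 * mb, 50, 0),
        new Resource("public", "old.jar", 30 * mb, 10, 0),
        new Resource("user1", "stale.jar", 10 * mb, 1, 0),
        new Resource("user2", "busy.jar", 5 * mb, 0, 1));
    // 75 MB cached in total against a 50 MB target.
    System.out.println(selectForDeletion(all, 50 * mb));
    // [user1/stale.jar, public/old.jar]
  }
}
```

In this example the stale private resource is deleted before any public resource, and the recently used public jar is kept.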

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: oct16-easy
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, 
> YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>       new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i =
>            retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.






[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-28 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615991#comment-15615991
 ] 

Chris Trezzo commented on YARN-5767:


I have filed a followup jira for the metrics: YARN-5797




[jira] [Created] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2016-10-28 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-5797:
--

 Summary: Add metrics to the node manager for cleaning the PUBLIC 
and PRIVATE caches
 Key: YARN-5797
 URL: https://issues.apache.org/jira/browse/YARN-5797
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Chris Trezzo
Assignee: Chris Trezzo


Add new metrics to the node manager around the local cache sizes and how much 
is being cleaned from them on a regular basis. For example, we can expose 
information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-28 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5767:
---
Release Note: This issue fixes a bug in how resources were evicted from the 
PUBLIC and PRIVATE yarn local caches used by the node manager for resource 
localization. In summary, the caches are now properly cleaned based on an LRU 
policy across both the public and private caches.




[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-28 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615974#comment-15615974
 ] 

Chris Trezzo commented on YARN-5767:


Thanks [~jlowe]! I will add release notes to the issue as well.




[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4950:
---
Labels: oct16-hard  (was: oct16-easy)

> configure parallel-tests for yarn-client and yarn-server-resourcemanager
> 
>
> Key: YARN-4950
> URL: https://issues.apache.org/jira/browse/YARN-4950
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
>  Labels: oct16-hard
> Attachments: YARN-4950.00.patch
>
>
> Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
> each.  The parallel-tests profile should be configured to reduce the 
> execution time.






[jira] [Commented] (YARN-5027) NM should clean up app log dirs after NM restart

2016-10-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613303#comment-15613303
 ] 

Chris Trezzo commented on YARN-5027:


bq. The exists check is on the root log dir, not the app log dirs.
Ah. Gotcha. +1 from me as well.

> NM should clean up app log dirs after NM restart 
> -
>
> Key: YARN-5027
> URL: https://issues.apache.org/jira/browse/YARN-5027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>  Labels: oct16-easy
> Attachments: YARN-5027.01.patch
>
>
> If the NM starts without recovery enabled, there may be many obsolete app log 
> dirs in the NM log dirs. The NM should clean them up.






[jira] [Comment Edited] (YARN-5027) NM should clean up app log dirs after NM restart

2016-10-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251
 ] 

Chris Trezzo edited comment on YARN-5027 at 10/27/16 9:28 PM:
--

Does the current patch have the potential to leak 
{{application\_\*\_\*\_DEL\_\*}} directories? In 
{{ResourceLocalizationService#cleanupLogDir}} would we want to check if the 
renamed \_DEL\_ directory exists in the case where the original log directory 
doesn't exist and delete if necessary?


was (Author: ctrezzo):
Does the current patch have the potential to leak 
{{application\_\*\_\*\_DEL\_\*}} directories? In 
{{ResourceLocalizationService#cleanupLogDir}} would we want to check if the 
renamed _DEL_ directory exists in the case where the original log directory 
doesn't exist and delete if necessary?

> NM should clean up app log dirs after NM restart 
> -
>
> Key: YARN-5027
> URL: https://issues.apache.org/jira/browse/YARN-5027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>  Labels: oct16-easy
> Attachments: YARN-5027.01.patch
>
>
> If the NM starts without recovery enabled, there may be many obsolete app log 
> dirs in the NM log dirs. The NM should clean them up.






[jira] [Commented] (YARN-5027) NM should clean up app log dirs after NM restart

2016-10-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251
 ] 

Chris Trezzo commented on YARN-5027:


Does the current patch have the potential to leak {{application_*_*_DEL_*}} 
directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to 
check if the renamed _DEL_ directory exists in the case where the original log 
directory doesn't exist and delete if necessary?
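
To make the question concrete, here is a minimal sketch of what such a check could look like. It is not the patch's code: {{LeftoverDelDirSketch}} and its methods are hypothetical names, and it assumes the renamed directories sit directly under the NM log root and match {{application_*_*_DEL_*}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the NodeManager code base.
class LeftoverDelDirSketch {

  // Lists directories directly under logRoot named application_*_*_DEL_*.
  static List<Path> findLeftoverDelDirs(Path logRoot) throws IOException {
    List<Path> found = new ArrayList<>();
    try (DirectoryStream<Path> entries =
             Files.newDirectoryStream(logRoot, "application_*_*_DEL_*")) {
      for (Path entry : entries) {
        if (Files.isDirectory(entry)) {
          found.add(entry);
        }
      }
    }
    return found;
  }

  // Deletes a directory tree, children before parents.
  static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder())
          .collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  // Builds a throwaway log root, cleans it, and returns how many
  // renamed directories remain afterwards.
  static int runDemo() {
    try {
      Path root = Files.createTempDirectory("nm-logs");
      Files.createDirectories(
          root.resolve("application_1_0001_DEL_99").resolve("container_1"));
      Files.createDirectories(root.resolve("application_1_0002"));
      for (Path leftover : findLeftoverDelDirs(root)) {
        deleteRecursively(leftover);
      }
      return findLeftoverDelDirs(root).size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("leftover _DEL_ dirs after cleanup: " + runDemo());
  }
}
```

The sketch scans for the renamed name pattern independently of whether the original log directory still exists, which is the case the question is about.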

> NM should clean up app log dirs after NM restart 
> -
>
> Key: YARN-5027
> URL: https://issues.apache.org/jira/browse/YARN-5027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>  Labels: oct16-easy
> Attachments: YARN-5027.01.patch
>
>
> If the NM starts without recovery enabled, there may be many obsolete app log 
> dirs in the NM log dirs. The NM should clean them up.






[jira] [Comment Edited] (YARN-5027) NM should clean up app log dirs after NM restart

2016-10-27 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613251#comment-15613251
 ] 

Chris Trezzo edited comment on YARN-5027 at 10/27/16 9:20 PM:
--

Does the current patch have the potential to leak 
{{application\_\*\_\*\_DEL\_\*}} directories? In 
{{ResourceLocalizationService#cleanupLogDir}} would we want to check if the 
renamed _DEL_ directory exists in the case where the original log directory 
doesn't exist and delete if necessary?


was (Author: ctrezzo):
Does the current patch have the potential to leak {{application_*_*_DEL_*}} 
directories? In {{ResourceLocalizationService#cleanupLogDir}} would we want to 
check if the renamed _DEL_ directory exists in the case where the original log 
directory doesn't exist and delete if necessary?

> NM should clean up app log dirs after NM restart 
> -
>
> Key: YARN-5027
> URL: https://issues.apache.org/jira/browse/YARN-5027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>  Labels: oct16-easy
> Attachments: YARN-5027.01.patch
>
>
> If the NM starts without recovery enabled, there may be many obsolete app log 
> dirs in the NM log dirs. The NM should clean them up.






[jira] [Updated] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5258:
---
Labels: oct-16-easy  (was: )

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct-16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4950:
---
Labels: oct-16-easy  (was: )

> configure parallel-tests for yarn-client and yarn-server-resourcemanager
> 
>
> Key: YARN-4950
> URL: https://issues.apache.org/jira/browse/YARN-4950
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
>  Labels: oct-16-easy
> Attachments: YARN-4950.00.patch
>
>
> Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
> each.  The parallel-tests profile should be configured to reduce the 
> execution time.






[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4948:
---
Labels: oct16-hard  (was: )

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jialei weng
>Assignee: jialei weng
>  Labels: oct16-hard
> Attachments: YARN-4948.001.patch, YARN-4948.002.patch, 
> YARN-4948.003.patch, YARN-4948.006.patch, YARN-4948.007.patch
>
>
> Support storing node labels in ZooKeeper. The main scenario is to provide a 
> way to decouple YARN from HDFS. Because node labels are important data for 
> YARN, if HDFS is down, YARN will also fail to start. It is therefore 
> worthwhile to make YARN more independent when a user runs both YARN and HDFS.






[jira] [Updated] (YARN-4907) Make all MockRM#waitForState consistent.

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4907:
---
Component/s: resourcemanager

> Make all MockRM#waitForState consistent. 
> -
>
> Key: YARN-4907
> URL: https://issues.apache.org/jira/browse/YARN-4907
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>  Labels: oct16-medium
> Attachments: YARN-4907.001.patch
>
>
> There are some inconsistencies among the {{waitForState}} methods in {{MockRM}}:
> 1. Some {{waitForState}} methods return a boolean while others don't.
> 2. Some {{waitForState}} methods don't have a timeout, so they can wait forever.
> 3. Some {{waitForState}} methods use {{LOG.info}} and others use 
> {{System.out.println}} to print messages.
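
As an illustration of the three points above, a consistent wait helper might look roughly like the following. This is only a sketch under stated assumptions: {{WaitForStateSketch}}, its generic signature, and the use of {{java.util.logging}} are hypothetical and do not reflect the actual {{MockRM}} API or the attached patch.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical stand-in; MockRM's real methods take YARN-specific types.
class WaitForStateSketch {
  private static final Logger LOG =
      Logger.getLogger(WaitForStateSketch.class.getName());

  // Polls until the supplied state equals the expected one or the timeout
  // elapses. Always returns a boolean, always has a timeout, and always
  // reports through the same logger.
  static <T> boolean waitForState(Supplier<T> current, T expected,
      long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      T state = current.get();
      if (expected.equals(state)) {
        LOG.info("Reached state " + expected);
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        LOG.info("Timed out waiting for " + expected + ", last saw " + state);
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt and report failure to the caller.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // The state flips to RUNNING roughly 100 ms after start.
    boolean reached = waitForState(
        () -> System.currentTimeMillis() - start > 100 ? "RUNNING" : "NEW",
        "RUNNING", 2000, 10);
    System.out.println("reached RUNNING: " + reached);
  }
}
```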






[jira] [Updated] (YARN-4907) Make all MockRM#waitForState consistent.

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4907:
---
Labels: oct16-medium  (was: )

> Make all MockRM#waitForState consistent. 
> -
>
> Key: YARN-4907
> URL: https://issues.apache.org/jira/browse/YARN-4907
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>  Labels: oct16-medium
> Attachments: YARN-4907.001.patch
>
>
> There are some inconsistencies among the {{waitForState}} methods in {{MockRM}}:
> 1. Some {{waitForState}} methods return a boolean while others don't.
> 2. Some {{waitForState}} methods don't have a timeout, so they can wait forever.
> 3. Some {{waitForState}} methods use {{LOG.info}} and others use 
> {{System.out.println}} to print messages.






[jira] [Updated] (YARN-4900) SLS MRAMSimulator should include scheduledMappers/Reducers when re-request failed tasks

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4900:
---
Labels: oct16-medium  (was: )

> SLS MRAMSimulator should include scheduledMappers/Reducers when re-request 
> failed tasks
> ---
>
> Key: YARN-4900
> URL: https://issues.apache.org/jira/browse/YARN-4900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: oct16-medium
> Attachments: YARN-4900.1.patch
>
>







[jira] [Updated] (YARN-4900) SLS MRAMSimulator should include scheduledMappers/Reducers when re-request failed tasks

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4900:
---
Component/s: scheduler-load-simulator

> SLS MRAMSimulator should include scheduledMappers/Reducers when re-request 
> failed tasks
> ---
>
> Key: YARN-4900
> URL: https://issues.apache.org/jira/browse/YARN-4900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4900.1.patch
>
>







[jira] [Updated] (YARN-4899) Queue metrics of SLS capacity scheduler only activated after app submit to the queue

2016-10-27 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-4899:
---
 Labels: oct16-medium  (was: )
Component/s: capacity scheduler

> Queue metrics of SLS capacity scheduler only activated after app submit to 
> the queue
> 
>
> Key: YARN-4899
> URL: https://issues.apache.org/jira/browse/YARN-4899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: oct16-medium
> Attachments: YARN-4899.1.patch
>
>
> We should start recording queue metrics from cluster start.






[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-26 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610147#comment-15610147
 ] 

Chris Trezzo commented on YARN-5767:


Two notes:
# Instead of removing {{LocalCacheCleanerStats.getUserDelSizes()}} I made it 
return an unmodifiable map. That way, users of the class still have access to 
the data (outside of the toString method) and it is still protected.
# I wound up removing both {{LRUComparator.equals}} and 
{{LRUComparator.hashCode}}. I figure we don't need to override them since the 
methods were just using the default implementation anyways.

My intention is to file a followup jira that adds metrics that expose the 
statistics from {{LocalCacheCleanerStats}}.
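
For readers following along, the first note amounts to something like the sketch below. It is not the patch itself: {{CleanerStatsSketch}} and {{recordDeletion}} are hypothetical stand-ins, and only the idea of returning a read-only view from {{getUserDelSizes()}} comes from the note above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stats holder; not the actual Hadoop class.
class CleanerStatsSketch {
  private final Map<String, Long> userDelSizes = new HashMap<>();

  // Mutation stays inside the class.
  void recordDeletion(String user, long bytes) {
    userDelSizes.merge(user, bytes, Long::sum);
  }

  // Callers get a read-only view; attempts to modify it throw
  // UnsupportedOperationException.
  Map<String, Long> getUserDelSizes() {
    return Collections.unmodifiableMap(userDelSizes);
  }

  public static void main(String[] args) {
    CleanerStatsSketch stats = new CleanerStatsSketch();
    stats.recordDeletion("user1", 10L);
    stats.recordDeletion("user1", 5L);
    Map<String, Long> view = stats.getUserDelSizes();
    System.out.println("user1 deleted bytes: " + view.get("user1"));
    try {
      view.put("user2", 1L);
    } catch (UnsupportedOperationException e) {
      System.out.println("view is read-only");
    }
  }
}
```

{{Collections.unmodifiableMap}} returns a view rather than a copy, so later deletions recorded by the owner remain visible to anyone holding the returned map.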

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, 
> YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>       new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i =
>            retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.
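
The user1/user2 scenario can be reproduced with a small standalone model. The sketch below is a simplification, not the NodeManager code: {{CleanupOrderSketch}}, {{Res}}, and the {{lastUsed}} timestamps are hypothetical, and it assumes user1's tracker happens to be visited first. It compares deleting in tracker order against deleting the least recently used resources across all users.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of cache cleanup ordering; names are hypothetical.
class CleanupOrderSketch {
  static final class Res {
    final String owner;
    final long sizeMb;
    final long lastUsed;

    Res(String owner, long sizeMb, long lastUsed) {
      this.owner = owner;
      this.sizeMb = sizeMb;
      this.lastUsed = lastUsed;
    }
  }

  // Deletes candidates in the given order until the total size is at or
  // below the target. Returns the MB deleted per owner.
  static Map<String, Long> evict(List<Res> ordered, long targetMb) {
    long total = 0;
    for (Res r : ordered) {
      total += r.sizeMb;
    }
    Map<String, Long> deleted = new LinkedHashMap<>();
    for (Res r : ordered) {
      if (total <= targetMb) {
        break;
      }
      total -= r.sizeMb;
      deleted.merge(r.owner, r.sizeMb, Long::sum);
    }
    return deleted;
  }

  // user1 holds 10MB that was used recently; user2 holds 100MB of older data.
  static List<Res> sample() {
    List<Res> all = new ArrayList<>();
    all.add(new Res("user1", 10, 1000));
    for (int i = 0; i < 10; i++) {
      all.add(new Res("user2", 10, i));
    }
    return all;
  }

  public static void main(String[] args) {
    List<Res> trackerOrder = sample();
    List<Res> lruOrder = sample();
    lruOrder.sort(Comparator.comparingLong((Res r) -> r.lastUsed));
    System.out.println("tracker order: " + evict(trackerOrder, 50));
    System.out.println("global LRU:    " + evict(lruOrder, 50));
  }
}
```

With a 50MB target, tracker order removes all 10MB of user1's recently used data plus 50MB from user2, while a global LRU order removes 60MB from user2 and leaves user1 untouched.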






[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-26 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5767:
---
Attachment: YARN-5767-trunk-v4.patch

Attached is a v4 patch for trunk addressing all comments from the reviews. 
Thanks!

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, 
> YARN-5767-trunk-v3.patch, YARN-5767-trunk-v4.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>       new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i =
>            retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.





