[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2021-04-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316489#comment-17316489
 ] 

Jim Brennan commented on YARN-10475:


[~chaosju] thanks for your comment.  The implementation we provided here 
compares each node's utilization with overall cluster utilization to adjust 
the heartbeat interval, so that under-utilized nodes get more scheduling 
opportunities.  Note that this feature was developed internally on branch-2 
before the global scheduler was added.  It has worked well to help keep our 
nodes more evenly utilized. 

I think that other metrics for scaling the heartbeat are definitely worth 
exploring, which is why we filed [YARN-10478] to make it pluggable.  That would 
be a good place to make suggestions for alternate approaches.


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization.  If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> is lengthened.  If it is under-utilized compared to the rest of the cluster, 
> its heartbeat interval is shortened.
> This is a feature we have been running internally in production for 
> several years.  It was developed by [~nroberts], based on the observation 
> that larger, faster nodes on our cluster were under-utilized compared to 
> smaller, slower nodes. 
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.
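
The scaling rule described in the issue summary can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class name, method name, parameter names, and the linear scaling formula are all assumptions chosen for the example, and utilization values are assumed to be fractions between 0.0 and 1.0.

```java
/** Illustrative sketch only; not the code from the YARN-10475 patches. */
final class HeartbeatIntervalSketch {
    private HeartbeatIntervalSketch() { }

    /** Clamps a utilization value into the range [0.0, 1.0]. */
    private static double clamp01(double value) {
        return Math.min(1.0, Math.max(0.0, value));
    }

    /**
     * Returns a heartbeat interval in milliseconds. A node busier than the
     * cluster gets a longer interval; a less busy node gets a shorter one.
     * All parameter names here are hypothetical.
     */
    static long scaledIntervalMs(long defaultMs, long minMs, long maxMs,
                                 double speedupFactor, double slowdownFactor,
                                 double nodeCpuUtil, double clusterCpuUtil) {
        double node = clamp01(nodeCpuUtil);
        double cluster = clamp01(clusterCpuUtil);
        double diff = Math.abs(node - cluster);
        double scaled = node > cluster
            ? defaultMs * (1.0 + diff * slowdownFactor)   // over-utilized: slow down
            : defaultMs * (1.0 - diff * speedupFactor);   // under-utilized: speed up
        // Keep the result inside the configured bounds.
        return Math.min(maxMs, Math.max(minMs, Math.round(scaled)));
    }

    public static void main(String[] args) {
        // A node at 90% CPU in a cluster at 50% CPU, then a node at 20% CPU.
        System.out.println(scaledIntervalMs(1000, 500, 2000, 1.0, 1.0, 0.9, 0.5));
        System.out.println(scaledIntervalMs(1000, 500, 2000, 1.0, 1.0, 0.2, 0.5));
    }
}
```

In this sketch the minimum and maximum bounds keep a badly skewed node from heartbeating too often or too rarely, and separate speed-up and slow-down factors allow the two directions to be tuned independently.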



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2021-04-07 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316340#comment-17316340
 ] 

chaosju commented on YARN-10475:


Why adaptive heartbeat?
 * Regular heartbeats can overload the RM.
 * If the RM is overloaded, things get worse over time as events queue up.
 * Work efficiency drops, because important events at the NM/AM have to wait 
for the next heartbeat before the RM learns of their status.
 * Not every heartbeat from a node or AM is important. If nodes are running 
full, heartbeats from such nodes are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to it.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
 * Set the HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
 * Collect statistics on the various scheduler events (processing time vs. 
wait time in queue).
 * The RM tells the NM and AM the next HB interval in order to throttle the 
heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completions, before the heartbeat interval indicated 
by the RM has elapsed.
 * The RM can notify the AM when containers have been allocated, so that the 
AM does not have to wait for the scheduled heartbeat to get resources.
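
To make the "throttle heartbeat" idea above concrete, here is a minimal sketch of how a scheduler load level could be mapped to the next heartbeat interval. Only the four level names (LIGHT, NORMAL, BUSY, HEAVY) come from the comment; the class name, the wait-to-processing ratio used for classification, the thresholds, and the interval multipliers are all hypothetical values picked for illustration.

```java
/** Illustrative sketch only; thresholds and multipliers are made-up values. */
final class ThrottleHeartbeatSketch {
    private ThrottleHeartbeatSketch() { }

    /** Scheduler load levels, each with an assumed interval multiplier. */
    enum SchedulerLoad {
        LIGHT(0.5), NORMAL(1.0), BUSY(2.0), HEAVY(4.0);

        final double intervalMultiplier;

        SchedulerLoad(double intervalMultiplier) {
            this.intervalMultiplier = intervalMultiplier;
        }
    }

    /**
     * Classifies load from the ratio of average queue wait time to average
     * event processing time. The thresholds are assumptions.
     */
    static SchedulerLoad classify(double avgWaitMs, double avgProcessingMs) {
        // Guard against division by zero when no processing time is recorded.
        double ratio = avgWaitMs / Math.max(avgProcessingMs, 1e-9);
        if (ratio < 1.0) {
            return SchedulerLoad.LIGHT;
        } else if (ratio < 5.0) {
            return SchedulerLoad.NORMAL;
        } else if (ratio < 20.0) {
            return SchedulerLoad.BUSY;
        }
        return SchedulerLoad.HEAVY;
    }

    /** Interval the RM would hand back to the NM or AM for its next heartbeat. */
    static long nextIntervalMs(long baseMs, SchedulerLoad load) {
        return Math.round(baseMs * load.intervalMultiplier);
    }

    public static void main(String[] args) {
        SchedulerLoad load = classify(10.0, 1.0);
        System.out.println(load + " -> " + nextIntervalMs(1000, load) + " ms");
    }
}
```

The point of the sketch is that the interval is chosen by the RM from its own queue statistics, independent of any per-node utilization, which is the contrast with the approach taken in this issue.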

 
Reference: [https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters]

I think this feature should also take the RM's load into account.

[~Jim_Brennan]




[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-11-02 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224921#comment-17224921
 ] 

Jim Brennan commented on YARN-10475:


Thanks [~epayne]!

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-11-02 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224717#comment-17224717
 ] 

Jim Brennan commented on YARN-10475:


I have filed [YARN-10478] for making this pluggable.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-11-01 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224437#comment-17224437
 ] 

Bibin Chundatt commented on YARN-10475:
---

Sure, let's have a follow-up JIRA to work on making this pluggable.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223831#comment-17223831
 ] 

Hadoop QA commented on YARN-10475:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue} 0m 0s{color} |  | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} |  | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 18s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 34s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 56s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 21s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 22s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 34s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 57s{color} |  | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} |  | {color:blue} branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site no findbugs output file (findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 15m 3s{color} | [/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt] | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m 3s{color} | [/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt] | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} blanks {color} | {color:red} 0m 0s{color} | [/blanks-eol.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/blanks-eol.txt] | {color:red} The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 33s{color} | [/buildtool-patch-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/buildtool-patch-checkstyle-root.txt] | {color:orange} The patch fails to run checkstyle in root {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s{color} | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt] | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 19s{color} | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt] | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 34s{color} | [/patch-mvnsite-h

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223818#comment-17223818
 ] 

Eric Payne commented on YARN-10475:
---

Thanks [~Jim_Brennan] for providing resolutions for this issue, and thanks 
[~bibinchundatt] for your reviews.
The changes LGTM.

+1

I am in favor of committing this patch as-is and creating a separate JIRA for 
adding a pluggable architecture for adjusting the heartbeat based on other 
factors.

[~bibinchundatt], I await your opinion.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223733#comment-17223733
 ] 

Jim Brennan commented on YARN-10475:


[~epayne], I have put up patches for branch-3.3 and branch-3.2 as well.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223668#comment-17223668
 ] 

Jim Brennan commented on YARN-10475:


Thanks for the suggestion [~bibinchundatt]!  I think a plugin for calculating 
the heartbeat interval is definitely possible.  I think the configs as 
specified could remain for enabling scaling and setting up the parameters, 
since there is nothing specific about CPU utilization in those properties.  
Would you be OK with a follow-up Jira to move the calculation into a plugin?  
Do you have any suggestions for alternate calculations?
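
As a rough illustration of what moving the calculation into a plugin could look like, here is a minimal sketch. It is not the design from any follow-up Jira: the interface name, both implementations, and the demo class are hypothetical, and the utilization arguments are deliberately generic (fractions between 0.0 and 1.0) so that a policy is not tied to CPU.

```java
/** Hypothetical plugin point: computes the next heartbeat interval for a node. */
interface HeartbeatIntervalPolicy {
    long nextIntervalMs(double nodeUtil, double clusterUtil);
}

/** Policy that ignores utilization and always returns a fixed interval. */
final class FixedIntervalPolicy implements HeartbeatIntervalPolicy {
    private final long intervalMs;

    FixedIntervalPolicy(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public long nextIntervalMs(double nodeUtil, double clusterUtil) {
        return intervalMs;
    }
}

/** Policy that scales the interval by the node-versus-cluster utilization gap. */
final class RelativeUtilizationPolicy implements HeartbeatIntervalPolicy {
    private final long defaultMs;
    private final long minMs;
    private final long maxMs;

    RelativeUtilizationPolicy(long defaultMs, long minMs, long maxMs) {
        this.defaultMs = defaultMs;
        this.minMs = minMs;
        this.maxMs = maxMs;
    }

    @Override
    public long nextIntervalMs(double nodeUtil, double clusterUtil) {
        // A positive gap lengthens the interval; a negative gap shortens it.
        double scaled = defaultMs * (1.0 + (nodeUtil - clusterUtil));
        return Math.min(maxMs, Math.max(minMs, Math.round(scaled)));
    }
}

/** Small demo: the caller only depends on the interface. */
final class HeartbeatPolicyDemo {
    public static void main(String[] args) {
        HeartbeatIntervalPolicy policy = new RelativeUtilizationPolicy(1000, 500, 2000);
        System.out.println(policy.nextIntervalMs(0.75, 0.25));
    }
}
```

Because the caller only sees the interface, the same enable flag and min/max settings could stay in place while the calculation itself is swapped, which matches the point above that the existing properties are not specific to CPU utilization.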


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223491#comment-17223491
 ] 

Bibin Chundatt commented on YARN-10475:
---

Thank you [~Jim_Brennan] for working on this.

Could you make the implementation generic, so that other policies can be 
plugged in too? CPU utilization is only one of the policies that can help in 
deciding the HB interval. Thoughts?


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223343#comment-17223343
 ] 

Hadoop QA commented on YARN-10475:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 28s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue} 0m 0s{color} |  | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} |  | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 24s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 10s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 55s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 42s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 47s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 23m 12s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 9s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 53s{color} |  | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 26s{color} |  | {color:blue} branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site no findbugs output file (findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 43s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 44s{color} |  | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m 44s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 52s{color} |  | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 52s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} blanks {color} | {color:red} 0m 0s{color} | [/blanks-eol.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/273/artifact/out/blanks-eol.txt] | {color:red} The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 41s{color} | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/273/artifact/out/results-checkstyle-root.txt] | {color:orange} root: The patch generated 3 new + 412 unchanged - 0 fixed = 415 total (was 412) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 48s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} xml {color}

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-29 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223217#comment-17223217
 ] 

Jim Brennan commented on YARN-10475:


Thanks [~epayne]!  I put up patch 003, which adds documentation to 
Nodemanager.md and also fixes a minor typo in yarn-default.xml.


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-29 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223196#comment-17223196
 ] 

Eric Payne commented on YARN-10475:
---

[~Jim_Brennan], Thanks for working on this feature and providing the patch.

The code patch looks good to me. Once you provide the documentation of the new 
properties, I am ready to provide my +1.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222517#comment-17222517
 ] 

Hadoop QA commented on YARN-10475:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 54s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 11m 41s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 57s | | trunk passed |
| +1 | compile | 20m 58s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 17m 47s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 2m 40s | | trunk passed |
| +1 | mvnsite | 3m 40s | | trunk passed |
| +1 | shadedclient | 21m 36s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 15s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 3m 11s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 0m 58s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 7m 0s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 38s | | the patch passed |
| +1 | compile | 20m 47s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 20m 47s | | the patch passed |
| +1 | compile | 18m 18s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 18m 18s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 2m 32s | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/266/artifact/out/results-checkstyle-root.txt] | root: The patch generated 3 new + 413 unchanged - 0 fixed = 416 total (was 413) |
| +1 | mvnsite | 3m 53s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 32s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 15s | | the patch passed 

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-28 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222481#comment-17222481
 ] 

Eric Payne commented on YARN-10475:
---

[~Jim_Brennan], please add documentation for the new config properties.




[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-28 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1794#comment-1794
 ] 

Jim Brennan commented on YARN-10475:


I put up patch 002 to address checkstyle/javac issues.




[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221915#comment-17221915
 ] 

Hadoop QA commented on YARN-10475:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 39s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 52s | | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 47s | | trunk passed |
| +1 | compile | 19m 51s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 17m 57s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 2m 38s | | trunk passed |
| +1 | mvnsite | 3m 33s | | trunk passed |
| +1 | shadedclient | 21m 3s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 57s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 3m 6s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 1m 2s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 7m 5s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 41s | | the patch passed |
| +1 | compile | 20m 48s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 20m 48s | | the patch passed |
| +1 | compile | 18m 3s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 18m 3s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 3m 2s | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/261/artifact/out/results-checkstyle-root.txt] | root: The patch generated 13 new + 413 unchanged - 0 fixed = 426 total (was 413) |
| +1 | mvnsite | 3m 49s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 42s | | patch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 0m 54s | [/results-javadoc-javadoc-hadoop-yarn-pr

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-27 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221774#comment-17221774
 ] 

Jim Brennan commented on YARN-10475:


This adds the following {{yarn.resourcemanager.nodemanagers}} configuration 
properties:

{{heartbeat-interval-scaling-enable}}
 * Enables heartbeat interval scaling. Defaults to false.

{{heartbeat-interval-min-ms}}
 * If heartbeat interval scaling is enabled, this is the minimum heartbeat 
interval in milliseconds.

{{heartbeat-interval-max-ms}}
 * If heartbeat interval scaling is enabled, this is the maximum heartbeat 
interval in milliseconds.

{{heartbeat-interval-speedup-factor}}
 * This controls the degree of adjustment when speeding up heartbeat intervals. 
At 1.0, CPU utilization 20% lower than the cluster average results in a 20% 
decrease in heartbeat interval.

{{heartbeat-interval-slowdown-factor}}
 * This controls the degree of adjustment when slowing down heartbeat intervals. 
At 1.0, CPU utilization 20% higher than the cluster average results in a 20% 
increase in heartbeat interval.
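
To illustrate how the two factors and the min/max bounds could interact, here is a minimal Java sketch. It is not the code from the patch. The class name, method name, and parameter names are hypothetical. It assumes the adjustment is linear in the difference between node and cluster CPU utilization (both expressed as fractions between 0 and 1) and that the result is clamped to the configured minimum and maximum.

```java
// Hypothetical sketch only; names and the exact formula are assumptions,
// not taken from the YARN-10475 patch.
class HeartbeatScalingSketch {

  // Restrict a utilization value to the range [0, 1].
  static double clamp01(double v) {
    return Math.min(1.0, Math.max(0.0, v));
  }

  // Returns a heartbeat interval in milliseconds, scaled from defaultMs
  // according to how far the node's CPU utilization is from the cluster's.
  static long scaledInterval(long defaultMs, long minMs, long maxMs,
      double speedupFactor, double slowdownFactor,
      double nodeUtil, double clusterUtil) {
    double node = clamp01(nodeUtil);
    double cluster = clamp01(clusterUtil);
    double scaled;
    if (node > cluster) {
      // Busier than the cluster: lengthen the interval.
      scaled = defaultMs * (1.0 + (node - cluster) * slowdownFactor);
    } else {
      // Idler than (or equal to) the cluster: shorten the interval.
      scaled = defaultMs * (1.0 - (cluster - node) * speedupFactor);
    }
    // Keep the result within the configured bounds.
    return Math.min(maxMs, Math.max(minMs, Math.round(scaled)));
  }

  public static void main(String[] args) {
    // Node at 75% CPU, cluster at 50%, slowdown factor 1.0: prints 1250
    System.out.println(scaledInterval(1000, 500, 2000, 1.0, 1.0, 0.75, 0.50));
    // Node at 25% CPU, cluster at 50%, speedup factor 1.0: prints 750
    System.out.println(scaledInterval(1000, 500, 2000, 1.0, 1.0, 0.25, 0.50));
  }
}
```

Under these assumptions, a factor of 0.0 disables adjustment in that direction, and the min/max bounds cap how far any single node can drift from the default interval.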

