[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316489#comment-17316489 ]

Jim Brennan commented on YARN-10475:

[~chaosju] thanks for your comment. The implementation we provided here uses overall cluster utilization vs. node utilization to adjust the heartbeat so that under-utilized nodes get more scheduling opportunities. Note that this feature was developed internally on branch-2 before the global scheduler was added. It has worked well to help keep our nodes more evenly utilized.

I think that other metrics for scaling the heartbeat are definitely worth exploring, which is why we filed [YARN-10478] to make it pluggable. That would be a good place to make suggestions for alternate approaches.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu utilization compared to overall cluster cpu utilization. If a node is over-utilized compared to the rest of the cluster, its heartbeat interval slows down. If it is under-utilized compared to the rest of the cluster, its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for several years. It was developed by [~nroberts], based on the observation that larger, faster nodes on our cluster were under-utilized compared to smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide utilization metrics.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
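The scaling idea described in the issue (slow down over-utilized nodes, speed up under-utilized ones) can be illustrated with a minimal sketch. This is not the code from the attached patches: the function name, the parameter names, the default values, and the exact formula are all assumptions chosen for illustration.

```python
def scaled_heartbeat_interval_ms(node_util, cluster_util,
                                 base_ms=1000, min_ms=500, max_ms=2000,
                                 speedup=1.0, slowdown=1.0):
    """Return a heartbeat interval (ms) for one node.

    node_util and cluster_util are CPU utilization fractions in [0, 1].
    All names and defaults are hypothetical, not taken from the patch.
    """
    if cluster_util <= 0.0:
        # No cluster-wide signal to compare against: keep the default.
        return base_ms

    # Positive when the node is busier than the cluster average,
    # negative when it is less busy.
    relative_diff = (node_util - cluster_util) / cluster_util

    # Allow different sensitivity for slowing down and speeding up.
    factor = slowdown if relative_diff > 0 else speedup
    interval = base_ms * (1.0 + relative_diff * factor)

    # Clamp to the configured bounds.
    return int(max(min_ms, min(max_ms, interval)))
```

Under these assumptions, a node at 75% CPU in a cluster averaging 50% would heartbeat less often than the default, and a node at 25% would heartbeat more often, which gives it more scheduling opportunities.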
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316340#comment-17316340 ]

chaosju commented on YARN-10475:

Why adaptive heartbeat?
* Regular heartbeats can overload the RM.
* If the RM is overloaded, things get worse over time as events queue up.
* Work efficiency drops, because important events at the NM/AM need to wait for the next heartbeat to let the RM know of their status.
* Not every heartbeat from a node or AM may be important. If nodes are running full, heartbeats from such nodes would not be useful for application scheduling.
* The RM should be able to control the heartbeats sent to it.

How adaptive heartbeat?
1. Throttle heartbeat:
* HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
* Statistics associated with various scheduler events (processing time vs. wait time in queue) are collected.
* The RM indicates the next HB interval to the NM and AM to throttle the heartbeat.
2. Event-based heartbeat:
* Send an out-of-band heartbeat to deliver urgent requests, such as new resource requests or container completions, before the heartbeat interval indicated by the RM.
* The RM can notify the AM when containers have been allocated so that the AM does not have to wait for the scheduled heartbeat to get resources.

Reference: [https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters]

I think that the feature should take the RM's load into account. [~Jim_Brennan]

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
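The "throttle heartbeat" proposal in the comment above (pick the next interval from the scheduler load level) could look roughly like the following sketch. Only the four level names come from the comment; the thresholds, the multipliers, and the function names are hypothetical and are not taken from YARN or from the referenced slides.

```python
from enum import Enum


class SchedulerLoad(Enum):
    LIGHT = 1
    NORMAL = 2
    BUSY = 3
    HEAVY = 4


# Hypothetical multipliers applied to the base heartbeat interval.
_INTERVAL_MULTIPLIER = {
    SchedulerLoad.LIGHT: 0.5,
    SchedulerLoad.NORMAL: 1.0,
    SchedulerLoad.BUSY: 2.0,
    SchedulerLoad.HEAVY: 4.0,
}


def classify_load(avg_wait_ms, avg_processing_ms):
    """Classify scheduler load from queue wait time vs. processing time.

    The ratio thresholds below are illustrative assumptions.
    """
    if avg_processing_ms <= 0:
        return SchedulerLoad.LIGHT
    ratio = avg_wait_ms / avg_processing_ms
    if ratio < 1.0:
        return SchedulerLoad.LIGHT
    if ratio < 5.0:
        return SchedulerLoad.NORMAL
    if ratio < 20.0:
        return SchedulerLoad.BUSY
    return SchedulerLoad.HEAVY


def next_heartbeat_interval_ms(load, base_ms=1000):
    """Interval the RM would hand back to the NM/AM in its response."""
    return int(base_ms * _INTERVAL_MULTIPLIER[load])
```

In this sketch the RM would compute the load level from its own event-queue statistics and return the resulting interval with each heartbeat response, so a heavily loaded RM receives fewer heartbeats.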
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224921#comment-17224921 ]

Jim Brennan commented on YARN-10475:

Thanks [~epayne]!

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224717#comment-17224717 ]

Jim Brennan commented on YARN-10475:

I have filed [YARN-10478] for making this pluggable.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224437#comment-17224437 ]

Bibin Chundatt commented on YARN-10475:

Sure, let's have a follow-up JIRA to work on making this pluggable.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223831#comment-17223831 ]

Hadoop QA commented on YARN-10475:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 58s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| 0 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || branch-3.2 Compile Tests || ||
| 0 | mvndep | 3m 18s | | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 34s | | branch-3.2 passed |
| +1 | compile | 14m 56s | | branch-3.2 passed |
| +1 | checkstyle | 2m 21s | | branch-3.2 passed |
| +1 | mvnsite | 3m 42s | | branch-3.2 passed |
| +1 | shadedclient | 20m 22s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 34s | | branch-3.2 passed |
| 0 | spotbugs | 1m 57s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| 0 | findbugs | 0m 30s | | branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site no findbugs output file (findbugsXml.xml) |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 7s | | the patch passed |
| -1 | compile | 15m 3s | [/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt] | root in the patch failed. |
| -1 | javac | 15m 3s | [/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt] | root in the patch failed. |
| -1 | blanks | 0m 0s | [/blanks-eol.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/blanks-eol.txt] | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 | checkstyle | 0m 33s | [/buildtool-patch-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/buildtool-patch-checkstyle-root.txt] | The patch fails to run checkstyle in root |
| -1 | mvnsite | 0m 35s | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt] | hadoop-yarn-api in the patch failed. |
| -1 | mvnsite | 0m 19s | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt] | hadoop-yarn-common in the patch failed. |
| -1 | mvnsite | 0m 34s | [/patch-mvnsite-h
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223818#comment-17223818 ]

Eric Payne commented on YARN-10475:

Thanks [~Jim_Brennan] for providing resolutions for this issue, and thanks [~bibinchundatt] for your reviews.

The changes LGTM. +1

I am in favor of committing this patch as-is and creating a separate JIRA for adding a pluggable architecture for adjusting the heartbeat based on other factors. [~bibinchundatt], I await your opinion.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223733#comment-17223733 ]

Jim Brennan commented on YARN-10475:

[~epayne], I have put up patches for branch-3.3 and branch-3.2 as well.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223668#comment-17223668 ]

Jim Brennan commented on YARN-10475:

Thanks for the suggestion [~bibinchundatt]! I think a plugin for calculating the heartbeat interval is definitely possible. I think the configs as specified could remain for enabling scaling and setting up the parameters; there is nothing specific to cpu utilization in those properties.

Would you be ok with a follow-up Jira to move the calculation into a plugin? Do you have any suggestions for alternate calculations?

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
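One way to picture the plugin idea discussed in this message is an interface that takes node and cluster metrics and returns the next interval, with the CPU-utilization calculation as just one implementation. The sketch below is purely illustrative: the class names, the context fields, and the registry are assumptions, and none of it reflects the design eventually tracked in [YARN-10478].

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class HeartbeatContext:
    """Inputs a policy may look at (hypothetical field names)."""
    default_interval_ms: int
    node_cpu_util: float
    cluster_cpu_util: float


class HeartbeatIntervalPolicy(ABC):
    """Pluggable calculation of the next heartbeat interval."""

    @abstractmethod
    def next_interval_ms(self, ctx: HeartbeatContext) -> int:
        ...


class FixedIntervalPolicy(HeartbeatIntervalPolicy):
    """Always returns the configured default interval."""

    def next_interval_ms(self, ctx: HeartbeatContext) -> int:
        return ctx.default_interval_ms


class CpuUtilizationPolicy(HeartbeatIntervalPolicy):
    """Scales the interval by node CPU relative to cluster CPU."""

    def __init__(self, min_ms: int = 500, max_ms: int = 2000):
        self.min_ms = min_ms
        self.max_ms = max_ms

    def next_interval_ms(self, ctx: HeartbeatContext) -> int:
        if ctx.cluster_cpu_util <= 0.0:
            return ctx.default_interval_ms
        ratio = ctx.node_cpu_util / ctx.cluster_cpu_util
        interval = ctx.default_interval_ms * ratio
        return int(max(self.min_ms, min(self.max_ms, interval)))


# Hypothetical registry mapping a config value to a policy class.
_POLICIES = {"fixed": FixedIntervalPolicy, "cpu": CpuUtilizationPolicy}


def create_policy(name: str, **kwargs) -> HeartbeatIntervalPolicy:
    return _POLICIES[name](**kwargs)
```

With this shape, the min/max bounds stay generic configuration, as the comment suggests, while the metric used for scaling is swapped by selecting a different policy.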
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223491#comment-17223491 ]

Bibin Chundatt commented on YARN-10475:

Thank you [~Jim_Brennan] for working on this.

Could you make the implementation generic so that other policies can be plugged in too? CPU utilization is only one of the policies that could help in deciding the HB interval. Thoughts?

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223343#comment-17223343 ]

Hadoop QA commented on YARN-10475:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 28s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 23s | | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 24s | | trunk passed |
| +1 | compile | 21m 10s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 17m 55s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 2m 42s | | trunk passed |
| +1 | mvnsite | 3m 47s | | trunk passed |
| +1 | shadedclient | 23m 12s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 9s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 3m 13s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 0m 53s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| 0 | findbugs | 0m 26s | | branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site no findbugs output file (findbugsXml.xml) |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 23s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 43s | | the patch passed |
| +1 | compile | 20m 44s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 20m 44s | | the patch passed |
| +1 | compile | 17m 52s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 17m 52s | | the patch passed |
| -1 | blanks | 0m 0s | [/blanks-eol.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/273/artifact/out/blanks-eol.txt] | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 | checkstyle | 2m 41s | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/273/artifact/out/results-checkstyle-root.txt] | root: The patch generated 3 new + 412 unchanged - 0 fixed = 415 total (was 412) |
| +1 | mvnsite | 3m 48s | | the patch passed |
| +1 | xml
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223217#comment-17223217 ]

Jim Brennan commented on YARN-10475:

Thanks [~epayne]! I put up patch 003, which adds documentation to Nodemanager.md and also fixes a minor typo in yarn-default.xml.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223196#comment-17223196 ]

Eric Payne commented on YARN-10475:

[~Jim_Brennan], Thanks for working on this feature and providing the patch.

The code patch looks good to me. Once you provide the documentation of the new properties, I am ready to provide my +1.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222517#comment-17222517 ]

Hadoop QA commented on YARN-10475:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 54s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 11m 41s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 57s | | trunk passed |
| +1 | compile | 20m 58s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 17m 47s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 2m 40s | | trunk passed |
| +1 | mvnsite | 3m 40s | | trunk passed |
| +1 | shadedclient | 21m 36s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 15s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 3m 11s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 0m 58s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 7m 0s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 38s | | the patch passed |
| +1 | compile | 20m 47s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 20m 47s | | the patch passed |
| +1 | compile | 18m 18s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 18m 18s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 2m 32s | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/266/artifact/out/results-checkstyle-root.txt] | root: The patch generated 3 new + 413 unchanged - 0 fixed = 416 total (was 413) |
| +1 | mvnsite | 3m 53s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 32s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 15s | | the patch passed
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222481#comment-17222481 ]

Eric Payne commented on YARN-10475:

[~Jim_Brennan], please add documentation for the new config properties.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU
> utilization compared to overall cluster CPU utilization. If a node is
> over-utilized compared to the rest of the cluster, its heartbeat interval
> slows down. If it is under-utilized compared to the rest of the cluster,
> its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for
> several years. It was developed by [~nroberts], based on the observation
> that larger, faster nodes on our cluster were under-utilized compared to
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide
> utilization metrics.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1794#comment-1794 ]

Jim Brennan commented on YARN-10475:

I put up patch 002 to address checkstyle/javac issues.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU
> utilization compared to overall cluster CPU utilization. If a node is
> over-utilized compared to the rest of the cluster, its heartbeat interval
> slows down. If it is under-utilized compared to the rest of the cluster,
> its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for
> several years. It was developed by [~nroberts], based on the observation
> that larger, faster nodes on our cluster were under-utilized compared to
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide
> utilization metrics.
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221915#comment-17221915 ]

Hadoop QA commented on YARN-10475:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 39s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 52s | | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 47s | | trunk passed |
| +1 | compile | 19m 51s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 17m 57s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 2m 38s | | trunk passed |
| +1 | mvnsite | 3m 33s | | trunk passed |
| +1 | shadedclient | 21m 3s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 57s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 3m 6s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 1m 2s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 7m 5s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 41s | | the patch passed |
| +1 | compile | 20m 48s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 20m 48s | | the patch passed |
| +1 | compile | 18m 3s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 18m 3s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 3m 2s | [/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/261/artifact/out/results-checkstyle-root.txt] | root: The patch generated 13 new + 413 unchanged - 0 fixed = 426 total (was 413) |
| +1 | mvnsite | 3m 49s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 42s | | patch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 0m 54s | [/results-javadoc-javadoc-hadoop-yarn-pr
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221774#comment-17221774 ]

Jim Brennan commented on YARN-10475:

This adds the following {{yarn.resourcemanager.nodemanagers}} configuration properties:

{{heartbeat-interval-scaling-enable}}
* Enables heartbeat interval scaling; defaults to false.

{{heartbeat-interval-min-ms}}
* If heartbeat interval scaling is enabled, this is the minimum heartbeat interval in milliseconds.

{{heartbeat-interval-max-ms}}
* If heartbeat interval scaling is enabled, this is the maximum heartbeat interval in milliseconds.

{{heartbeat-interval-speedup-factor}}
* Controls the degree of adjustment when speeding up heartbeat intervals. At 1.0, CPU utilization 20% below the cluster average results in a 20% decrease in the heartbeat interval.

{{heartbeat-interval-slowdown-factor}}
* Controls the degree of adjustment when slowing down heartbeat intervals. At 1.0, CPU utilization 20% above the cluster average results in a 20% increase in the heartbeat interval.

> Scale RM-NM heartbeat interval based on node utilization
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU
> utilization compared to overall cluster CPU utilization. If a node is
> over-utilized compared to the rest of the cluster, its heartbeat interval
> slows down. If it is under-utilized compared to the rest of the cluster,
> its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for
> several years. It was developed by [~nroberts], based on the observation
> that larger, faster nodes on our cluster were under-utilized compared to
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide
> utilization metrics.
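
For readers trying to picture how the speedup and slowdown factors in the comment above might interact with the min/max bounds, here is a minimal sketch in Python. It is not the code from the patch. The function name, the parameter names, the rounding, and the sample values (1000 ms unscaled interval, 500 ms minimum, 2000 ms maximum) are all hypothetical. It also assumes that "20% below average" means a difference of 0.20 between the node's and the cluster's CPU utilization fractions, which the comment does not spell out.

```python
def scaled_heartbeat_interval_ms(default_ms, min_ms, max_ms,
                                 node_util, cluster_util,
                                 speedup_factor=1.0, slowdown_factor=1.0):
    """Illustrative only: scale a heartbeat interval by relative CPU utilization.

    node_util and cluster_util are CPU utilization fractions in [0.0, 1.0].
    All names here are hypothetical and do not come from the Hadoop source.
    """
    # Keep both utilization values inside the valid range.
    node_util = min(1.0, max(0.0, node_util))
    cluster_util = min(1.0, max(0.0, cluster_util))

    if node_util > cluster_util:
        # Busier than the cluster average: lengthen the interval.
        scale = 1.0 + (node_util - cluster_util) * slowdown_factor
    else:
        # At or below the cluster average: shorten the interval.
        scale = 1.0 - (cluster_util - node_util) * speedup_factor

    # Clamp the result to the configured bounds (assumed to apply last).
    interval = round(default_ms * scale)
    return min(max_ms, max(min_ms, interval))


# Node 20 points below the cluster average, factor 1.0: 20% shorter interval.
print(scaled_heartbeat_interval_ms(1000, 500, 2000, 0.30, 0.50))  # 800
# Node 20 points above the cluster average, factor 1.0: 20% longer interval.
print(scaled_heartbeat_interval_ms(1000, 500, 2000, 0.70, 0.50))  # 1200
```

Under these assumptions, a larger factor amplifies the adjustment, and the min/max settings cap how far any single node's interval can drift from the unscaled value.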