[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316336#comment-17316336 ]
chaosju edited comment on YARN-10450 at 4/7/21, 1:30 PM:
---------------------------------------------------------

Why adaptive heartbeat?
 * {color:#ff0000}Regular heartbeats can overload RM.{color}
 * {color:#ff0000}If RM is overloaded, things get worse over time as events queue up.{color}
 * Efficiency drops because important events at NM/AM have to wait for the next heartbeat before RM learns of their status.
 * Not every heartbeat from a node or AM is important. If a node is running full, its heartbeats are not useful for application scheduling.
 * RM should be able to control the heartbeats sent to it.

How adaptive heartbeat?

1. Throttle heartbeat:
 * {color:#ff0000}HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY){color}
 * Statistics on the various scheduler events (processing time vs. wait time in queue) are collected.
 * RM tells NM and AM the next HB interval in order to throttle the heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource requests or container completion, before the heartbeat interval indicated by RM has elapsed.
 * RM can notify AM when containers have been allocated, so that AM does not have to wait for the scheduled heartbeat to get resources.

Reference: https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters

[~Jim_Brennan]

> Add cpu and memory utilization per node and cluster-wide metrics
> ----------------------------------------------------------------
>
>                 Key: YARN-10450
>                 URL: https://issues.apache.org/jira/browse/YARN-10450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.3.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>             Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
>         Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch,
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch,
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and
> aggregated for the entire cluster. This information is already passed
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it
> useful to be able to quickly see the actual cpu/memory utilization on the
> node/cluster. It's especially useful if some form of overcommit is used.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
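
For readers who want something concrete, the two ideas in the comment (a throttled interval chosen from scheduler load, plus out-of-band heartbeats for urgent events) could be sketched roughly as below. This is only an illustration and not YARN code: the load level names come from the comment, while the function names, the wait/processing ratio thresholds, and the interval multipliers are all assumptions made up for the example.

```python
from enum import Enum


class SchedulerLoad(Enum):
    """Load levels named in the comment; everything else here is illustrative."""
    LIGHT = 1
    NORMAL = 2
    BUSY = 3
    HEAVY = 4


# Hypothetical multipliers applied to a configured base heartbeat interval.
INTERVAL_SCALE = {
    SchedulerLoad.LIGHT: 0.5,
    SchedulerLoad.NORMAL: 1.0,
    SchedulerLoad.BUSY: 2.0,
    SchedulerLoad.HEAVY: 4.0,
}


def classify_load(avg_wait_ms: float, avg_processing_ms: float) -> SchedulerLoad:
    """Classify load from the ratio of queue wait time to processing time.

    The thresholds (1, 5, 20) are assumed values chosen for illustration.
    """
    if avg_processing_ms <= 0:
        return SchedulerLoad.LIGHT  # no processing statistics collected yet
    ratio = avg_wait_ms / avg_processing_ms
    if ratio < 1:
        return SchedulerLoad.LIGHT
    if ratio < 5:
        return SchedulerLoad.NORMAL
    if ratio < 20:
        return SchedulerLoad.BUSY
    return SchedulerLoad.HEAVY


def next_heartbeat_interval_ms(base_ms: int, load: SchedulerLoad) -> int:
    """Interval the RM would hand back to an NM or AM in its heartbeat response."""
    return int(base_ms * INTERVAL_SCALE[load])


def should_send_now(ms_since_last_hb: int, interval_ms: int,
                    has_urgent_event: bool) -> bool:
    """Event-based heartbeat: urgent events bypass the throttled interval."""
    return has_urgent_event or ms_since_last_hb >= interval_ms
```

With these assumed numbers, an RM whose events wait about ten times longer than they take to process would be classified BUSY and would double a 1000 ms base interval, while a container completion on the NM side would still trigger an immediate heartbeat.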