[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321648#comment-15321648 ]
Jason Lowe commented on YARN-5215:
----------------------------------

Ah, so the headline was a bit misleading. Most people saw it and thought this feature was going to schedule more containers on the server, but it is essentially the opposite of YARN-1011 and YARN-5202: it scales the node down from its original size as node utilization increases, rather than scaling it up from the original size as utilization decreases. Instead of overcommit, this feature is "undercommit." ;)

I'm OK with the idea of the feature in general, and I don't think it will be horribly incompatible. In fact, I think features like YARN-1011 / YARN-5202 could emulate this behavior by tuning the "guaranteed" node size for YARN very low while allowing it to drastically overcommit up to the original node capability. In other words, rather than starting nodes big and scaling down, we start nodes small and scale up when we can. The YARN-5202 patch already dynamically scales the node size based on the reported total node utilization, so it will respond to increasing external load similarly. The only thing missing there is that it won't go below the original node size no matter how bad the utilization gets, so either that would need to change or, as I mentioned, users could tune the feature differently to get this behavior.

Any thoughts on whether this is better implemented as an overcommit setup rather than an undercommit setup? It may be confusing if YARN has two separate features doing essentially the same thing from opposite viewpoints. Also, the guaranteed containers from YARN-1011 are going to be difficult to guarantee if this feature can preempt them based on external node load. Arguably the user should configure a guaranteed YARN capacity on these nodes, and then YARN can opportunistically use the node's remaining capacity when it appears available.

If we do go with this approach, it seems like this patch is quite a ways off.
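A rough sketch of the scaling idea being discussed, using the formula implied by the issue description (external load estimated as total node utilization minus the utilization of YARN's own containers). This is not actual YARN code; the class, method, and parameter names are hypothetical, and the "guaranteed floor" parameter is an assumption based on the guaranteed-capacity discussion above:

```java
// Hypothetical sketch, not real YARN code. Estimates external (non-YARN)
// load from utilization reports and shrinks the node's schedulable size,
// never dropping below a configured guaranteed floor.
public class NodeCapacitySketch {

    public static long adjustedCapabilityMb(long originalMb,
                                            long totalUsedMb,
                                            long yarnContainersUsedMb,
                                            long guaranteedFloorMb) {
        // Memory consumed by processes outside YARN's control.
        long externalMb = Math.max(0, totalUsedMb - yarnContainersUsedMb);
        // Scale the node down from its original size as external load grows.
        long adjusted = originalMb - externalMb;
        // Retain at least the guaranteed floor so YARN keeps some capacity.
        return Math.max(adjusted, guaranteedFloorMb);
    }

    public static void main(String[] args) {
        // 64 GB node, 40 GB in use overall, 30 GB of that by YARN containers:
        // 10 GB is external load, so the schedulable size drops to 54 GB.
        System.out.println(adjustedCapabilityMb(65536, 40960, 30720, 8192));
    }
}
```

Flipping the tuning as suggested above would mean treating `guaranteedFloorMb` as the small "guaranteed" node size and letting the adjusted value grow toward `originalMb` as external load drops, i.e. overcommit from a small base rather than undercommit from a large one.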
Besides unit tests, conf switches, and boundary-condition hardening, I think it will be confusing for users and admins to monitor. Simply adjusting the SchedulerNode will semantically accomplish the desired task as far as scheduling containers goes, but the UI and the queue and cluster metrics will not reflect the reality of the scheduler. For example, if most of the nodes have been significantly scaled back due to external load, the scheduler UI will show a largely underutilized cluster when in reality it may be completely full and unable to schedule another container. That's going to be very confusing to users. And there are no metrics showing how much has been scaled back -- the user would have to go to the scheduler nodes page, sum the node capabilities themselves, notice the total is significantly lower than the reported cluster total, and assume it must be this feature causing the anomaly.

I would think that, minimally, the cluster size should change (along with the queue portions of that size) so the reported utilization of the YARN cluster and the scheduler UI are accurate. That still leaves the user to divine why their cluster size is floating around over time when they aren't adding or removing nodes, which is why we may need another metric showing how much has been "stolen" by node load outside of YARN. Maybe we still have an overcommit metric, but it goes negative when original capacity has been removed by external factors? I'm not sure how best to represent it without over-cluttering the UI with a bunch of feature-specific fields. This was addressed in the YARN-5202 patch by adjusting the queue and cluster metrics as we adjust the scheduler node, and there were also metrics and UI fields added to show the amount of overcommit.

Note that in the YARN-5202 patch we added a fast path for adjusting the node's size in the scheduler. The typical remove-old-node-then-add-new-node form of updating is quite expensive, since it computes unnecessary things like node label diffs, etc.
and updates the metrics twice, once for the removal and once for the add. Since this kind of feature is going to be adjusting node sizes all the time, a node adjustment needs to be as cheap as possible while still keeping the UI and metrics up to date.

> Scheduling containers based on external load in the servers
> -----------------------------------------------------------
>
>         Key: YARN-5215
>         URL: https://issues.apache.org/jira/browse/YARN-5215
>     Project: Hadoop YARN
>  Issue Type: Improvement
>    Reporter: Inigo Goiri
> Attachments: YARN-5215.000.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the
> resources. The proposal is to use the utilization information in the node and
> the containers to estimate how much is consumed by external processes and
> schedule based on this estimation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)