[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848615#comment-13848615 ]
Bikas Saha commented on YARN-311:
---------------------------------

Sorry for coming in so late on this.

Some javadoc on ResourceOption would be helpful. An explanation of the context of dynamic resource changes on a node, and the resulting over-commitment of resources, would be good. The name ResourceOption did not make the nature or use of the object clear to me.

Would it be less error-prone if we compared the total size of the SchedulerNode and the RMNode instead of the difference in their current available capacity?

{code}
+    Resource oldAvailableResource = node.getAvailableResource();
+    Resource newAvailableResource = Resources.subtract(
+        rmNode.getTotalCapability(), node.getUsedResource());
{code}

Also, in the update to the node, why are we updating only availableResource and skipping the total resource? Total resource is used during scheduling decisions.

{code}
+  @Override
+  public synchronized void applyDeltaOnAvailableResource(Resource deltaResource) {
+    // we can only adjust available resource if total resource is changed.
+    Resources.addTo(this.availableResource, deltaResource);
+  }
{code}

The current impl of addTo will work for both +ve and -ve deltas, but given that there are separate addTo and subtractFrom methods, it's not clear to me whether that is a coincidence. Ideally there would be one update method that by definition handles both +ve and -ve updates.

Since changing the resource on a node would be an admin/service operation, why are we adding ResourceOption to the RMNode and setting it in registerNodeManager?

{code}
     RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
-        resolve(host), capability, nodeManagerVersion);
+        resolve(host), ResourceOption.newInstance(capability, RMNode.OVER_COMMIT_TIMEOUT_MILLIS_DEFAULT),
+        nodeManagerVersion);
{code}

Similarly, why are we trying to update the node on every heartbeat? I was expecting that whenever the node resource is updated, an event would be sent to the scheduler. Upon receiving the event, the scheduler would make a one-time update of its internal book-keeping objects.

{code}
+    // Update resource if any change
+    SchedulerUtils.updateResourceIfChanged(node, nm, clusterResource, LOG);
{code}

Again, I apologize for coming in so late on this jira.

> Dynamic node resource configuration: core scheduler changes
> -----------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>             Fix For: 2.4.0
>
>         Attachments: YARN-311-v1.patch, YARN-311-v10.patch,
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch,
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch,
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch,
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch,
> YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only
> contain changes in scheduler.
> The flow to update node's resource and awareness in resource scheduling is:
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource
> change will be aware and the delta resource is added to schedulerNode's
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent
> JIRA: YARN-291.
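
For illustration only, here is a minimal sketch of the alternative raised in the comment above: the scheduler-side node stores its total and used resource, derives the available resource from them, and is updated once when a resource-change event arrives rather than on every heartbeat. The class NodeResourceUpdateSketch, the nested SchedNode and all of its methods are hypothetical stand-ins that track memory only. They are not the YARN SchedulerNode, RMNode or Resources classes, and this is not what the patch does.

```java
// Hypothetical, simplified stand-ins; NOT the real YARN classes.
public class NodeResourceUpdateSketch {

    /** Assumed scheduler-side view of one node, tracking memory (MB) only. */
    static final class SchedNode {
        private long totalMb;
        private long usedMb;

        SchedNode(long totalMb) {
            this.totalMb = totalMb;
        }

        /** Records that a container of the given size was placed on this node. */
        synchronized void allocate(long mb) {
            usedMb += mb;
        }

        synchronized long getTotalMb() {
            return totalMb;
        }

        /** Derived from total and used, so it can go negative when the node is over-committed. */
        synchronized long getAvailableMb() {
            return totalMb - usedMb;
        }

        /**
         * Single update path, meant to be called once per resource-change event.
         * Compares totals instead of available deltas, so growing and shrinking
         * the node go through the same code. Returns true if the total changed.
         */
        synchronized boolean updateTotalIfChanged(long newTotalMb) {
            if (newTotalMb == totalMb) {
                return false;
            }
            totalMb = newTotalMb;
            return true;
        }
    }

    public static void main(String[] args) {
        SchedNode node = new SchedNode(8192);
        node.allocate(6144);

        // An admin shrinks the node below what is already in use.
        boolean changed = node.updateTotalIfChanged(4096);

        System.out.println("changed=" + changed
            + " total=" + node.getTotalMb()
            + " available=" + node.getAvailableMb());
    }
}
```

Because the available amount is derived rather than stored, the total used in scheduling decisions and the available amount cannot drift apart, and a shrink below current usage shows up directly as a negative available value (the over-commit case).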