[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848615#comment-13848615
 ] 

Bikas Saha commented on YARN-311:
---------------------------------

Sorry for coming in so late on this.

Some javadoc on ResourceOption would be helpful. Explanation of the context of 
dynamic resource changes on a node and resulting over-commitment of resources 
would be good.

ResourceOption as a name did not make it clear to me the nature/use of the 
object.

Would it be less error-prone to compare the total capability of the 
SchedulerNode and the RMNode, rather than the difference in their currently 
available capacity?
{code}
+    Resource oldAvailableResource = node.getAvailableResource();
+    Resource newAvailableResource = Resources.subtract(
+        rmNode.getTotalCapability(), node.getUsedResource());
{code}
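
A minimal sketch of the total-based comparison, for illustration only. Res and 
SchedNodeSketch are hypothetical stand-ins, not the real Resource or 
SchedulerNode classes, and syncTotal is an assumed name. The idea is that 
available is always derived from total and used, so only the totals need to be 
compared.

```java
// Hypothetical stand-ins for illustration; not the real YARN classes.
final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    Res minus(Res other) {
        return new Res(memoryMb - other.memoryMb, vcores - other.vcores);
    }

    boolean sameAs(Res other) {
        return memoryMb == other.memoryMb && vcores == other.vcores;
    }
}

final class SchedNodeSketch {
    private Res total;
    private final Res used;

    SchedNodeSketch(Res total, Res used) {
        this.total = total;
        this.used = used;
    }

    // Available is derived on demand, so it cannot drift from total and used.
    // It may be negative while the node is over-committed.
    synchronized Res getAvailable() {
        return total.minus(used);
    }

    // Compares totals only. Returns true if the scheduler-side total changed.
    synchronized boolean syncTotal(Res rmTotal) {
        if (total.sameAs(rmTotal)) {
            return false;
        }
        total = rmTotal;
        return true;
    }
}
```

With 6144 MB in use, shrinking the total from 8192 MB to 4096 MB leaves the 
derived available value at -2048 MB, which is the over-commitment case.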

Also, in the update to the node, why are we updating only availableResource 
and skipping the total resource? The total resource is used during scheduling 
decisions.
{code}
+  @Override
+  public synchronized void applyDeltaOnAvailableResource(Resource deltaResource) {
+    // we can only adjust available resource if total resource is changed.
+    Resources.addTo(this.availableResource, deltaResource);
+  }
{code}
The current impl of addTo will work for both positive and negative deltas, but 
given that there are separate addTo and subtractFrom methods, it's not clear 
to me whether that is a coincidence. Ideally there should be one update method 
that by definition handles both positive and negative updates.
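
A rough sketch of such a single update method, assuming a simplified 
memory-only node. NodeCapacitySketch and applyDeltaOnTotal are hypothetical 
names, not part of the patch. The signed delta is applied to the total and to 
the available amount together, so the used amount is unaffected.

```java
// Hypothetical simplified node; memory only, to keep the sketch short.
final class NodeCapacitySketch {
    private long totalMb;
    private long availableMb;

    NodeCapacitySketch(long totalMb, long usedMb) {
        this.totalMb = totalMb;
        this.availableMb = totalMb - usedMb;
    }

    // One entry point for both growth (delta > 0) and shrinkage (delta < 0).
    // Total and available move together, so total - available (the used
    // amount) is unchanged by the call.
    synchronized void applyDeltaOnTotal(long deltaMb) {
        if (totalMb + deltaMb < 0) {
            throw new IllegalArgumentException("total capacity would become negative");
        }
        totalMb += deltaMb;
        availableMb += deltaMb;
    }

    synchronized long getTotalMb() {
        return totalMb;
    }

    // May be negative while the node is over-committed.
    synchronized long getAvailableMb() {
        return availableMb;
    }
}
```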

Since changing the resource on a node would be an admin/service operation, why 
are we adding ResourceOption to the RMNode and setting it in 
registerNodeManager?
{code}
     RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
-        resolve(host), capability, nodeManagerVersion);
+        resolve(host), ResourceOption.newInstance(capability, RMNode.OVER_COMMIT_TIMEOUT_MILLIS_DEFAULT),
+        nodeManagerVersion);
{code}

Similarly, why are we trying to update the node on every heartbeat? I was 
expecting that whenever the node resource is updated, an event would be sent 
to the scheduler. Upon receiving the event, the scheduler would make a 
one-time update of its internal book-keeping objects.
{code}
+    // Update resource if any change
+    SchedulerUtils.updateResourceIfChanged(node, nm, clusterResource, LOG);
{code}
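
A minimal sketch of the event-driven alternative, for illustration only. 
NodeResourceUpdateEvent and SchedulerSketch are hypothetical types, not 
existing YARN classes. The scheduler changes its book-keeping once per update 
event, and heartbeat handling does not touch node capacity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event carrying the new total for one node.
final class NodeResourceUpdateEvent {
    final String nodeId;
    final long newTotalMb;

    NodeResourceUpdateEvent(String nodeId, long newTotalMb) {
        this.nodeId = nodeId;
        this.newTotalMb = newTotalMb;
    }
}

// Hypothetical scheduler with memory-only book-keeping.
final class SchedulerSketch {
    private final Map<String, Long> totalMbByNode = new HashMap<>();
    private int resourceUpdates = 0;

    void addNode(String nodeId, long totalMb) {
        totalMbByNode.put(nodeId, totalMb);
    }

    // Called once per admin-initiated change, not once per heartbeat.
    void handle(NodeResourceUpdateEvent event) {
        if (!totalMbByNode.containsKey(event.nodeId)) {
            return; // unknown node: nothing to update
        }
        totalMbByNode.put(event.nodeId, event.newTotalMb);
        resourceUpdates++;
    }

    // Heartbeat processing leaves the capacity book-keeping untouched.
    void nodeHeartbeat(String nodeId) {
        // container status processing would go here
    }

    // Returns -1 for a node the scheduler does not know about.
    long getTotalMb(String nodeId) {
        return totalMbByNode.getOrDefault(nodeId, -1L);
    }

    int getResourceUpdates() {
        return resourceUpdates;
    }
}
```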

Again, I apologize for coming in so late on this jira.

> Dynamic node resource configuration: core scheduler changes
> -----------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>             Fix For: 2.4.0
>
>         Attachments: YARN-311-v1.patch, YARN-311-v10.patch, 
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, 
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, 
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we make the resource change on the RM side and expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains 
> only the scheduler changes. 
> The flow for updating a node's resource and making scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect 
> on RMNodeImpl.
> 2. When the next NM status heartbeat arrives, the RMNode's resource change is 
> detected and the delta is added to the SchedulerNode's availableResource 
> before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource 
> in SchedulerNode.
> For more design details, please refer to the proposal and discussion in the 
> parent JIRA: YARN-291.



