[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148261#comment-15148261
 ] 

Arun Suresh commented on YARN-4412:
-----------------------------------

Many thanks for the detailed review [~curino].

# I totally agree with your point about explicitly authorizing AMs to send 
and receive cluster information via the extended protocol: YARN-4631 has been 
raised to track this.
# With regard to generalizing {{QueuedContainersStatus}} into a 
{{ClusterStatus}}: please note that this is actually metadata sent from the NM 
to the RM, so *ClusterStatus* might not apply here. But I agree, we can 
probably add more cluster information to the 
{{DistributedSchedulingProtocol}}, which we introduced in YARN-2885. Also, the 
node heartbeat already contains both Container and aggregate Node resource 
utilization information. {{QueuedContainersStatus}} is just another 
utilization metric, used by the {{ClusterMonitor}} running on the RM and by 
the DistributedScheduling framework to gauge the relative load on a Node based 
on the state of its queue (maintained by the {{ContainersMonitor}}, which 
queues OPPORTUNISTIC container requests).
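
To illustrate the idea of gauging relative load from queue state, here is a 
minimal sketch. It is not the code in the patch: the class name 
{{QueueLengthRanker}}, its methods, and the use of plain String node ids are 
all hypothetical, and it assumes the only signal is the number of queued 
container requests each node last reported.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: ranks nodes by the queue length they last reported. */
class QueueLengthRanker {
  // nodeId -> queued OPPORTUNISTIC container requests from the latest heartbeat
  private final Map<String, Integer> queueLengths = new ConcurrentHashMap<>();

  /** Record the queue length carried by a node heartbeat. */
  void onHeartbeat(String nodeId, int queuedContainers) {
    queueLengths.put(nodeId, queuedContainers);
  }

  /** Forget a node that has left the cluster. */
  void removeNode(String nodeId) {
    queueLengths.remove(nodeId);
  }

  /** Return up to k node ids, shortest queue first; ties are broken by node id. */
  List<String> topK(int k) {
    // Sort a snapshot so concurrent heartbeats cannot change values mid-sort.
    Map<String, Integer> snapshot = new HashMap<>(queueLengths);
    List<String> nodes = new ArrayList<>(snapshot.keySet());
    nodes.sort(Comparator.comparingInt((String n) -> snapshot.get(n))
        .thenComparing(Comparator.<String>naturalOrder()));
    int limit = Math.max(0, Math.min(k, nodes.size()));
    return new ArrayList<>(nodes.subList(0, limit));
  }
}
```

Sorting on every call keeps the sketch short; a real implementation would 
likely recompute the ordering periodically rather than per request.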

bq.  ..documentation on the various classes would help. e.g., you introduce a 
DistributedSchedulingService, ..
Agreed. I have added class-level docs to some of the new classes introduced 
here.

bq. ... if you are factoring out all the "guts" of SchedulerEventDispatcher, 
can't we simply move the class out? ..
Agreed.

bq. Can you clarify what happens in DistributedSchedulingService.getServer() 
?...
Fixed the comment to explain this.

bq. ..assumes resources will have only cpu/mem...Is there any better way to 
load this info from configuration? It would be nice to have a 
config.getResource("blah"), which takes care of this...
Good point. Unfortunately, the Configuration object does not currently support 
{{getResource()}}. Once the generalized resource model lands, I will circle 
back to this.
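
For reference, a {{getResource()}}-style helper could look roughly like the 
sketch below. Everything in it is an assumption for illustration: 
{{java.util.Properties}} stands in for the Hadoop Configuration object, and the 
{{ResourceConfig}} and {{SimpleResource}} names and the {{.memory-mb}} / 
{{.vcores}} key suffixes are hypothetical. It still covers only cpu/mem, which 
is exactly the limitation a generalized resource model would remove.

```java
import java.util.Properties;

/** Hypothetical helper; Properties stands in for Hadoop's Configuration. */
final class ResourceConfig {

  /** Simple cpu/mem pair, the only two dimensions this sketch handles. */
  static final class SimpleResource {
    final long memoryMb;
    final int vcores;

    SimpleResource(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  private ResourceConfig() {
  }

  /**
   * Reads "<prefix>.memory-mb" and "<prefix>.vcores" (assumed key names),
   * falling back to the given default for any key that is missing.
   */
  static SimpleResource getResource(Properties conf, String prefix,
      SimpleResource dflt) {
    long mem = Long.parseLong(conf.getProperty(
        prefix + ".memory-mb", Long.toString(dflt.memoryMb)).trim());
    int vcores = Integer.parseInt(conf.getProperty(
        prefix + ".vcores", Integer.toString(dflt.vcores)).trim());
    return new SimpleResource(mem, vcores);
  }
}
```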

bq. I see tests for TopKNodeSelector, but for nothing else. Is this enough?
Definitely not, but we have to wait for the actual changes in the 
{{ContainerManager}} and {{ContainersMonitor}} classes, handled in YARN-2883, 
to test this end-to-end. In the meantime, I will add tests to verify that the 
extra fields in the protobuf are handled correctly.


> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNISTIC containers
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4412
>                 URL: https://issues.apache.org/jira/browse/YARN-4412
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node managers to be 
> used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
