[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409750#comment-15409750 ]

Wangda Tan commented on YARN-4902:
----------------------------------

Thanks for sharing this, [~kkaranasos]/[~pg1...@imperial.ac.uk],

I took a quick look at the design doc and POC patch.

*For the design doc:*
1) From the requirements perspective, I didn't see anything new; please remind
me if I missed something:
- The cardinality constraint is a placement_set with a maximum_concurrency
constraint: see {{(4.3.3) Placement Strategy}} in my design doc (a sketch
follows after this list).
- Dynamic tags for node properties (hardware configs, etc.) are node
constraints.
- Are dynamic tags for applications the same as allocation tags? Whether or not
to use the node label manager is an implementation decision, in my view.
2) I'm not sure what the LRA planner will look like. Should it be a separate
scheduler running in parallel? I didn't see your patch use that approach.
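To make the cardinality point concrete, here is a minimal, hypothetical sketch
of a "placement_set with maximum_concurrency" check. The
{{CardinalityConstraint}} type and everything in it are illustrative only, not
from either design doc or POC patch:

{code:java}
// Hypothetical sketch only: these types do not exist in YARN. They
// illustrate the "placement_set with maximum_concurrency" idea above.
public final class CardinalityConstraint {
    public enum Scope { NODE, RACK }

    private final Scope scope;          // placement scope the count applies to
    private final String allocationTag; // tag whose occurrences are counted
    private final int maxConcurrency;   // upper bound within the scope

    public CardinalityConstraint(Scope scope, String allocationTag,
                                 int maxConcurrency) {
        this.scope = scope;
        this.allocationTag = allocationTag;
        this.maxConcurrency = maxConcurrency;
    }

    /** True if placing one more container with this tag stays within bounds. */
    public boolean admits(int currentCountInScope) {
        return currentCountInScope + 1 <= maxConcurrency;
    }

    public static void main(String[] args) {
        // "Only put 10 HBase masters in the rack":
        CardinalityConstraint c =
            new CardinalityConstraint(Scope.RACK, "hbase-master", 10);
        System.out.println(c.admits(9));  // true  -- the 10th master fits
        System.out.println(c.admits(10)); // false -- the rack is full
    }
}
{code}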

*For the patch:*
3) It might be better to implement complex scheduling logic like
affinity-between-apps and cardinality using a global scheduling approach
(YARN-5139).
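For intuition on the difference: instead of deciding placement when a single
node heartbeats in, a global allocator can score every candidate node for a
request at once, so cross-node constraints (affinity between apps, cardinality)
are checkable against full cluster state. A toy sketch, with names that are
illustrative rather than taken from the YARN-5139 patch:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Illustrative contrast with heartbeat-driven scheduling: consider all
// candidate nodes at once, drop the ones violating constraints, and
// pick the best-scoring survivor. Not from the YARN-5139 patch.
public final class GlobalPlacement {

    public static <N> Optional<N> place(List<N> nodes,
                                        Predicate<N> constraintsHold,
                                        ToDoubleFunction<N> score) {
        return nodes.stream()
                .filter(constraintsHold)                 // enforce constraints globally
                .max(Comparator.comparingDouble(score)); // best remaining node
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("n1", "n2", "n3");
        // Toy run: the constraint excludes n2; the score prefers higher-numbered nodes.
        Optional<String> best = place(nodes,
                n -> !n.equals("n2"),
                n -> n.charAt(1) - '0');
        System.out.println(best.orElse("none")); // n3
    }
}
{code}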

4) Will this patch support anti-affinity / affinity between apps? I uploaded my
latest POC patch to YARN-1042; it supports affinity/anti-affinity both within
and across apps. We can easily extend it to support affinity/anti-affinity
between resource requests within an app.
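For comparison, here is a minimal sketch of how an inter-app (anti-)affinity
check against a node's allocation tags could look. The {{AffinityChecker}}
class and its tag-count map are hypothetical, not code from the YARN-1042
patch:

{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: evaluate (anti-)affinity against the allocation
// tags already present on a node. Not from the YARN-1042 POC patch.
public final class AffinityChecker {

    /** nodeTags maps an allocation tag to the number of containers on the
     *  node carrying it. */
    public static boolean satisfies(Map<String, Integer> nodeTags,
                                    Set<String> targetTags,
                                    boolean antiAffinity) {
        boolean present = targetTags.stream()
                .anyMatch(t -> nodeTags.getOrDefault(t, 0) > 0);
        // Affinity requires a target tag on the node; anti-affinity forbids it.
        return antiAffinity ? !present : present;
    }

    public static void main(String[] args) {
        Map<String, Integer> node = Map.of("hbase-regionserver", 2);
        // Anti-affinity with regionservers fails on this node:
        System.out.println(satisfies(node, Set.of("hbase-regionserver"), true));  // false
        // Affinity with regionservers succeeds:
        System.out.println(satisfies(node, Set.of("hbase-regionserver"), false)); // true
    }
}
{code}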

5) The major logic of this patch depends on dynamic tag changes in the node
label manager. First of all, I'm not sure the NLM works efficiently when node
labels change rapidly (we could be updating a node's labels on every container
allocation / release). And I'm not sure how you plan to prevent malicious
applications from adding labels. For example, if a distributed shell
application claims it is an "hbase master" just for fun, how do we enforce
cardinality logic like "only put 10 HBase masters in the rack"?
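One hypothetical mitigation, just to make the concern concrete: an RM-side ACL
check that rejects privileged tags from unauthorized users. The
{{TagAuthorizer}} class and its static policy map are illustrative only;
nothing like this exists in either patch:

{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of RM-side tag authorization: an application may
// only claim a privileged allocation tag if its submitting user is
// entitled to it. The policy source (a static map here) is illustrative.
public final class TagAuthorizer {
    // privileged tag -> users allowed to claim it
    private final Map<String, Set<String>> tagAcls;

    public TagAuthorizer(Map<String, Set<String>> tagAcls) {
        this.tagAcls = tagAcls;
    }

    /** Tags without an ACL are open to everyone; privileged tags need a hit. */
    public boolean mayClaim(String user, String tag) {
        Set<String> allowed = tagAcls.get(tag);
        return allowed == null || allowed.contains(user);
    }

    public static void main(String[] args) {
        TagAuthorizer auth =
            new TagAuthorizer(Map.of("hbase-master", Set.of("hbase")));
        System.out.println(auth.mayClaim("hbase", "hbase-master")); // true
        System.out.println(auth.mayClaim("guest", "hbase-master")); // false -- the "just for fun" case
        System.out.println(auth.mayClaim("guest", "scratch"));      // true  -- unprivileged tag
    }
}
{code}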

*Suggestions*
- Could you take a look at the global scheduling patch I attached to YARN-5139
to see whether the new features in your patch could be built on top of the
global scheduling framework? Please also share your overall feedback on the
global scheduling framework, e.g. its efficiency, extensibility, etc.
- It would be better to design a Java API for this ticket; neither of our POC
patches (this one and the one I attached to YARN-1042) has a solid API
definition. It is very important to define the API first. Could you help with
the API definition work?
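To seed that discussion, here is a rough, non-authoritative sketch of what a
first-cut Java API surface might look like. Every name below is a placeholder
for discussion, not a proposal from either POC patch:

{code:java}
import java.util.List;
import java.util.Map;

// Placeholder API sketch for discussion only; none of these interfaces
// exist in either POC patch.
public interface SchedulingRequest {

    /** Tags this request attaches to the containers it allocates. */
    List<String> allocationTags();

    /** Constraints the placement must satisfy (affinity, cardinality, ...). */
    List<PlacementConstraint> constraints();

    interface PlacementConstraint {

        /** Scope the constraint is evaluated in, e.g. "node" or "rack". */
        String scope();

        /** Given the per-tag container counts within the scope, decide
         *  whether placing one more container here is acceptable. */
        boolean isSatisfied(Map<String, Integer> tagCountsInScope);
    }
}
{code}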

> [Umbrella] Generalized and unified scheduling-strategies in YARN
> ----------------------------------------------------------------
>
>                 Key: YARN-4902
>                 URL: https://issues.apache.org/jira/browse/YARN-4902
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Wangda Tan
>         Attachments: Generalized and unified scheduling-strategies in YARN 
> -v0.pdf, LRA-scheduling-design.v0.pdf, YARN-5468.prototype.patch
>
>
> Apache Hadoop YARN's ResourceRequest mechanism is the core part of YARN's
> scheduling API for applications to use. The ResourceRequest mechanism is a
> powerful API for applications (specifically ApplicationMasters) to indicate
> to YARN what size of containers are needed, where in the cluster, etc.
> However, a host of new feature requirements are making the API increasingly
> complex and difficult for users to understand, and very complicated to
> implement within the code-base.
> This JIRA aims to generalize and unify all such scheduling-strategies in YARN.


