[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182591#comment-15182591
 ] 

Wangda Tan commented on YARN-4719:
----------------------------------

Hi [~kasha],
bq. through addNode and removeNode so total_cluster_resources, 
total_inflated_cluster_resources (for YARN-1011), max_cluster_resources are not 
affected by other scheduler code.
I'm not sure I understand this; could you elaborate?

To let scheduler code iterate over nodes, we could either:
# Use a concurrent map to avoid locking, so iteration will not break. Drawback: 
we need to handle stale data.
# Expose the lock to external callers, so the scheduler can take the read lock 
of ClusterNodeTracker and iterate. Drawback: iterating over nodes and allocating 
containers could hold the ClusterNodeTracker lock for a long time.
# Assume the scheduler's synchronized lock is held whenever changes are made to 
ClusterNodeTracker (addNode, removeNode, etc.) and also while iterating over 
nodes, so no extra lock is needed on the returned node collections. Drawback: 
this hides the locking requirement from external callers, and in the future the 
scheduler could drop its synchronized lock to get better performance, which 
would break that assumption.

I would suggest checking whether #1 is doable (handle stale data and assume 
eventual consistency). #1 should have the best performance and be the most 
flexible with respect to future scheduler changes.
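
To make #1 concrete, here is a minimal sketch of what it could look like. All 
names (NodeTrackerSketch, Node, isStillTracked) are hypothetical and this is 
not the ClusterNodeTracker from the attached patches. It only illustrates 
lock-free iteration over a concurrent map plus a re-check that lets the caller 
skip stale entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of option #1; not the real ClusterNodeTracker. */
class NodeTrackerSketch {

  /** Hypothetical stand-in for a scheduler node. */
  static final class Node {
    final String id;
    final long capacity;

    Node(String id, long capacity) {
      this.id = id;
      this.capacity = capacity;
    }
  }

  private final ConcurrentMap<String, Node> nodes = new ConcurrentHashMap<>();

  // Updated only from addNode/removeNode. The map and the counter are not
  // updated atomically together, so the total can briefly lag the map
  // (the eventual-consistency assumption).
  private final AtomicLong totalCapacity = new AtomicLong();

  void addNode(String id, long capacity) {
    Node node = new Node(id, capacity);
    if (nodes.putIfAbsent(id, node) == null) {
      totalCapacity.addAndGet(capacity);
    }
  }

  void removeNode(String id) {
    Node removed = nodes.remove(id);
    if (removed != null) {
      totalCapacity.addAndGet(-removed.capacity);
    }
  }

  long getTotalCapacity() {
    return totalCapacity.get();
  }

  /**
   * Copies the current values without taking a lock. Iteration over a
   * ConcurrentHashMap is weakly consistent: it never throws
   * ConcurrentModificationException, but the result may be stale by the
   * time the caller uses it.
   */
  List<Node> snapshot() {
    return new ArrayList<>(nodes.values());
  }

  /** Stale-data check: is this exact node object still registered? */
  boolean isStillTracked(Node node) {
    return nodes.get(node.id) == node;
  }
}
```

A scheduler would iterate over snapshot() and call isStillTracked(node) just 
before allocating on a node, skipping any node that was removed in the 
meantime. The cost is the extra check per allocation; the benefit is that 
neither iteration nor addNode/removeNode ever blocks on the other.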

> Add a helper library to maintain node state and allows common queries
> ---------------------------------------------------------------------
>
>                 Key: YARN-4719
>                 URL: https://issues.apache.org/jira/browse/YARN-4719
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
