[ https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182591#comment-15182591 ]
Wangda Tan commented on YARN-4719:
----------------------------------

Hi [~kasha],

bq. through addNode and removeNode so total_cluster_resources, total_inflated_cluster_resources (for YARN-1011), max_cluster_resources are not affected by other scheduler code.
I may not understand this; could you elaborate?

To let scheduler code iterate over nodes, we could either:
# Use a concurrent map to avoid locking, so that code will not break. Drawback: we need to handle stale data.
# Expose the lock to external callers, so the scheduler can take the read lock of ClusterNodeTracker and iterate. Drawback: iterating over nodes and allocating containers could lock ClusterNodeTracker for a long time.
# Assume the scheduler's synchronized lock is held when making changes to ClusterNodeTracker (addNode, removeNode, etc.) and also when iterating over nodes. We then don't need an extra lock on the returned node collections. Drawback: this hides the locking behavior from external callers, and in the future the scheduler could remove the synchronized lock to get better performance.

I would suggest checking whether #1 is doable (handle stale data and assume eventual consistency). #1 should have the best performance and be flexible with respect to future scheduler changes.

> Add a helper library to maintain node state and allows common queries
> ---------------------------------------------------------------------
>
>                 Key: YARN-4719
>                 URL: https://issues.apache.org/jira/browse/YARN-4719
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch
>
>
> The scheduler could use a helper library to maintain node state and allow
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each
> scheduler. Having a single library will take us that much closer to reducing
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality
> significantly.
> # An API that returns a sorted list for a custom comparator would help
> YARN-1011 where we want to sort by allocation and utilization for
> continuous/asynchronous and opportunistic scheduling respectively.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
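
For reference, option #1 from the comment (a concurrent map plus explicit handling of stale data) might look roughly like the sketch below. This is only an illustration under assumptions: NodeTrackerSketch, Node, candidates and isStillTracked are hypothetical names, not the ClusterNodeTracker API from the attached patches, and a single availableMb field stands in for real node resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of option #1; not the class from the YARN-4719 patches. */
class NodeTrackerSketch {

  /** Hypothetical stand-in for a scheduler node. */
  static final class Node {
    final String id;
    final int availableMb;

    Node(String id, int availableMb) {
      this.id = id;
      this.availableMb = availableMb;
    }
  }

  // A concurrent map lets addNode/removeNode and iteration proceed without a shared lock.
  private final Map<String, Node> nodes = new ConcurrentHashMap<>();

  void addNode(Node node) {
    nodes.put(node.id, node);
  }

  Node removeNode(String id) {
    return nodes.remove(id);
  }

  int nodeCount() {
    return nodes.size();
  }

  /**
   * Iterates without taking any lock. ConcurrentHashMap iterators are weakly
   * consistent: they never throw ConcurrentModificationException, but they may
   * or may not reflect nodes added or removed while the loop is running.
   */
  List<Node> candidates(int requiredMb) {
    List<Node> result = new ArrayList<>();
    for (Node n : nodes.values()) {
      if (n.availableMb >= requiredMb) {
        result.add(n);
      }
    }
    return result;
  }

  /**
   * Stale-data check: a caller holding a node from an earlier iteration can
   * confirm that the same node object is still tracked before allocating on it.
   */
  boolean isStillTracked(Node candidate) {
    return nodes.get(candidate.id) == candidate;
  }
}
```

A scheduler using something like this would call candidates(...) freely and then re-check isStillTracked(...) before committing an allocation. That re-check is the "handle stale data" cost of option #1. The check itself is not atomic with the allocation, so a real implementation would still have to tolerate a node disappearing immediately afterwards.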