[ https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176278#comment-15176278 ]
Wangda Tan commented on YARN-4719:
----------------------------------

[~kasha],

bq. Not sure I understand the suggestion. Elaborate?

In the ver.2 patch, getAllNodes uses shallowCopy. What I meant is that instead of copying the entire HashMap, you can use a ConcurrentMap. In the ver.3 patch, you removed shallowCopy and return HashMap.values(); if a node is removed while someone is iterating over values(), the behavior is undefined. See: [javadoc|https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html#values()]

bq. I feel any logic that has to iterate through all nodes should go through ClusterNodeTracker - that way, we don't run into cases where we access the list of nodes without a lock.

As I commented above, we can use a ConcurrentMap instead of locking ClusterNodeTracker. Do you need strong consistency for addBlacklistedNodeIdsToList? (The node list could be updated while we are updating blacklistedNodes.)

bq. Any particular reason you think this doesn't belong here?

I would prefer to keep the responsibility of ClusterNodeTracker clean. If we add application logic here, we could end up adding any logic related to SchedulerNode to this class as well. To me, this refactoring patch is mainly about code cleanup, so I think it's better to keep it clean from the beginning.

> Add a helper library to maintain node state and allows common queries
> ---------------------------------------------------------------------
>
>                 Key: YARN-4719
>                 URL: https://issues.apache.org/jira/browse/YARN-4719
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch
>
>
> The scheduler could use a helper library to maintain node state and allow
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each
> scheduler. Having a single library will take us that much closer to reducing
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality
> significantly.
> # An API that returns a sorted list for a custom comparator would help
> YARN-1011 where we want to sort by allocation and utilization for
> continuous/asynchronous and opportunistic scheduling respectively.
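
The HashMap.values() concern raised in the comment can be illustrated with a minimal, single-threaded sketch. It is not taken from any of the attached patches: the class name, method names, and the string node IDs are hypothetical stand-ins for the real YARN types. It removes an entry while iterating values(), once on a HashMap and once on a ConcurrentHashMap. The HashMap iterator is fail-fast on a best-effort basis and typically throws ConcurrentModificationException here, while the ConcurrentHashMap iterator is weakly consistent and completes without throwing.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: plain strings stand in for NodeId / SchedulerNode.
class NodeMapIterationSketch {

    // Populates the map with three illustrative nodes.
    static void fill(Map<String, String> nodes) {
        nodes.put("node1", "host1");
        nodes.put("node2", "host2");
        nodes.put("node3", "host3");
    }

    // Iterates over values() and removes one entry during the first step.
    // Returns true if the iteration finished, false if it failed with
    // ConcurrentModificationException.
    static boolean survivesRemovalDuringIteration(Map<String, String> nodes) {
        fill(nodes);
        boolean removed = false;
        try {
            for (String host : nodes.values()) {
                if (!removed) {
                    // Simulates a node being removed while a caller iterates.
                    nodes.remove("node2");
                    removed = true;
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap iteration completed: "
            + survivesRemovalDuringIteration(new HashMap<>()));
        System.out.println("ConcurrentHashMap iteration completed: "
            + survivesRemovalDuringIteration(new ConcurrentHashMap<>()));
    }
}
```

Note that a weakly consistent iterator may or may not reflect the removal, so this only addresses safety of iteration. It does not provide the strong consistency asked about for addBlacklistedNodeIdsToList.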