[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776502#comment-16776502 ]
Wilfred Spiegelenburg commented on YARN-9278:
---------------------------------------------

[~uranus] I can understand that you want to limit the number of nodes to look at for preemption in large clusters. It could speed things up in certain cases. However, when I look at the way we identify containers, we already break out of the loop when we get to a node that gives back a container list without AMs. In {{identifyContainersToPreemptForOneContainer}} we break out of the loop checking nodes when {{numAMContainers}} is 0, so we do already break out of the loop looking for suitable nodes.

Based on your comment, this change will introduce a trade-off between AMs and nodes. You propose to stop checking nodes even if we still have AMs in the list. In other words, you are willing to accept some AMs in the list even if that has side effects on those applications. I don't think that is a good idea.

I do agree with you that for the ANY resource we probably want to do something else and not just grab the first nodes out of the list all the time. The list that comes back from the node tracker is unsorted and just a copy of what is known, without a filter. We should introduce some logic so that we do not just use a for loop to run over the list from the start. If we use a seeded start point somewhere in the list which moves around, we spread our preemption better. We could base the starting point on the current time (in seconds) and the size of the list returned. I don't think we need that if the list is smaller than a hard-coded number (maybe 50 or 100), but it would really help in large clusters.

> Shuffle nodes when selecting to be preempted nodes
> --------------------------------------------------
>
>                 Key: YARN-9278
>                 URL: https://issues.apache.org/jira/browse/YARN-9278
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>            Reporter: Zhaohui Xin
>            Assignee: Zhaohui Xin
>           Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently.
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<FSSchedulerNode>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
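The seeded-start iteration suggested in the comment above could be sketched roughly as follows. This is only an illustration, not code from the YARN tree: the class and method names ({{SeededScan}}, {{pickStart}}, {{scanFrom}}) and the threshold value are made up for the example; the real start point would be derived from the current time and the size of the node list returned by the node tracker.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: instead of always scanning the node list from index 0,
// start at an offset derived from the current time (seconds) and the list
// size, wrapping around so every node is still visited exactly once.
public class SeededScan {

    // Below this size a plain scan from index 0 is fine
    // (the "maybe 50 or 100" hard-coded number from the comment).
    static final int SEED_THRESHOLD = 100;

    // Pick the starting index: 0 for small lists, otherwise the current
    // time in seconds modulo the list size, so the start point moves around.
    static int pickStart(int listSize, long nowSeconds) {
        if (listSize <= SEED_THRESHOLD) {
            return 0;
        }
        return (int) (nowSeconds % listSize);
    }

    // Visit the whole list starting at 'start', wrapping past the end,
    // and return the node names in visit order.
    static List<String> scanFrom(List<String> nodes, int start) {
        List<String> order = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            order.add(nodes.get((start + i) % nodes.size()));
        }
        return order;
    }
}
```

Spreading the start point this way avoids repeatedly preempting from the same nodes at the head of the unsorted list, without giving up on visiting every node the way a hard cap on the number of nodes would.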