[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350732#comment-16350732 ]

Arun Suresh commented on YARN-7839:
-----------------------------------

Thanks for the patch, [~pgaref]. It looks pretty straightforward to me. +1, will commit this shortly.

> Check node capacity before placing in the Algorithm
> ---------------------------------------------------
>
> Key: YARN-7839
> URL: https://issues.apache.org/jira/browse/YARN-7839
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: YARN-7839-YARN-6592.001.patch
>
> Currently, the Algorithm assigns a node to a request purely based on whether
> the constraints are met. It is only later, in the scheduling phase, that the
> Queue capacity and Node capacity are checked. If the request cannot be placed
> because of unavailable Queue/Node capacity, the request is retried by the
> Algorithm.
> For clusters that are running at high utilization, we can reduce the retries
> if we perform the Node capacity check in the Algorithm as well. The Queue
> capacity check and the other user limit checks can still be handled by the
> scheduler (since queues and other limits are tied to the scheduler, and are
> not scheduler agnostic).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350731#comment-16350731 ]

Arun Suresh commented on YARN-7839:
-----------------------------------

[~sunilg], regarding the {{CandidateNodeSet}}, let's move that discussion to when we refactor the {{AppSchedulingInfo}} - since this patch is isolated to the algorithm.

[~kkaranasos] commented:
bq. However, what about the case that a node seems full but a container is about to finish (and will be finished until the allocate is done)? Should we completely reject such nodes, or simply give higher priority to nodes that already have available resources?

We are not rejecting those nodes. If a SchedulingRequest cannot be satisfied by any node in an algorithm round, it will be retried on the next AM heartbeat - and hopefully some of those containers will have completed by then. We can set the retry count to a higher value for clusters that are running at higher utilization.
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350656#comment-16350656 ]

Sunil G commented on YARN-7839:
-------------------------------

bq. despite the naming, as far as I know, the candidateNodeSet is currently always only a single node

[~kkaranasos] and [~asuresh], for multi-node support, {{CandidateNodeSet}} was the ideal interface to extend, so multiple nodes could come in via that iterator.
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350649#comment-16350649 ]

genericqa commented on YARN-7839:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test file. |
|| || || || YARN-6592 Compile Tests ||
| +1 | mvninstall | 15m 15s | YARN-6592 passed |
| +1 | compile | 0m 34s | YARN-6592 passed |
| +1 | checkstyle | 0m 24s | YARN-6592 passed |
| +1 | mvnsite | 0m 37s | YARN-6592 passed |
| +1 | shadedclient | 9m 25s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | YARN-6592 passed |
| +1 | javadoc | 0m 22s | YARN-6592 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 33s | the patch passed |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 37s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 65m 11s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 106m 23s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7839 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908993/YARN-7839-YARN-6592.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 9cb62d03926c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-6592 / 8df7666 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/19579/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19579/testReport/ |
| Max. process+thread count | 866 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemana |
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496 ]

Panagiotis Garefalakis commented on YARN-7839:
----------------------------------------------

Submitting a simple patch that tracks available cluster resources in the DefaultPlacement algorithm, to support a capacity check before placement. The actual check is part of the {{attemptPlacementOnNode}} method, which can be configured with the {{ignoreResourceCheck}} flag. In the current patch the check is enabled during the placement step and disabled during the validation step.

A wrapper class, {{SchedulingRequestWithPlacementAttempt}}, was also introduced to keep track of the failed attempts of rejected SchedulingRequests.

Thoughts? [~asuresh] [~kkaranasos] [~cheersyang]
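[Editor's note] The method name {{attemptPlacementOnNode}} and the {{ignoreResourceCheck}} flag come from the comment above; everything else in this sketch (the {{Resource}}/{{Node}} stand-ins and their fields) is a simplified, hypothetical illustration of the patch's idea, not the actual RM code:

```java
public class PlacementSketch {
    // Hypothetical stand-ins for the RM's Resource/SchedulerNode types;
    // only the shape of the capacity check matters here.
    static final class Resource {
        final long memoryMb; final int vcores;
        Resource(long memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
        boolean fitsIn(Resource available) {
            return memoryMb <= available.memoryMb && vcores <= available.vcores;
        }
    }

    static final class Node {
        final Resource unallocated;  // capacity still free on this node
        Node(Resource unallocated) { this.unallocated = unallocated; }
    }

    // The constraint check always applies; the node capacity check is skipped
    // when ignoreResourceCheck is true (the validation step in the patch).
    static boolean attemptPlacementOnNode(Node node, Resource ask,
            boolean constraintsSatisfied, boolean ignoreResourceCheck) {
        if (!constraintsSatisfied) {
            return false;
        }
        if (!ignoreResourceCheck && !ask.fitsIn(node.unallocated)) {
            return false;  // reject early: the node has no room for this ask
        }
        return true;
    }

    public static void main(String[] args) {
        Node nearlyFull = new Node(new Resource(512, 1));
        Resource ask = new Resource(1024, 2);
        // Placement step (check enabled): the full node is rejected up front.
        System.out.println(attemptPlacementOnNode(nearlyFull, ask, true, false));  // prints false
        // Validation step (check disabled): only the constraints are re-verified.
        System.out.println(attemptPlacementOnNode(nearlyFull, ask, true, true));   // prints true
    }
}
```

The point of the flag is that the same method can serve both phases: the placement pass filters out nodes that are already full, while the validation pass only re-checks constraints.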
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343850#comment-16343850 ]

Konstantinos Karanasos commented on YARN-7839:
----------------------------------------------

I think the change makes sense, [~asuresh]. However, what about the case that a node seems full but a container is about to finish (and will have finished by the time the allocate is done)? Should we completely reject such nodes, or simply give higher priority to nodes that already have available resources?

{quote}getPreferredNodeIterator(CandidateNodeSet candidateNodeSet){quote}
[~cheersyang], despite the naming, as far as I know, the candidateNodeSet is currently always only a single node...
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343154#comment-16343154 ]

Weiwei Yang commented on YARN-7839:
-----------------------------------

Hi [~asuresh],

OK, I think I was talking about the AppPlacementAllocator approach, because I noticed {{AppPlacementAllocator#getPreferredNodeIterator(CandidateNodeSet candidateNodeSet)}} in the API; all I was thinking of was to use placement constraints to filter out such candidate nodes.

For the processor approach, I do agree your proposed approach can help. It won't create too much overhead, as it only checks in-memory data and doesn't hold any lock. I just could not tell how much it helps on a real cluster.
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343133#comment-16343133 ]

Arun Suresh commented on YARN-7839:
-----------------------------------

[~cheersyang],
bq. Instead, I am thinking... can we generate a list of ordered candidate nodes for each allocation (based on some policy), then let scheduler work on such candidate set of nodes to pick up one that fulfills scheduler's requests?

Candidate nodes are not a bad idea (I remember we discussed it briefly earlier). The reasons I did not want to attempt it initially were the following:
# In the current scheme, we do a re-place and then re-schedule ONLY if the initial placement was unsuccessful. This means the algorithm can bail as soon as it finds the first viable node - which keeps both the algorithm and the data structures it returns simpler. If the algorithm were to output a list of candidate nodes, there is a very good chance it would have to loop through more nodes (and possibly the entire node set) per request. Also, for cases of node affinity, candidate selection would be complicated. For example, if *foo* has to be placed only on nodes that have *bar*, and we have 5 candidates for *bar*, then the 5 candidates for *foo* would also have to depend on the *bar* candidates.
# The Scheduler's attemptOnNode takes multiple locks - a lock on the app, the node, and the queue. Passing in multiple nodes means the lock would be held for a longer duration, which can cause problems - we have experienced many such issues, especially when applications complete before their containers complete, which results in releaseContainer taking a lock on multiple nodes.

In any case, even if we were to try candidate nodes, I still believe letting the algorithm query the nodes' capacity at the time of placing would not be bad. I prefer we do not hold a lock - since one of the motivations for separating the placement phase from the scheduling phase is that placement can operate on a loosely consistent view of the cluster.
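[Editor's note] The re-place/re-schedule loop described above can be sketched as follows; the {{Placer}}/{{Scheduler}} interfaces and {{placeWithRetries}} are hypothetical stand-ins for the placement processor and the scheduler's commit path, added only to illustrate the bail-early behavior:

```java
public class RetrySketch {
    enum Outcome { ACCEPTED, REJECTED }

    // Hypothetical interfaces: the placement phase works on a loosely
    // consistent view of the cluster; the scheduling phase is authoritative.
    interface Placer { String place(); }
    interface Scheduler { Outcome commit(String node); }

    // Bail-early loop: placement returns the first viable node; only if the
    // scheduler then rejects it (capacity changed underneath) do we re-place.
    static boolean placeWithRetries(Placer placer, Scheduler scheduler, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String node = placer.place();
            if (node == null) {
                continue;  // no viable node this round; try again
            }
            if (scheduler.commit(node) == Outcome.ACCEPTED) {
                return true;
            }
        }
        return false;  // give up for now; a later round (e.g. AM heartbeat) retries
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // First placement lands on a node that fills up before commit;
        // the second placement round succeeds.
        Placer placer = () -> "n" + (++calls[0]);
        Scheduler scheduler =
            node -> node.equals("n1") ? Outcome.REJECTED : Outcome.ACCEPTED;
        System.out.println(placeWithRetries(placer, scheduler, 3));  // prints true
    }
}
```

Checking node capacity inside {{place()}} itself, as this JIRA proposes, simply makes the first iteration of this loop more likely to succeed on a busy cluster.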
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343069#comment-16343069 ]

Weiwei Yang commented on YARN-7839:
-----------------------------------

Hi [~asuresh],

When there are multiple threads running the placement algorithm, a node's capacity check could pass in both places and the request could still eventually get rejected when some other container is allocated to that node, since you don't hold the SchedulerNode's lock to update resources when you attempt to place the request. So I am not sure how much this can improve things in a highly utilized cluster.

Instead, I am thinking... can we generate *a list of ordered candidate nodes* for each allocation (based on some policy), then let the scheduler work on that candidate set of nodes to pick one that fulfills the scheduler's requirements?

Thanks
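[Editor's note] The ordered-candidate-nodes alternative floated above could look roughly like this; the {{Node}} type, the most-free-memory-first policy, and both method names are hypothetical, chosen only to make the two-phase idea concrete:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CandidateNodes {
    // Hypothetical stand-in for a scheduler node: name plus free memory (MB).
    static final class Node {
        final String name; final long freeMemMb;
        Node(String name, long freeMemMb) { this.name = name; this.freeMemMb = freeMemMb; }
    }

    // Placement side: order the constraint-satisfying nodes by a policy
    // (here: most free memory first) instead of returning a single node.
    static List<Node> orderedCandidates(List<Node> constraintSatisfying) {
        List<Node> out = new ArrayList<>(constraintSatisfying);
        out.sort(Comparator.comparingLong((Node n) -> n.freeMemMb).reversed());
        return out;
    }

    // Scheduler side: walk the ordered candidates and commit to the first fit;
    // null means the request is retried later (e.g. on the next AM heartbeat).
    static Node pickFirstFit(List<Node> candidates, long askMb) {
        for (Node n : candidates) {
            if (n.freeMemMb >= askMb) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("n1", 2048), new Node("n2", 8192), new Node("n3", 512));
        Node chosen = pickFirstFit(orderedCandidates(nodes), 4096);
        System.out.println(chosen.name);  // prints n2
    }
}
```

The trade-off raised later in the thread applies here: producing the full ordered list means the placement algorithm cannot bail at the first viable node, and the scheduler may hold locks longer while walking the candidates.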
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342973#comment-16342973 ]

Arun Suresh commented on YARN-7839:
-----------------------------------

This is a fairly trivial change - we just need to add an additional check in the {{DefaultPlacementAlgorithm#attemptPlacementOnNode()}} method. Thoughts, [~kkaranasos] / [~cheersyang]?