[jira] [Updated] (YARN-10590) Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute caculation loss.
[ https://issues.apache.org/jira/browse/YARN-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-10590: - Description: "Because as we discussed in YARN-10504 , the initialization of auto created queues from template was changed (see comment and comment)." 1. As the comment we discussed, we found the effective core is different(the gap), because the update effective will override the absolute auto created leaf queue. 2. But actually, the new logic in YARN-10504 override is right, the difference is caused by test case , don't consider the caculation loss of multi resource type, the cap/absolute are all caculated by one type, (memory) in default resourcecaculator, (dominant type) in dominant resourcecaculator. As we known in the comment, the absolute auto created leaf queue will merge the effective resource by cap/absolute caculated result, this caused the gap. 2. In other case(not absolute case) in the auto created leaf queue, the merge will not cause the gap, in update effective resource override will also use the one type caculated result. 3. So this jira just make the test right, the caculation result is already right. was: Because we have introduced YARN-10504. In auto created leaf queue with absolute mode, the caculation of effective min resource will different from other mode. We should fix the related test in TestCapacitySchedulerAutoCreatedQueueBase. > Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute > caculation loss. > - > > Key: YARN-10590 > URL: https://issues.apache.org/jira/browse/YARN-10590 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10590.001.patch, YARN-10590.002.patch > > > "Because as we discussed in YARN-10504 , the initialization of auto created > queues from template was changed (see comment and comment)." > 1. As the comment we discussed, we found the effective core is different(the > gap), because the update effective will override the absolute auto created > leaf queue. > 2. But actually, the new logic in YARN-10504 override is right, the > difference is caused by test case , don't consider the caculation loss of > multi resource type, the cap/absolute are all caculated by one type, (memory) > in default resourcecaculator, (dominant type) in dominant resourcecaculator. > As we known in the comment, the absolute auto created leaf queue will merge > the effective resource by cap/absolute caculated result, this caused the gap. > 2. In other case(not absolute case) in the auto created leaf queue, the merge > will not cause the gap, in update effective resource override will also use > the one type caculated result. > 3. So this jira just make the test right, the caculation result is already > right. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274196#comment-17274196 ] Zhankun Tang commented on YARN-10352: - Sorry for the late reply. Thanks for the contribution [~prabhujoseph] and [~zhuqi]. And thanks for the review! [~bibinchundatt] The "filteringIterator" costs me quite a while to understand. Especially the confusing "cache" filed member. It's better to call it "nextObject". But that's a minor suggestion. +1 from me. > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is Enabled, Stopping a NM won't unregister to RM. So RM > Active Nodes will be still having those stopped nodes until NM Liveliness > Monitor Expires after configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During this 10mins, > Multi Node Placement assigns the containers on those nodes. They need to > exclude the nodes which has not heartbeated for configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms) similar to > Asynchronous Capacity Scheduler Threads. > (CapacityScheduler#shouldSkipNodeSchedule) > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running say worker0 > 3. Stop worker0 and start any other NM say worker1 > 4. Submit a sleep job. The containers will timeout as assigned to stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10590) Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute caculation loss.
[ https://issues.apache.org/jira/browse/YARN-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274178#comment-17274178 ] zhuqi edited comment on YARN-10590 at 1/29/21, 7:09 AM: Thanks [~bteke] for review. "Because as we discussed in YARN-10504 , the initialization of auto created queues from template was changed (see comment and comment)." 1. As the comment we discussed, we found the effective core is different(the gap), because the update effective will override the absolute auto created leaf queue. 2. But actually, the new logic in YARN-10504 override is right, the difference is caused by test case , don't consider the caculation loss of multi resource type, the cap/absolute are all caculated by one type, (memory) in default resourcecaculator, (dominant type) in dominant resourcecaculator. As we known in the comment, the absolute auto created leaf queue will merge the effective resource by cap/absolute caculated result, this caused the gap. 2. In other case(not absolute case) in the auto created leaf queue, the merge will not cause the gap, in update effective resource override will also use the one type caculated result. 3. So this jira just make the test right, the caculation result is already right. In future we should make multi type caculation of cap/absolute, consistent with the absolute multi resource type. In Jira : YARN-10592 I introduced the vector of cap/absolute, it will make this consistent . Also in the future, we will handle other cases about multi resource type cases, it is a more huge thing to do, may be we need a new Umbrella to forward this. cc [~wangda] if you any thoughts about this and YARN-10592. Thanks. was (Author: zhuqi): Thanks [~bteke] for review. "Because as we discussed in YARN-10504 , the initialization of auto created queues from template was changed (see comment and comment)." 1. As the comment we discussed, we found the effective core is different(the gap), because the update effective will override the absolute auto created leaf queue. 2. But actually, the override is right, the difference is caused by test case , don't consider the caculation loss of multi resource type, the cap/absolute are all caculated by one type, (memory) in default resourcecaculator, (dominant type) in dominant resourcecaculator. As we known in the comment, the absolute auto created leaf queue will merge the effective resource by cap/absolute caculated result, this caused the gap. 2. In other case(not absolute case) in the auto created leaf queue, the merge will not cause the gap, in update effective resource override will also use the one type caculated result. 3. So this jira just make the test right, the caculation result is already right. In future we should make multi type caculation of cap/absolute, consistent with the absolute multi resource type. In Jira : YARN-10592 I introduced the vector of cap/absolute, it will make this consistent . Also in the future, we will handle other cases about multi resource type cases, it is a more huge thing to do, may be we need a new Umbrella to forward this. cc [~wangda] if you any thoughts about this and YARN-10592. Thanks. > Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute > caculation loss. > - > > Key: YARN-10590 > URL: https://issues.apache.org/jira/browse/YARN-10590 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10590.001.patch, YARN-10590.002.patch > > > Because we have introduced YARN-10504. > In auto created leaf queue with absolute mode, the caculation of effective > min resource will different from other mode. > We should fix the related test in TestCapacitySchedulerAutoCreatedQueueBase. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10590) Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute caculation loss.
[ https://issues.apache.org/jira/browse/YARN-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274178#comment-17274178 ] zhuqi commented on YARN-10590: -- Thanks [~bteke] for review. "Because as we discussed in YARN-10504 , the initialization of auto created queues from template was changed (see comment and comment)." 1. As the comment we discussed, we found the effective core is different(the gap), because the update effective will override the absolute auto created leaf queue. 2. But actually, the override is right, the difference is caused by test case , don't consider the caculation loss of multi resource type, the cap/absolute are all caculated by one type, (memory) in default resourcecaculator, (dominant type) in dominant resourcecaculator. As we known in the comment, the absolute auto created leaf queue will merge the effective resource by cap/absolute caculated result, this caused the gap. 2. In other case(not absolute case) in the auto created leaf queue, the merge will not cause the gap, in update effective resource override will also use the one type caculated result. 3. So this jira just make the test right, the caculation result is already right. In future we should make multi type caculation of cap/absolute, consistent with the absolute multi resource type. In Jira : YARN-10592 I introduced the vector of cap/absolute, it will make this consistent . Also in the future, we will handle other cases about multi resource type cases, it is a more huge thing to do, may be we need a new Umbrella to forward this. cc [~wangda] if you any thoughts about this and YARN-10592. Thanks. > Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute > caculation loss. > - > > Key: YARN-10590 > URL: https://issues.apache.org/jira/browse/YARN-10590 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10590.001.patch, YARN-10590.002.patch > > > Because we have introduced YARN-10504. > In auto created leaf queue with absolute mode, the caculation of effective > min resource will different from other mode. > We should fix the related test in TestCapacitySchedulerAutoCreatedQueueBase. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274163#comment-17274163 ] zhuqi commented on YARN-10178: -- [~wangda] [~bteke] I have updated a patch to sort PriorityQueueResourcesForSorting, and add reference to queue. I also add tests to prevent side effect/regression. After the performance test, i found the there seems no performance cost: 1. And i am surprise that the queue size is about 1000, the new sort fast than old. 2. When the queue size is huge big : 1, the old fast than the old, but the gap is always less than 10s. If you any thoughts about this? Thanks a lot. > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch, > YARN-10178.003.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster, ResourceCommitRequest r, > boolean updatePending) { > long commitStart = System.nanoTime(); > ResourceCommitRequest request = > (ResourceCommitRequest) r; > > ... > boolean isSuccess = false; > if (attemptId != null) { > FiCaSchedulerApp app =
[jira] [Updated] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-10178: - Attachment: YARN-10178.003.patch > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch, > YARN-10178.003.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster, ResourceCommitRequest r, > boolean updatePending) { > long commitStart = System.nanoTime(); > ResourceCommitRequest request = > (ResourceCommitRequest) r; > > ... > boolean isSuccess = false; > if (attemptId != null) { > FiCaSchedulerApp app = getApplicationAttempt(attemptId); > // Required sanity check for attemptId - when async-scheduling enabled, > // proposal might be outdated if AM failover just finished > // and proposal queue was not be consumed in time > if (app != null && attemptId.equals(app.getApplicationAttemptId())) { > if (app.accept(cluster, request, updatePending) > && app.apply(cluster, request, updatePending)) { // apply this > resource > ... > } > } > } > return
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274122#comment-17274122 ] zhuqi commented on YARN-10178: -- Thanks a lot for [~wangda] [~bteke] review. It makes sense to me, [~wangda] you are right. Compare will be called multiple times during one sorting call, i am not aware this, this is the key to change my code. I will try to add tests to prevent side effect/regression, after change the logic. > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster, ResourceCommitRequest r, > boolean updatePending) { > long commitStart = System.nanoTime(); > ResourceCommitRequest request = > (ResourceCommitRequest) r; > > ... > boolean isSuccess = false; > if (attemptId != null) { > FiCaSchedulerApp app = getApplicationAttempt(attemptId); > // Required sanity check for attemptId - when async-scheduling enabled, > // proposal might be outdated if AM failover just finished > // and proposal queue was not be consumed in time
[jira] [Commented] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273966#comment-17273966 ] Peter Bacsko commented on YARN-10600: - +1. Thanks for the patch [~bteke] and [~gandras] for the review, committed to trunk. > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10600.001.patch > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users, so it should be added to fs2cs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273944#comment-17273944 ] Wangda Tan commented on YARN-10178: --- [~zhuqi], [~bteke], I'm wondering if it is the right fix, now we initialize PriorityQueueResourcesForSorting during compare(). However, compare will be called multiple times during one sorting call. It is still possible we can get an inconsistent result like before. To me the right fix should be done inside PriorityUtilizationQueueOrderingPolicy#getAssignmentIterator: We should take snapshots of ParentQueue's PriorityQueueResourcesForSorting first, and then sort PriorityQueueResourcesForSorting. (We may need to add a reference to ParentQueue from PriorityQueueResourcesForSorting). And can we add tests to prevent side effect/regression? > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster, ResourceCommitRequest r, > boolean updatePending) { > long commitStart = System.nanoTime(); > ResourceCommitRequest request = >
[jira] [Commented] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273895#comment-17273895 ] Hadoop QA commented on YARN-10600: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 22s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 7s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 2s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 50s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 48s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/560/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant findbugs warnings. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 9s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green}
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273883#comment-17273883 ] Benjamin Teke commented on YARN-10178: -- [~zhuqi] thanks for the patch. The logic looks good to me, +1 (non-binding) from my side. > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster, ResourceCommitRequest r, > boolean updatePending) { > long commitStart = System.nanoTime(); > ResourceCommitRequest request = > (ResourceCommitRequest) r; > > ... > boolean isSuccess = false; > if (attemptId != null) { > FiCaSchedulerApp app = getApplicationAttempt(attemptId); > // Required sanity check for attemptId - when async-scheduling enabled, > // proposal might be outdated if AM failover just finished > // and proposal queue was not be consumed in time > if (app != null && attemptId.equals(app.getApplicationAttemptId())) { > if (app.accept(cluster, request, updatePending) > && app.apply(cluster, request, updatePending)) {
[jira] [Commented] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273863#comment-17273863 ] Andras Gyori commented on YARN-10600: - Thank you [~bteke]! The overall logic seems good to me +1. > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10600.001.patch > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users, so it should be added to fs2cs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10590) Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute caculation loss.
[ https://issues.apache.org/jira/browse/YARN-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273860#comment-17273860 ] Benjamin Teke commented on YARN-10590: -- Thanks [~zhuqi], for working on this. Can you please elaborate on the part where you talk about the validity of the current logic? Because as we discussed in YARN-10504 the initialization of auto created queues from template was changed (see [comment|https://issues.apache.org/jira/browse/YARN-10504?focusedCommentId=17261549=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17261549] and [comment|https://issues.apache.org/jira/browse/YARN-10504?focusedCommentId=17262562=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17262562]). Also once we decide that we follow with the test modifications we probably should modify the jiras description to contain the original logic change introduced in YARN-10504. > Fix TestCapacitySchedulerAutoCreatedQueueBase witch related absolute > caculation loss. > - > > Key: YARN-10590 > URL: https://issues.apache.org/jira/browse/YARN-10590 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10590.001.patch, YARN-10590.002.patch > > > Because we have introduced YARN-10504. > In auto created leaf queue with absolute mode, the caculation of effective > min resource will different from other mode. > We should fix the related test in TestCapacitySchedulerAutoCreatedQueueBase. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10600: - Attachment: YARN-10600.001.patch > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10600.001.patch > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users, so it should be added to fs2cs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10600: - Description: Weight mode conversion was added to fs2cs in YARN-10525. However root queue doesn't get converted to weight mode, which can be misleading to former FS users, so it should be added to fs2cs. (was: Weight mode conversion was added to fs2cs in YARN-10525. However root queue doesn't get converted to weight mode, which can be misleading to former FS users. Currently, we convert FS weights to percentages, however, it will be more useful to keep those values and use them in CS as well.) > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users, so it should be added to fs2cs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10600: - Description: Weight mode conversion was added to fs2cs in YARN-10525. However root queue doesn't get converted to weight mode, which can be misleading to former FS users. Currently, we convert FS weights to percentages, however, it will be more useful to keep those values and use them in CS as well. was: Weight mode will be added to Capacity Scheduler. Currently, we convert FS weights to percentages, however, it will be more useful to keep those values and use them in CS as well. > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users. > Currently, we convert FS weights to percentages, however, it will be more > useful to keep those values and use them in CS as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10600) Convert root queue in fs2cs weight mode conversion
[ https://issues.apache.org/jira/browse/YARN-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke reassigned YARN-10600: Assignee: Benjamin Teke (was: Peter Bacsko) > Convert root queue in fs2cs weight mode conversion > -- > > Key: YARN-10600 > URL: https://issues.apache.org/jira/browse/YARN-10600 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > > Weight mode conversion was added to fs2cs in YARN-10525. However root queue > doesn't get converted to weight mode, which can be misleading to former FS > users. > Currently, we convert FS weights to percentages, however, it will be more > useful to keep those values and use them in CS as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10600) Convert root queue in fs2cs weight mode conversion
Benjamin Teke created YARN-10600: Summary: Convert root queue in fs2cs weight mode conversion Key: YARN-10600 URL: https://issues.apache.org/jira/browse/YARN-10600 Project: Hadoop YARN Issue Type: Sub-task Reporter: Benjamin Teke Assignee: Peter Bacsko Fix For: 3.4.0 Weight mode will be added to Capacity Scheduler. Currently, we convert FS weights to percentages, however, it will be more useful to keep those values and use them in CS as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10521) To support the mixed mode on different levels(optional) or disabled all.
[ https://issues.apache.org/jira/browse/YARN-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori resolved YARN-10521. - Resolution: Won't Fix > To support the mixed mode on different levels(optional) or disabled all. > > > Key: YARN-10521 > URL: https://issues.apache.org/jira/browse/YARN-10521 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > *Mixed* percentage / weight / absolute resource should *not* be allowed *at > the same time* _at all hierarchical levels_. Restricting this to all levels > of the hierarchy is not absolutely necessary, as theoretically it is possible > to support the mixed mode on different levels, but whether it is worth it is > up for debate. > *Mixed* static and auto created under the parent will not be supported if the > queues are defined by percentages. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10521) To support the mixed mode on different levels(optional) or disabled all.
[ https://issues.apache.org/jira/browse/YARN-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273558#comment-17273558 ] Andras Gyori commented on YARN-10521: - I think this design decision has been already made and ready to be closed. Feel free to open it if something still needs to be done. > To support the mixed mode on different levels(optional) or disabled all. > > > Key: YARN-10521 > URL: https://issues.apache.org/jira/browse/YARN-10521 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > *Mixed* percentage / weight / absolute resource should *not* be allowed *at > the same time* _at all hierarchical levels_. Restricting this to all levels > of the hierarchy is not absolutely necessary, as theoretically it is possible > to support the mixed mode on different levels, but whether it is worth it is > up for debate. > *Mixed* static and auto created under the parent will not be supported if the > queues are defined by percentages. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10589) Improve logic of multi-node allocation
[ https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273523#comment-17273523 ] Hadoop QA commented on YARN-10589: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 40s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 56s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/559/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant findbugs warnings. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/559/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 9 new + 28 unchanged - 0 fixed = 37 total (was 28) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} |
[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273507#comment-17273507 ] Hadoop QA commented on YARN-10428: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 21s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 13s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 9s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/558/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant findbugs warnings. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 18s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green}
[jira] [Commented] (YARN-10589) Improve logic of multi-node allocation
[ https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273499#comment-17273499 ] Tanu Ajmera commented on YARN-10589: [~zhuqi] In the code block I attached, it will create a set of all those nodes of one partition and send for allocation. If the partition doesn't match in preCheckNodeCandidateSet, it should stop checking for all other nodes in that set. I'm just creating a break point so that it stops after one node. I have attached patch, please review and give comments > Improve logic of multi-node allocation > -- > > Key: YARN-10589 > URL: https://issues.apache.org/jira/browse/YARN-10589 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 3.3.0 >Reporter: Tanu Ajmera >Assignee: Tanu Ajmera >Priority: Major > Attachments: YARN-10589-001.patch > > > {code:java} > for (String partititon : partitions) { > if (current++ > start) { > break; > } > CandidateNodeSet candidates = > cs.getCandidateNodeSet(partititon); > if (candidates == null) { > continue; > } > cs.allocateContainersToNode(candidates, false); > }{code} > In above logic, if we have thousands of node in one partition, we will still > repeatedly access all nodes of the partition thousands of times. There is no > break point where if the partition is not same for the first node, it should > stop checking other nodes in that partition. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10555) missing access check before getAppAttempts
[ https://issues.apache.org/jira/browse/YARN-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273463#comment-17273463 ] lujie commented on YARN-10555: -- Shoud we merge this pull request if there is no more questions? > missing access check before getAppAttempts > --- > > Key: YARN-10555 > URL: https://issues.apache.org/jira/browse/YARN-10555 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: lujie >Assignee: lujie >Priority: Critical > Labels: pull-request-available, security > Attachments: YARN-10555_1.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > It seems that we miss a security check before getAppAttempts, see > [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1127] > thus we can get the some sensitive information, like logs link. > {code:java} > application_1609318368700_0002 belong to user2 > user1@hadoop11$ curl --negotiate -u : > http://hadoop11:8088/ws/v1/cluster/apps/application_1609318368700_0002/appattempts/|jq > { > "appAttempts": { > "appAttempt": [ > { > "id": 1, > "startTime": 1609318411566, > "containerId": "container_1609318368700_0002_01_01", > "nodeHttpAddress": "hadoop12:8044", > "nodeId": "hadoop12:36831", > "logsLink": > "http://hadoop12:8044/node/containerlogs/container_1609318368700_0002_01_01/user2;, > "blacklistedNodes": "", > "nodesBlacklistedBySystem": "" > } > ] > } > } > {code} > Other apis, like getApps and getApp, has access check like "hasAccess(app, > hsr)", they would hide the logs link if the appid do not belong to query > user, see > [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1098] > We need add hasAccess(app, hsr) for getAppAttempts. > > Besides, at > [https://github.com/apache/hadoop/blob/580a6a75a3e3d3b7918edeffd6e93fc211166884/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java#L145] > it seems that we have a access check in its caller, so now i pass "true" to > AppAttemptInfo in the patch. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9650) Set thread names for CapacityScheduler AsyncScheduleThread
[ https://issues.apache.org/jira/browse/YARN-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273452#comment-17273452 ] Amogh Rajesh Desai commented on YARN-9650: -- Hi [~bibinchundatt]! I think I can help with this ticket > Set thread names for CapacityScheduler AsyncScheduleThread > -- > > Key: YARN-9650 > URL: https://issues.apache.org/jira/browse/YARN-9650 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Assignee: Amogh Rajesh Desai >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9650) Set thread names for CapacityScheduler AsyncScheduleThread
[ https://issues.apache.org/jira/browse/YARN-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amogh Rajesh Desai reassigned YARN-9650: Assignee: Amogh Rajesh Desai > Set thread names for CapacityScheduler AsyncScheduleThread > -- > > Key: YARN-9650 > URL: https://issues.apache.org/jira/browse/YARN-9650 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Assignee: Amogh Rajesh Desai >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270581#comment-17270581 ] zhuqi edited comment on YARN-10178 at 1/28/21, 8:52 AM: cc [~wangda] [~bteke] [~BilwaST] [~tuyu] [~gandras] Since the bug and related YARN-8737 have raised for more than 2 months, i have faced the same issue recently. i have assigned to myself, and attached a patch to fix it. If you can help review it? And if we need to add some performance test cases to confirm the performance loss about the snapshot. Thanks. was (Author: zhuqi): cc [~wangda] [~bteke] [~BilwaST] [~tuyu] Since the bug and related YARN-8737 have raised for more than 2 months, i have faced the same issue recently. i have assigned to myself, and attached a patch to fix it. If you can help review it? And if we need to add some performance test cases to confirm the performance loss about the snapshot. Thanks. > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: zhuqi >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch > > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommitRequest function add CSAssignment to backlogs > ResourceCommitterService will tryCommit this CSAssignment,look tryCommit > function,there will update queue resource usage > {code:java} > public boolean tryCommit(Resource cluster,
[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273402#comment-17273402 ] Andras Gyori commented on YARN-10428: - Uploaded a new patch. I have fixed the test to set the cachedUsed and cachedPending. The magnitude is calculated from the cached value, that is why it was necessary to do so. Also, incorporated an additional check for weight. I agree, that the current demand calculation will never be zero, if the used resource is not zero, but better to be sure for future refactors (however unlikely it is). This was a nasty issue, and not even deterministic, because of how ConcurrentSkipList works. Sometimes no items were removed, an other time only 4 items could haven been removed. I think this issue affects a lot of users without them noticing it, until the queues are fully depleted. > Zombie applications in the YARN queue using FAIR + sizebasedweight > -- > > Key: YARN-10428 > URL: https://issues.apache.org/jira/browse/YARN-10428 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.8.5 >Reporter: Guang Yang >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10428.001.patch, YARN-10428.002.patch, > YARN-10428.003.patch > > > Seeing zombie jobs in the YARN queue that uses FAIR and size based weight > ordering policy . > *Detection:* > The YARN UI shows incorrect number of "Num Schedulable Applications". > *Impact:* > The queue has an upper limit of number of running applications, with zombie > job, it hits the limit even though the number of running applications is far > less than the limit. > *Workaround:* > **Fail-over and restart Resource Manager process. > *Analysis:* > **In the heap dump, we can find the zombie jobs in the `FairOderingPolicy# > schedulableEntities` (see attachment). Take application > "application_1599157165858_29429" for example, it is still in the > `FairOderingPolicy#schedulableEntities` set, however, if we check the log of > resource manager, we can see RM already tried to remove the application: > > ./yarn-yarn-resourcemanager-ip-172-21-153-252.log.2020-09-04-04:2020-09-04 > 04:32:19,730 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue > (ResourceManager Event Processor): Application removed - appId: > application_1599157165858_29429 user: svc_di_data_eng queue: core-data > #user-pending-applications: -3 #user-active-applications: 7 > #queue-pending-applications: 0 #queue-active-applications: 21 > > So it appears RM failed to removed the application from the set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-10428: Attachment: YARN-10428.003.patch > Zombie applications in the YARN queue using FAIR + sizebasedweight > -- > > Key: YARN-10428 > URL: https://issues.apache.org/jira/browse/YARN-10428 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.8.5 >Reporter: Guang Yang >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10428.001.patch, YARN-10428.002.patch, > YARN-10428.003.patch > > > Seeing zombie jobs in the YARN queue that uses FAIR and size based weight > ordering policy . > *Detection:* > The YARN UI shows incorrect number of "Num Schedulable Applications". > *Impact:* > The queue has an upper limit of number of running applications, with zombie > job, it hits the limit even though the number of running applications is far > less than the limit. > *Workaround:* > **Fail-over and restart Resource Manager process. > *Analysis:* > **In the heap dump, we can find the zombie jobs in the `FairOderingPolicy# > schedulableEntities` (see attachment). Take application > "application_1599157165858_29429" for example, it is still in the > `FairOderingPolicy#schedulableEntities` set, however, if we check the log of > resource manager, we can see RM already tried to remove the application: > > ./yarn-yarn-resourcemanager-ip-172-21-153-252.log.2020-09-04-04:2020-09-04 > 04:32:19,730 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue > (ResourceManager Event Processor): Application removed - appId: > application_1599157165858_29429 user: svc_di_data_eng queue: core-data > #user-pending-applications: -3 #user-active-applications: 7 > #queue-pending-applications: 0 #queue-active-applications: 21 > > So it appears RM failed to removed the application from the set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org