[jira] [Updated] (YARN-10623) Capacity scheduler should support refresh queue automatically by a thread policy.
[ https://issues.apache.org/jira/browse/YARN-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10623:
--------------------------
    Description:
In fair scheduler, it is supported that refresh queue related conf automatically by a thread to reload, but in capacity scheduler we only support to refresh queue related changes by refreshQueues, it is needed for our cluster to realize queue manage.

cc [~wangda] [~ztang] [~pbacsko] [~snemeth] [~gandras] [~bteke] [~shuzirra]

  was:
In fair scheduler, it is supported that refresh queue related conf automatically by a thread to reload, but in capacity scheduler we only support to refresh queue related changes by refreshQueues, it is needed for our cluster to realize queue manage.

cc [~wangda] [~pbacsko] [~snemeth] [~gandras] [~bteke] [~shuzirra]

> Capacity scheduler should support refresh queue automatically by a thread policy.
> ---------------------------------------------------------------------------------
>
>     Key: YARN-10623
>     URL: https://issues.apache.org/jira/browse/YARN-10623
>     Project: Hadoop YARN
>     Issue Type: Improvement
>     Components: capacity scheduler
>     Reporter: Qi Zhu
>     Assignee: Qi Zhu
>     Priority: Major
>
> In fair scheduler, it is supported that refresh queue related conf
> automatically by a thread to reload, but in capacity scheduler we only
> support to refresh queue related changes by refreshQueues, it is needed for
> our cluster to realize queue manage.
> cc [~wangda] [~ztang] [~pbacsko] [~snemeth] [~gandras] [~bteke] [~shuzirra]

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10623) Capacity scheduler should support refresh queue automatically by a thread policy.
[ https://issues.apache.org/jira/browse/YARN-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10623:
--------------------------
    Summary: Capacity scheduler should support refresh queue automatically by a thread policy.  (was: Capacity scheduler should support refresh queue automatically.)
[jira] [Created] (YARN-10623) Capacity scheduler should support refresh queue automatically.
Qi Zhu created YARN-10623:
--------------------------

    Summary: Capacity scheduler should support refresh queue automatically.
    Key: YARN-10623
    URL: https://issues.apache.org/jira/browse/YARN-10623
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: capacity scheduler
    Reporter: Qi Zhu
    Assignee: Qi Zhu

In fair scheduler, it is supported that refresh queue related conf automatically by a thread to reload, but in capacity scheduler we only support to refresh queue related changes by refreshQueues, it is needed for our cluster to realize queue manage.

cc [~wangda] [~pbacsko] [~snemeth] [~gandras] [~bteke] [~shuzirra]
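The thread-based refresh requested in YARN-10623 could look roughly like the sketch below. It is only an illustration under stated assumptions: `QueueConfigReloader`, `checkOnce`, and the `refreshAction` callback are hypothetical names, not YARN classes, and change detection via the file's modification time is an assumed mechanism rather than anything confirmed by the issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of YARN: polls a config file and triggers a refresh. */
final class QueueConfigReloader implements AutoCloseable {
  private final Path configFile;
  private final Runnable refreshAction; // assumed to wrap something like refreshQueues
  private final ScheduledExecutorService executor;
  private long lastSeenModified;

  QueueConfigReloader(Path configFile, Runnable refreshAction) {
    this.configFile = configFile;
    this.refreshAction = refreshAction;
    this.lastSeenModified = readModified(); // baseline, no refresh at construction
    this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "queue-config-reloader");
      t.setDaemon(true);
      return t;
    });
  }

  /** Returns the file's modification time in ms, or -1 if it cannot be read. */
  private long readModified() {
    try {
      return Files.getLastModifiedTime(configFile).toMillis();
    } catch (IOException e) {
      return -1L;
    }
  }

  /** Runs a single check; returns true if a refresh was triggered. */
  synchronized boolean checkOnce() {
    long modified = readModified();
    if (modified < 0 || modified == lastSeenModified) {
      return false; // unreadable or unchanged
    }
    lastSeenModified = modified;
    refreshAction.run();
    return true;
  }

  /** Starts periodic checks on the background thread. */
  void start(long intervalMillis) {
    executor.scheduleWithFixedDelay(() -> {
      try {
        checkOnce();
      } catch (RuntimeException e) {
        // Swallow so that one failed refresh does not cancel future checks.
      }
    }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

A caller would construct the reloader with the scheduler's configuration path, pass a callback that performs the refresh, and call `start` with a polling interval. Catching runtime exceptions inside the scheduled task matters because an uncaught exception suppresses all later executions of a `scheduleWithFixedDelay` task.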
[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI
[ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282737#comment-17282737 ]

Eric Payne commented on YARN-10588:
-----------------------------------

I see. Thanks [~BilwaST] for the explanation. After looking at the code and talking it over with [~Jim_Brennan], it does look like a better solution would be to modify {{DominantResourceCalculator#isInvalidDivisor}} so that its behavior matches the logic of {{DominantResourceCalculator#divide}}.

> Percentage of queue and cluster is zero in WebUI
> ------------------------------------------------
>
>     Key: YARN-10588
>     URL: https://issues.apache.org/jira/browse/YARN-10588
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Reporter: Bilwa S T
>     Assignee: Bilwa S T
>     Priority: Major
>     Attachments: YARN-10588.001.patch, YARN-10588.002.patch, YARN-10588.003.patch
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted application
>
> This is because in SchedulerApplicationAttempt has below check for calculating queueUsagePerc and clusterUsagePerc
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true as gpu resource is 0
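The mismatch discussed in YARN-10588 can be illustrated with a deliberately simplified model. Everything here is an assumption made for illustration: resources are plain `long[]` vectors, `DivisorCheckSketch` and its methods are hypothetical names, and the code is not the actual `DominantResourceCalculator` implementation. The point is only that a check which rejects a divisor when any component is zero disagrees with a division that can still produce a meaningful dominant share.

```java
/** Simplified, hypothetical model; not the real DominantResourceCalculator. */
final class DivisorCheckSketch {

  /** Strict check: invalid as soon as any resource type is zero (e.g. gpu = 0). */
  static boolean isInvalidIfAnyZero(long[] total) {
    for (long v : total) {
      if (v == 0) {
        return true;
      }
    }
    return false;
  }

  /** Relaxed check: invalid only when every resource type is zero. */
  static boolean isInvalidIfAllZero(long[] total) {
    for (long v : total) {
      if (v != 0) {
        return false;
      }
    }
    return true;
  }

  /** Dominant share: largest used/total ratio, ignoring zero-capacity types. */
  static float dominantShare(long[] used, long[] total) {
    float max = 0f;
    for (int i = 0; i < total.length; i++) {
      if (total[i] != 0) {
        max = Math.max(max, (float) used[i] / total[i]);
      }
    }
    return max;
  }
}
```

With a cluster of `{memory, vcores, gpu} = {1024, 8, 0}`, the strict check reports an invalid divisor and the usage percentages are never computed, while the relaxed check lets the division run over the two resource types that actually have capacity.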
[jira] [Commented] (YARN-10500) TestDelegationTokenRenewer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282670#comment-17282670 ]

Jim Brennan commented on YARN-10500:
------------------------------------

Actually, I missed that there were some checkstyle issues. [~iwasakims], can you please fix those? And while you are at it, there is an unneeded {{throws Exception}} on {{testShutdown()}}. Can you remove that as well?

> TestDelegationTokenRenewer fails intermittently
> -----------------------------------------------
>
>     Key: YARN-10500
>     URL: https://issues.apache.org/jira/browse/YARN-10500
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: test
>     Reporter: Akira Ajisaka
>     Assignee: Masatake Iwasaki
>     Priority: Major
>     Labels: flaky-test, pull-request-available
>     Time Spent: 50m
>     Remaining Estimate: 0h
>
> TestDelegationTokenRenewer sometimes timeouts.
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/334/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> [ERROR] Tests run: 23, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 83.675 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> [ERROR] testTokenThreadTimeout(org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer)  Time elapsed: 30.065 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:394)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testTokenThreadTimeout(TestDelegationTokenRenewer.java:1769)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
[jira] [Commented] (YARN-10500) TestDelegationTokenRenewer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282664#comment-17282664 ]

Jim Brennan commented on YARN-10500:
------------------------------------

+1 Thanks for fixing this [~iwasakims]! I will commit shortly.
[jira] [Commented] (YARN-10618) RM UI2 Application page shows the AM preempted containers instead of the nonAM ones
[ https://issues.apache.org/jira/browse/YARN-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282642#comment-17282642 ]

Gergely Pollak commented on YARN-10618:
---------------------------------------

[~bteke] thank you for the patch, it is quite straightforward. LGTM, +1 (non-binding).

> RM UI2 Application page shows the AM preempted containers instead of the nonAM ones
> -----------------------------------------------------------------------------------
>
>     Key: YARN-10618
>     URL: https://issues.apache.org/jira/browse/YARN-10618
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: yarn-ui-v2
>     Reporter: Benjamin Teke
>     Assignee: Benjamin Teke
>     Priority: Minor
>     Attachments: YARN-10618.001.patch
>
> YARN RM UIv2 application page shows the AM preempted containers under both
> the _Num Non-AM container preempted_ and _Num AM container preempted_.
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282594#comment-17282594 ]

Hadoop QA commented on YARN-9615:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 43s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 47s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 13s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 33s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 9s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 36s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 36s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 11s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 11s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 35s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/611/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 17 new + 53 unchanged - 0 fixed = 70 total (was 53) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:gr
[jira] [Assigned] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T reassigned YARN-9927:
-------------------------------

    Assignee: Bilwa S T

> RM multi-thread event processing mechanism
> ------------------------------------------
>
>     Key: YARN-9927
>     URL: https://issues.apache.org/jira/browse/YARN-9927
>     Project: Hadoop YARN
>     Issue Type: Improvement
>     Components: yarn
>     Affects Versions: 3.0.0, 2.9.2
>     Reporter: hcarrot
>     Assignee: Bilwa S T
>     Priority: Major
>     Attachments: RM multi-thread event processing mechanism.pdf, YARN-9927.001.patch
>
> Recently, we have observed serious event blocking in the RM event dispatcher
> queue. After analyzing the RM event monitoring data and the RM event
> processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which leaves
> the RM event scheduler little headroom and thus limits RM performance.
> So we proposed an RM multi-thread event processing mechanism to improve RM
> performance.
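One common way to move from single-threaded to multi-threaded event processing while keeping per-node ordering is to shard events by key onto several single-threaded workers. The sketch below shows that general technique only; it is not taken from the attached design document or patch, and `ShardedDispatcher`, `shardFor`, and `dispatch` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: events with the same key always run on the same worker. */
final class ShardedDispatcher implements AutoCloseable {
  private final ExecutorService[] workers;

  ShardedDispatcher(int workerCount) {
    workers = new ExecutorService[workerCount];
    for (int i = 0; i < workerCount; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Maps a key (for example a node id) to a stable worker index. */
  int shardFor(String key) {
    return Math.floorMod(key.hashCode(), workers.length);
  }

  /** Queues the handler on the worker that owns the key, preserving per-key order. */
  void dispatch(String key, Runnable handler) {
    workers[shardFor(key)].execute(handler);
  }

  /** Stops accepting events and waits briefly for queued ones to finish. */
  @Override
  public void close() {
    for (ExecutorService w : workers) {
      w.shutdown();
    }
    try {
      for (ExecutorService w : workers) {
        w.awaitTermination(10, TimeUnit.SECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Because each worker is a single-threaded executor with a FIFO queue, status events for one node are still handled in arrival order, while events for different nodes can be processed in parallel. Handlers that touch shared scheduler state would still need their own synchronization.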
[jira] [Commented] (YARN-10622) Fix preemption policy to exclude childless ParentQueues
[ https://issues.apache.org/jira/browse/YARN-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282473#comment-17282473 ]

Qi Zhu commented on YARN-10622:
-------------------------------

Thanks [~gandras] for good finding.
[jira] [Commented] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests shoud be supported in CS.
[ https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282467#comment-17282467 ]

Qi Zhu commented on YARN-10546:
-------------------------------

cc [~wangda] [~ztang] [~epayne] [~Jim_Brennan] [~ebadger]

Could you take a look at this? Thanks.

> Limit application resource reservation on nodes for non-node/rack specific requests shoud be supported in CS.
> -------------------------------------------------------------------------------------------------------------
>
>     Key: YARN-10546
>     URL: https://issues.apache.org/jira/browse/YARN-10546
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: capacityscheduler
>     Affects Versions: 3.3.0
>     Reporter: Qi Zhu
>     Assignee: Qi Zhu
>     Priority: Major
>     Labels: pull-request-available
>     Time Spent: 40m
>     Remaining Estimate: 0h
>
> YARN-4270 fixed this for the FairScheduler.
> The CapacityScheduler should be fixed as well.
> It is a big problem in a production cluster when it happens.
> The FS-to-CS conversion should also support it.
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282458#comment-17282458 ]

Qi Zhu commented on YARN-9615:
------------------------------

[~jhung] [~bibinchundatt]

I would like to take this. I have attached a patch for review and will add tests later.

Thanks.

> Add dispatcher metrics to RM
> ----------------------------
>
>     Key: YARN-9615
>     URL: https://issues.apache.org/jira/browse/YARN-9615
>     Project: Hadoop YARN
>     Issue Type: Task
>     Reporter: Jonathan Hung
>     Assignee: Qi Zhu
>     Priority: Major
>     Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, screenshot-1.png
>
> It'd be good to have counts/processing times for each event type in RM async
> dispatcher and scheduler async dispatcher.
[jira] [Assigned] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu reassigned YARN-9615:
----------------------------

    Assignee: Qi Zhu  (was: Jonathan Hung)
[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-9615:
-------------------------
    Attachment:     (was: YARN-9618.001.patch)
[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-9615:
-------------------------
    Attachment: YARN-9615.001.patch
[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-9615:
-------------------------
    Attachment: YARN-9618.001.patch
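The per-event-type counts and processing times requested in YARN-9615 could be collected along the lines of the following sketch. It is a generic illustration, not the attached patch: `EventTypeMetrics` and its methods are hypothetical names, event types are represented as plain strings, and nothing here is wired into Hadoop's metrics system.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: thread-safe count and total handling time per event type. */
final class EventTypeMetrics {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

  /** Records one handled event of the given type and how long it took. */
  void record(String eventType, long elapsedNanos) {
    counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
    nanos.computeIfAbsent(eventType, k -> new LongAdder()).add(elapsedNanos);
  }

  /** Runs the handler and records its duration, even if it throws. */
  void timed(String eventType, Runnable handler) {
    long start = System.nanoTime();
    try {
      handler.run();
    } finally {
      record(eventType, System.nanoTime() - start);
    }
  }

  /** Number of events seen for the type, or 0 if none. */
  long count(String eventType) {
    LongAdder a = counts.get(eventType);
    return a == null ? 0L : a.sum();
  }

  /** Total handling time in nanoseconds for the type, or 0 if none. */
  long totalTimeNanos(String eventType) {
    LongAdder a = nanos.get(eventType);
    return a == null ? 0L : a.sum();
  }
}
```

A dispatcher loop would wrap each handler call in `timed(...)`, keyed by the event's type name. `LongAdder` is used instead of a single atomic counter because it tends to scale better when many threads update the same metric.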
[jira] [Updated] (YARN-10622) Fix preemption policy to exclude childless ParentQueues
[ https://issues.apache.org/jira/browse/YARN-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-10622:
-------------------------------
    Summary: Fix preemption policy to exclude childless ParentQueues  (was: Fix preemption policy to exclude childless ParentQueus)
[jira] [Created] (YARN-10622) Fix preemption policy to exclude childless ParentQueus
Andras Gyori created YARN-10622:
-----------------------------------

    Summary: Fix preemption policy to exclude childless ParentQueus
    Key: YARN-10622
    URL: https://issues.apache.org/jira/browse/YARN-10622
    Project: Hadoop YARN
    Issue Type: Sub-task
    Reporter: Andras Gyori
    Assignee: Andras Gyori

ProportionalCapacityPreemptionPolicy selects the potential LeafQueues to be preempted by this logic:
{code:java}
private Set<String> getLeafQueueNames(TempQueuePerPartition q) {
  // If its a ManagedParentQueue, it might not have any children
  if ((q.children == null || q.children.isEmpty())
      && !(q.parentQueue instanceof ManagedParentQueue)) {
    return ImmutableSet.of(q.queueName);
  }

  Set<String> leafQueueNames = new HashSet<>();
  for (TempQueuePerPartition child : q.children) {
    leafQueueNames.addAll(getLeafQueueNames(child));
  }

  return leafQueueNames;
}
{code}
This, however does not take childless ParentQueues (which was introduced in YARN-10596) into account.

A childless ParentQueue will throw a NPE in FifoCandidatesSelector#selectCandidates:
{code:java}
LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
    RMNodeLabelsManager.NO_LABEL).leafQueue;
{code}
TempQueuePerPartition has a leafQueue member variable, which is null, if the queue is not a LeafQueue. In case of childless ParentQueue, it is null, but its name is present in the leafQueueNames as stated before.
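One possible guard against the NPE described in YARN-10622 is to collect a queue's name only when it actually carries a leaf queue. The sketch below shows that idea on simplified stand-in types: `TempQueue` is a hypothetical substitute for `TempQueuePerPartition`, the `leafQueue` field is typed as `Object` for brevity, and this is not necessarily the change that was committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch with simplified stand-in types; not the committed fix. */
final class LeafQueueNamesSketch {

  /** Minimal stand-in for TempQueuePerPartition. */
  static final class TempQueue {
    final String queueName;
    final Object leafQueue; // non-null only when the queue is a real LeafQueue
    final List<TempQueue> children = new ArrayList<>();

    TempQueue(String queueName, Object leafQueue) {
      this.queueName = queueName;
      this.leafQueue = leafQueue;
    }
  }

  /** Collects names of real leaf queues, skipping childless parent queues. */
  static Set<String> getLeafQueueNames(TempQueue q) {
    Set<String> names = new HashSet<>();
    if (q.children.isEmpty()) {
      // A childless parent has no leafQueue, so it must not be reported as a leaf.
      if (q.leafQueue != null) {
        names.add(q.queueName);
      }
      return names;
    }
    for (TempQueue child : q.children) {
      names.addAll(getLeafQueueNames(child));
    }
    return names;
  }
}
```

With this check, a later lookup of `leafQueue` by name can no longer hit a queue whose `leafQueue` member is null, because such queues never enter the returned set.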
[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282426#comment-17282426 ]

Hadoop QA commented on YARN-10620:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 12s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 12s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 42s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 50s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 52s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green}{color} | {color
[jira] [Commented] (YARN-10593) Fix incorrect string comparison in GpuDiscoverer
[ https://issues.apache.org/jira/browse/YARN-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282424#comment-17282424 ]

Szilard Nemeth commented on YARN-10593:
---------------------------------------

Thanks [~pbacsko] for working on this. The patch LGTM; committed to trunk.
Thanks [~zhuqi] for the review.

> Fix incorrect string comparison in GpuDiscoverer
> ------------------------------------------------
>
> Key: YARN-10593
> URL: https://issues.apache.org/jira/browse/YARN-10593
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10593-001.patch
>
>
> The following comparison in {{GpuDiscoverer}} is invalid:
> {noformat}
> binaryPath = configuredBinaryFile;
> // If path exists but file name is incorrect don't execute the file
> String fileName = binaryPath.getName();
> if (DEFAULT_BINARY_NAME.equals(fileName)) { <--- inverse condition needed
>   String msg = String.format("Please check the configuration value of"
>       +" %s. It should point to an %s binary.",
>       YarnConfiguration.NM_GPU_PATH_TO_EXEC,
>       DEFAULT_BINARY_NAME);
>   throwIfNecessary(new YarnException(msg), config);
>   LOG.warn(msg);
> }{noformat}
> Obviously it should be the other way around: we should log a warning or throw an
> exception if the file names *differ*, not when they are equal.
> Consider adding a unit test for this.
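For readers skimming the archive, the inverted condition from the description can be sketched as a small standalone class. This is only an illustration, not the committed patch: the class name, the helper methods, the exception type, and the value of DEFAULT_BINARY_NAME are assumptions made for this sketch; only the constant's name and the direction of the check come from the issue.

```java
import java.io.File;

// Hypothetical stand-in for the relevant part of GpuDiscoverer.
// Everything except the name DEFAULT_BINARY_NAME is assumed for illustration.
class GpuBinaryNameCheck {
  // Assumed value; the issue text only shows the constant's name.
  static final String DEFAULT_BINARY_NAME = "nvidia-smi";

  /** Returns true when the configured path ends in the expected file name. */
  static boolean hasExpectedName(File configuredBinaryFile) {
    return DEFAULT_BINARY_NAME.equals(configuredBinaryFile.getName());
  }

  /** Corrected condition: complain only when the file names differ. */
  static void validate(File configuredBinaryFile) {
    if (!hasExpectedName(configuredBinaryFile)) {
      throw new IllegalArgumentException(String.format(
          "Please check the configured path %s. It should point to an %s binary.",
          configuredBinaryFile, DEFAULT_BINARY_NAME));
    }
  }

  public static void main(String[] args) {
    // A path with the expected file name passes silently.
    validate(new File("/usr/bin/nvidia-smi"));
    // A path with a different file name is rejected.
    try {
      validate(new File("/usr/bin/some-other-tool"));
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

A unit test along the lines suggested in the issue would exercise both branches: one path that ends in the expected name and one that does not.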
[jira] [Updated] (YARN-10593) Fix incorrect string comparison in GpuDiscoverer
[ https://issues.apache.org/jira/browse/YARN-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10593: -- Fix Version/s: 3.4.0 > Fix incorrect string comparison in GpuDiscoverer > > > Key: YARN-10593 > URL: https://issues.apache.org/jira/browse/YARN-10593 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10593-001.patch > > > The following comparison in {{GpuDiscoverer}} is invalid: > {noformat} > binaryPath = configuredBinaryFile; > // If path exists but file name is incorrect don't execute the file > String fileName = binaryPath.getName(); > if (DEFAULT_BINARY_NAME.equals(fileName)) { <--- inverse condition > needed > String msg = String.format("Please check the configuration value of" > +" %s. It should point to an %s binary.", > YarnConfiguration.NM_GPU_PATH_TO_EXEC, > DEFAULT_BINARY_NAME); > throwIfNecessary(new YarnException(msg), config); > LOG.warn(msg); > }{noformat} > Obviously it should be other way around - we should log a warning or throw an > exception if the file names *differ*, not when they're equal. > Consider adding a unit test for this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282357#comment-17282357 ]

Szilard Nemeth commented on YARN-10620:
---------------------------------------

Hi [~pbacsko],
Thanks for working on this. The latest patch LGTM; just committed to trunk.
Thanks [~gandras] for the review.

> fs2cs: parentQueue for certain placement rules are not set during conversion
> ----------------------------------------------------------------------------
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: fs2cs
> Fix For: 3.4.0
>
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly
> by fs2cs:
> {noformat}
>
>
>
>
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be
> created as {{root.}}.
> The second means the same thing, except that it refers to the primary group
> instead of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we
> must set the parent queue in the generated JSON:
> Current:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> Expected:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "parentQueue": "root",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "parentQueue": "root",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.
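The difference between the "Current" and "Expected" output in the description above can be sketched with a small rule builder. This is not the fs2cs converter code: the class, the method, and the map-based representation are hypothetical, and only the field names and values are taken from the JSON in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the expected output; not the fs2cs implementation.
class PlacementRuleSketch {
  /** Builds one mapping rule as an ordered map mirroring the JSON fields in the issue. */
  static Map<String, Object> buildRule(String policy, boolean create) {
    Map<String, Object> rule = new LinkedHashMap<>();
    rule.put("type", "user");
    rule.put("matches", "*");
    rule.put("policy", policy);
    rule.put("fallbackResult", "skip");
    // The fix described in the issue: for these two policies the parent queue
    // must be set explicitly so that create=true can take effect.
    if (create && ("user".equals(policy) || "primaryGroup".equals(policy))) {
      rule.put("parentQueue", "root");
    }
    rule.put("create", create);
    return rule;
  }

  public static void main(String[] args) {
    // Prints the two rules from the "Expected" block as ordered maps.
    System.out.println(buildRule("user", true));
    System.out.println(buildRule("primaryGroup", true));
  }
}
```

A LinkedHashMap is used here only so that the keys appear in the same order as in the JSON shown in the issue.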
[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10620: -- Fix Version/s: 3.4.0 > fs2cs: parentQueue for certain placement rules are not set during conversion > > > Key: YARN-10620 > URL: https://issues.apache.org/jira/browse/YARN-10620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: fs2cs > Fix For: 3.4.0 > > Attachments: YARN-10620-001.patch, YARN-10620-002.patch > > > There are some placement rules in FS which are currently not handled properly > by fs2cs: > {noformat} > > > > > {noformat} > The first rule means that if the user queue doesn't exist, it should be > created as {{root.}}. > The second means the same thing, except refers to the primary group instead > of the submitting user: {{root.}}. > The problem is that in order for the create="true" setting to take effect, we > must set the parent queue in the generated JSON: > Current: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "create" : true > } ] > } > {noformat} > Expected: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "parentQueue": "root", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "parentQueue": "root", > "create" : true > } ] > {noformat} > This is missing right now and it need to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10620: Description: There are some placement rules in FS which are currently not handled properly by fs2cs: {noformat} {noformat} The first rule means that if the user queue doesn't exist, it should be created as {{root.}}. The second means the same thing, except refers to the primary group instead of the submitting user: {{root.}}. The problem is that in order for the create="true" setting to take effect, we must set the parent queue in the generated JSON: Current: {noformat} { "rules" : [ { "type" : "user", "matches" : "*", "policy" : "user", "fallbackResult" : "skip", "create" : true }, { "type" : "user", "matches" : "*", "policy" : "primaryGroup", "fallbackResult" : "skip", "create" : true } ] } {noformat} Expected: {noformat} { "rules" : [ { "type" : "user", "matches" : "*", "policy" : "user", "fallbackResult" : "skip", "parentQueue": "root", "create" : true }, { "type" : "user", "matches" : "*", "policy" : "primaryGroup", "fallbackResult" : "skip", "parentQueue": "root", "create" : true } ] {noformat} This is missing right now and it need to be fixed. was: There are some placement rules in FS which are currently not handled properly by fs2cs: {noformat} {noformat} The first rule means that if the user queue doesn't exist, it should be created as {{root.}}. The second means the same thing, except refers to the primary group instead of the submitting user: {{root.}}. The problem is that in order for the create="true" setting to take effect, we must set the parent queue in the generated JSON: {noformat} { "rules" : [ { "type" : "user", "matches" : "*", "policy" : "user", "fallbackResult" : "skip", "create" : true }, { "type" : "user", "matches" : "*", "policy" : "primaryGroup", "fallbackResult" : "skip", "create" : true } ] } {noformat} This is missing right now and it need to be fixed. 
> fs2cs: parentQueue for certain placement rules are not set during conversion > > > Key: YARN-10620 > URL: https://issues.apache.org/jira/browse/YARN-10620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: fs2cs > Attachments: YARN-10620-001.patch, YARN-10620-002.patch > > > There are some placement rules in FS which are currently not handled properly > by fs2cs: > {noformat} > > > > > {noformat} > The first rule means that if the user queue doesn't exist, it should be > created as {{root.}}. > The second means the same thing, except refers to the primary group instead > of the submitting user: {{root.}}. > The problem is that in order for the create="true" setting to take effect, we > must set the parent queue in the generated JSON: > Current: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "create" : true > } ] > } > {noformat} > Expected: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "parentQueue": "root", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "parentQueue": "root", > "create" : true > } ] > {noformat} > This is missing right now and it need to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI
[ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282328#comment-17282328 ]

Bilwa S T commented on YARN-10588:
----------------------------------

Hi [~epayne],
I added the change in FicaSchedulerApp.java because the same issue can occur there, i.e. the cluster and queue usage percentages will not be calculated if one of the resources is zero. I added the instanceof check because that method is applicable only to the CapacityScheduler. Many test cases failed once I removed the DominantResourceCalculator.isInvalidDivisor() check, because those test cases had FifoScheduler configured.

> Percentage of queue and cluster is zero in WebUI
> ------------------------------------------------
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch,
> YARN-10588.003.patch
>
>
> Steps to reproduce:
> Configure the property below in resource-types.xml:
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job.
> In the UI you can see that % Of Queue and % Of Cluster are zero for the
> submitted application.
>
> This is because SchedulerApplicationAttempt has the check below for
> calculating queueUsagePerc and clusterUsagePerc:
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true because the gpu resource is 0.
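To make the reported behaviour easier to follow, here is a simplified model of the guard, written against plain maps instead of Hadoop's Resource and ResourceCalculator classes. It assumes, as the description states, that the divisor check trips when any resource type in the cluster total is zero. The class, the method names, and the dominant-share calculation are assumptions for this sketch and are not copied from the Hadoop source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the behaviour described in the issue; not Hadoop's classes.
class InvalidDivisorSketch {
  /** Treats a resource vector as an invalid divisor if any component is zero. */
  static boolean isInvalidDivisor(Map<String, Long> resource) {
    for (long value : resource.values()) {
      if (value == 0L) {
        return true;
      }
    }
    return false;
  }

  /** Mirrors the guarded calculation: the percentage stays 0 when the guard trips. */
  static float clusterUsagePerc(Map<String, Long> used, Map<String, Long> cluster) {
    if (isInvalidDivisor(cluster)) {
      return 0.0f;
    }
    // Dominant share: the largest used/total ratio across resource types.
    float dominant = 0.0f;
    for (Map.Entry<String, Long> e : cluster.entrySet()) {
      long usedValue = used.getOrDefault(e.getKey(), 0L);
      dominant = Math.max(dominant, (float) usedValue / e.getValue());
    }
    return dominant * 100;
  }

  public static void main(String[] args) {
    Map<String, Long> used = new LinkedHashMap<>();
    used.put("memory-mb", 2048L);
    used.put("vcores", 2L);

    Map<String, Long> cluster = new LinkedHashMap<>();
    cluster.put("memory-mb", 8192L);
    cluster.put("vcores", 8L);
    System.out.println("without gpu type: " + clusterUsagePerc(used, cluster));

    // Declaring yarn.io/gpu on a cluster that has no GPUs adds a zero component.
    cluster.put("yarn.io/gpu", 0L);
    System.out.println("with gpu type at 0: " + clusterUsagePerc(used, cluster));
  }
}
```

In this model the second call reports zero usage even though memory and vcores are in use, which matches the symptom in the WebUI.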
[jira] [Comment Edited] (YARN-10588) Percentage of queue and cluster is zero in WebUI
[ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282328#comment-17282328 ] Bilwa S T edited comment on YARN-10588 at 2/10/21, 9:19 AM: Thanks [~epayne] [~Jim_Brennan] for taking a look at this issue. I added change in FicaSchedulerApp.java as same issue can occur ie cluster and queue resource will not be calculated if one of the resource is zero. I added instanceOf check because that method is applicable only for capacityscheduler . Many testcases were failing once i removed DominantResourceCalculator.isInvalidDivisor() check as testcases had configured Fifoscheduler. was (Author: bilwast): Hi [~epayne] I added change in FicaSchedulerApp.java as same issue can occur ie cluster and queue resource will not be calculated if one of the resource is zero. I added instanceOf check because that method is applicable only for capacityscheduler . Many testcases were failing once i removed DominantResourceCalculator.isInvalidDivisor() check as testcases had configured Fifoscheduler. 
> Percentage of queue and cluster is zero in WebUI > - > > Key: YARN-10588 > URL: https://issues.apache.org/jira/browse/YARN-10588 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10588.001.patch, YARN-10588.002.patch, > YARN-10588.003.patch > > > Steps to reproduce: > Configure below property in resource-types.xml > {code:java} > > yarn.resource-types > yarn.io/gpu > {code} > Submit a job > In UI you can see % Of Queue and % Of Cluster is zero for the submitted > application > > This is because in SchedulerApplicationAttempt has below check for > calculating queueUsagePerc and clusterUsagePerc > {code:java} > if (!calc.isInvalidDivisor(cluster)) { > float queueCapacityPerc = queue.getQueueInfo(false, false) > .getCapacity(); > queueUsagePerc = calc.divide(cluster, usedResourceClone, > Resources.multiply(cluster, queueCapacityPerc)) * 100; > if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) { > queueUsagePerc = 0.0f; > } > clusterUsagePerc = > calc.divide(cluster, usedResourceClone, cluster) * 100; > } > {code} > calc.isInvalidDivisor(cluster) always returns true as gpu resource is 0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282323#comment-17282323 ] Andras Gyori commented on YARN-10620: - [~pbacsko] I agree with your solution, the set is more appropriate and efficient here +1. > fs2cs: parentQueue for certain placement rules are not set during conversion > > > Key: YARN-10620 > URL: https://issues.apache.org/jira/browse/YARN-10620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: fs2cs > Attachments: YARN-10620-001.patch, YARN-10620-002.patch > > > There are some placement rules in FS which are currently not handled properly > by fs2cs: > {noformat} > > > > > {noformat} > The first rule means that if the user queue doesn't exist, it should be > created as {{root.}}. > The second means the same thing, except refers to the primary group instead > of the submitting user: {{root.}}. > The problem is that in order for the create="true" setting to take effect, we > must set the parent queue in the generated JSON: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "create" : true > } ] > } > {noformat} > This is missing right now and it need to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282322#comment-17282322 ] Peter Bacsko commented on YARN-10620: - [~gandras] thanks, I modified the code a little bit, used a set instead of an array. > fs2cs: parentQueue for certain placement rules are not set during conversion > > > Key: YARN-10620 > URL: https://issues.apache.org/jira/browse/YARN-10620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: fs2cs > Attachments: YARN-10620-001.patch, YARN-10620-002.patch > > > There are some placement rules in FS which are currently not handled properly > by fs2cs: > {noformat} > > > > > {noformat} > The first rule means that if the user queue doesn't exist, it should be > created as {{root.}}. > The second means the same thing, except refers to the primary group instead > of the submitting user: {{root.}}. > The problem is that in order for the create="true" setting to take effect, we > must set the parent queue in the generated JSON: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "create" : true > } ] > } > {noformat} > This is missing right now and it need to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10620: Attachment: YARN-10620-002.patch > fs2cs: parentQueue for certain placement rules are not set during conversion > > > Key: YARN-10620 > URL: https://issues.apache.org/jira/browse/YARN-10620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: fs2cs > Attachments: YARN-10620-001.patch, YARN-10620-002.patch > > > There are some placement rules in FS which are currently not handled properly > by fs2cs: > {noformat} > > > > > {noformat} > The first rule means that if the user queue doesn't exist, it should be > created as {{root.}}. > The second means the same thing, except refers to the primary group instead > of the submitting user: {{root.}}. > The problem is that in order for the create="true" setting to take effect, we > must set the parent queue in the generated JSON: > {noformat} > { > "rules" : [ { > "type" : "user", > "matches" : "*", > "policy" : "user", > "fallbackResult" : "skip", > "create" : true > }, { > "type" : "user", > "matches" : "*", > "policy" : "primaryGroup", > "fallbackResult" : "skip", > "create" : true > } ] > } > {noformat} > This is missing right now and it need to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion
[ https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282314#comment-17282314 ]

Andras Gyori commented on YARN-10620:
-------------------------------------

Thank you [~pbacsko] for the patch. It looks good to me; I have one minor suggestion:
* Checking the policy would be more readable if the policies were stored in a constant array and the code checked whether the policy is contained in that array. This would reduce the expression to !usePercentages && ArrayUtils.contains(policies, policy).

> fs2cs: parentQueue for certain placement rules are not set during conversion
> ----------------------------------------------------------------------------
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10620-001.patch
>
>
> There are some placement rules in FS which are currently not handled properly
> by fs2cs:
> {noformat}
>
>
>
>
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be
> created as {{root.}}.
> The second means the same thing, except that it refers to the primary group
> instead of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we
> must set the parent queue in the generated JSON:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.
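The review suggestion can be sketched in plain Java as follows. This is a hedged illustration rather than the patch itself: the class and method names are hypothetical, the two policy names are simply the ones that appear in the JSON in the issue (the real list may differ), and the meaning of the usePercentages flag is not explained in this thread. Arrays.asList(...).contains is used as a standard-library stand-in for the ArrayUtils.contains call mentioned in the comment, and a set-based variant is shown next to it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the review suggestion; names are assumed, not from the patch.
class PolicyCheckSketch {
  // Array variant, as proposed in the comment.
  private static final String[] POLICIES = {"user", "primaryGroup"};

  // Set variant: membership tests do not scan the whole collection.
  private static final Set<String> POLICY_SET =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(POLICIES)));

  /** Array-based check, equivalent in spirit to ArrayUtils.contains(policies, policy). */
  static boolean checkWithArray(boolean usePercentages, String policy) {
    return !usePercentages && Arrays.asList(POLICIES).contains(policy);
  }

  /** Set-based check with the same result. */
  static boolean checkWithSet(boolean usePercentages, String policy) {
    return !usePercentages && POLICY_SET.contains(policy);
  }

  public static void main(String[] args) {
    System.out.println(checkWithArray(false, "user"));
    System.out.println(checkWithSet(false, "primaryGroup"));
    System.out.println(checkWithSet(true, "user"));
  }
}
```

Both variants replace a chain of equals() comparisons with a single membership test, which is the readability gain the comment is after.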