[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309179#comment-17309179 ] Hadoop QA commented on YARN-10503:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 39s | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 23m 28s | trunk passed |
| +1 | compile | 1m 4s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 53s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 49s | trunk passed |
| +1 | mvnsite | 0m 57s | trunk passed |
| +1 | shadedclient | 17m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 52s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 46s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 21m 28s | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 15s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 1m 5s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 1m 5s | the patch passed |
| +1 | compile | 0m 52s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 46s | the patch passed |
| +1 | mvnsite | 0m 58s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 16m 23s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 46s | the patch passed with JDK
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309130#comment-17309130 ] Qi Zhu commented on YARN-10519:
-------------------------------

[~minni31] [~bibinchundatt] Could you help back-port this to branch-3.2? Thanks.

> Refactor QueueMetricsForCustomResources class to move to yarn-common package
> ------------------------------------------------------------------------------
>
> Key: YARN-10519
> URL: https://issues.apache.org/jira/browse/YARN-10519
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Minni Mittal
> Assignee: Minni Mittal
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch, YARN-10519.v6.patch, YARN-10519.v7.patch
>
> Refactor the code for QueueMetricsForCustomResources to move the base classes to the yarn-common package. This helps in reusing the class when adding custom resource types at the NM level as well.
[jira] [Commented] (YARN-4754) Too many connection opened to TimelineServer while publishing entities
[ https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309129#comment-17309129 ] Kevin wang commented on YARN-4754:
----------------------------------

[~rohithsharma] If you have Kerberos authentication enabled, that is what is causing the problem. Check whether the user has permission. [~Ying Zhang] Your code is correct.

> Too many connection opened to TimelineServer while publishing entities
> ------------------------------------------------------------------------
>
> Key: YARN-4754
> URL: https://issues.apache.org/jira/browse/YARN-4754
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical
> Attachments: ConnectionLeak.rar
>
> It is observed that too many connections are kept open to the TimelineServer while publishing entities via SystemMetricsPublisher. This sometimes causes a resource shortage for other processes or for the RM itself.
> {noformat}
> tcp 0 0 10.18.99.110:3999  10.18.214.60:59265 ESTABLISHED 115302/java
> tcp 0 0 10.18.99.110:25001 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25002 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25003 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25004 :::* LISTEN 115302/java
> tcp 0 0 10.18.99.110:25005 :::* LISTEN 115302/java
> tcp 1 0 10.18.99.110:48866 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48137 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47553 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48424 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48139 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:48096 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:47558 10.18.99.110:8188 CLOSE_WAIT 115302/java
> tcp 1 0 10.18.99.110:49270 10.18.99.110:8188 CLOSE_WAIT 115302/java
> {noformat}
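For context, client-side sockets stuck in CLOSE_WAIT, as in the netstat dump above, usually mean an HTTP response stream was never drained and closed. Below is a minimal, generic Java sketch of the pattern that avoids this; the endpoint URL is illustrative only (port 8188 is taken from the dump above) and this is not the actual SystemMetricsPublisher code:
{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public final class TimelineRequestSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative endpoint only.
    URL url = new URL("http://10.18.99.110:8188/ws/v1/timeline");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[4096];
      // Drain the body so the socket can be reused or closed cleanly,
      // instead of lingering in CLOSE_WAIT.
      while (in.read(buf) != -1) {
        // discard
      }
    } finally {
      conn.disconnect(); // release the underlying socket
    }
  }
}
{code}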
[jira] [Comment Edited] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309106#comment-17309106 ] Qi Zhu edited comment on YARN-10713 at 3/26/21, 2:46 AM:
---------------------------------------------------------

Thanks a lot [~ebadger] for the review and confirmation. After investigating the code, we should backport YARN-10519 to branch-3.2 first; then this one can be backported to 3.2. Thanks.

was (Author: zhuqi): Thanks a lot [~ebadger] for the review and confirmation. I will backport to branch-3.2 later.

> ClusterMetrics should support custom resource capacity related metrics.
> -------------------------------------------------------------------------
>
> Key: YARN-10713
> URL: https://issues.apache.org/jira/browse/YARN-10713
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10713.001.patch, YARN-10713.002.patch
>
> YARN-10688 only added GPU resource capacity related metrics; I think we should improve it to support custom resources, as [~ebadger] suggested.
[jira] [Commented] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309106#comment-17309106 ] Qi Zhu commented on YARN-10713:
-------------------------------

Thanks a lot [~ebadger] for the review and confirmation. I will backport to branch-3.2 later.

> ClusterMetrics should support custom resource capacity related metrics.
> -------------------------------------------------------------------------
>
> Key: YARN-10713
> URL: https://issues.apache.org/jira/browse/YARN-10713
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10713.001.patch, YARN-10713.002.patch
>
> YARN-10688 only added GPU resource capacity related metrics; I think we should improve it to support custom resources, as [~ebadger] suggested.
[jira] [Commented] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309104#comment-17309104 ] Siddharth Ahuja commented on YARN-10705:
----------------------------------------

I have already justified in my previous comment why no JUnits are needed, and I have also provided manual steps for testing on a single-node cluster. +cc [~wilfreds].

> Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10705
> URL: https://issues.apache.org/jira/browse/YARN-10705
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: Siddharth Ahuja
> Assignee: Siddharth Ahuja
> Priority: Minor
> Attachments: YARN-10705.001.patch
>
> The following DEBUG logs are emitted when a container reservation is made after a node has been offered to the queue in FairScheduler:
> {code}
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: application_1610442362681_2607's resource request is reserved.
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.pj_dc_pe container:
> {code}
> The latter log seems to indicate a container assignment with a resource allocation, whereas in fact it is a bad log that shouldn't have been emitted in the first place.
> This log comes from [1], after an application attempt with an unmet demand is checked for container assignment/reservation. If the container for this app attempt is reserved on the node, it returns from [2].
> From [3]:
> {quote}
> If an assignment was made, returns the resources allocated to the container. If a reservation was made, returns FairScheduler.CONTAINER_RESERVED. If no assignment or reservation was made, returns an empty resource.
> {quote}
> We check for the empty resource at [4], but not for FairScheduler.CONTAINER_RESERVED, before logging a container assignment message, which is incorrect.
> Instead of:
> {code}
> if (!assigned.equals(none())) {
>   LOG.debug("Assigned container in queue:{} container:{}",
>       getName(), assigned);
>   break;
> }
> {code}
> it should be:
> {code}
> // check if an assignment or a reservation was made.
> if (!assigned.equals(none())) {
>   // only log container assignment if there is
>   // an actual assignment, not a reservation.
>   if (!assigned.equals(FairScheduler.CONTAINER_RESERVED)
>       && LOG.isDebugEnabled()) {
>     LOG.debug("Assigned container in queue:" + getName() + " " +
>         "container:" + assigned);
>   }
>   break;
> }
> {code}
> [1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L356
> [2] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L911
> [3] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L842
> [4] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L355
[jira] [Comment Edited] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309084#comment-17309084 ] Qi Zhu edited comment on YARN-10517 at 3/26/21, 2:33 AM:
---------------------------------------------------------

Thanks [~epayne] for the reply. Actually I did not see it in the UI; I only saw it in the JMX partition queue metrics. The unit test handles the partition metrics update, so I am not sure the UI is related to this fix. [~sibyl.lv] Could you help reproduce the UI problem?

was (Author: zhuqi): Thanks [~epayne] for the reply. Actually I did not see it in the UI; I only saw it in the JMX partition queue metrics. [~sibyl.lv] Could you help reproduce the UI problem?

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> --------------------------------------------------------------------------------
>
> Key: YARN-10517
> URL: https://issues.apache.org/jira/browse/YARN-10517
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0, 3.3.0
> Reporter: sibyl.lv
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has incorrect allocated JMX values, such as allocatedMB, allocatedVCores and allocatedContainers, when the node partition is updated from "DEFAULT" to another label while there are running applications.
>
> Steps to reproduce:
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and run it
> # Add label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # When the application is finished, the metrics are wrong (screenshots attached).
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles the NODE_LABELS_UPDATE event. So we should release the container resource from the old partition and add the used resource to the new partition, just as we update queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
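For illustration only, the missing QueueMetrics-level counterpart of the update above might look roughly like the sketch below. The QueueMetrics method names here are assumptions made for the sake of the sketch, not the actual API, and this is not the attached patch:
{code:java}
// Hypothetical sketch: when a container's node moves between partitions,
// move its allocation between the per-partition queue metrics as well,
// mirroring the attemptResourceUsage/queueUsage updates shown above.
Resource containerResource = rmContainer.getAllocatedResource();
QueueMetrics metrics = getCSLeafQueue().getMetrics();
// Release the allocation from the old partition's counters...
metrics.releaseResources(oldPartition, getUser(), 1, containerResource);
// ...and re-account for it under the new partition.
metrics.allocateResources(newPartition, getUser(), 1, containerResource, false);
{code}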
[jira] [Commented] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309102#comment-17309102 ] Hadoop QA commented on YARN-10705:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 24m 27s | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 22m 33s | trunk passed |
| +1 | compile | 1m 8s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 49s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 45s | trunk passed |
| +1 | mvnsite | 0m 53s | trunk passed |
| +1 | shadedclient | 17m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 38s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 14s | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 50s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 54s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 38s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 55s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309099#comment-17309099 ] Qi Zhu commented on YARN-10503:
-------------------------------

Thanks a lot [~ebadger] for the review. It's a very good suggestion; I appreciate it, and the change will make our code better. Updated in the latest patch. :D

> Support queue capacity in terms of absolute resources with custom resourceType.
> ---------------------------------------------------------------------------------
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, YARN-10503.006.patch, YARN-10503.007.patch
>
> Currently, the absolute resources are only memory and cores:
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> But in our GPU production clusters, we need to support more resourceTypes. It's very important for cluster scaling with different resourceType absolute demands.
>
> This Jira will handle GPU first.
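For illustration, the direction described above amounts to iterating every registered resource type instead of the fixed MEMORY/VCORES pair, so that an absolute capacity spec such as [memory=10240,vcores=12,yarn.io/gpu=4] can be accepted. A minimal sketch of that idea, assuming the standard ResourceUtils helpers; this is not the attached patch:
{code:java}
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.util.resource.ResourceUtils;

public final class AbsoluteResourceTypesSketch {
  // List every resource type registered with the ResourceManager
  // (memory-mb, vcores, plus any custom type such as yarn.io/gpu),
  // rather than hard-coding the two-value enum.
  public static void printKnownResourceTypes() {
    for (ResourceInformation ri : ResourceUtils.getResourceTypesArray()) {
      System.out.println(ri.getName() + " (units: " + ri.getUnits() + ")");
    }
  }
}
{code}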
[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10503:
--------------------------
    Attachment: YARN-10503.007.patch

> Support queue capacity in terms of absolute resources with custom resourceType.
> ---------------------------------------------------------------------------------
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, YARN-10503.006.patch, YARN-10503.007.patch
>
> Currently, the absolute resources are only memory and cores:
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> But in our GPU production clusters, we need to support more resourceTypes. It's very important for cluster scaling with different resourceType absolute demands.
>
> This Jira will handle GPU first.
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309084#comment-17309084 ] Qi Zhu commented on YARN-10517:
-------------------------------

Thanks [~epayne] for the reply. Actually I did not see it in the UI; I only saw it in the JMX queue metrics. [~sibyl.lv] Could you help reproduce the UI problem?

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> --------------------------------------------------------------------------------
>
> Key: YARN-10517
> URL: https://issues.apache.org/jira/browse/YARN-10517
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0, 3.3.0
> Reporter: sibyl.lv
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has incorrect allocated JMX values, such as allocatedMB, allocatedVCores and allocatedContainers, when the node partition is updated from "DEFAULT" to another label while there are running applications.
>
> Steps to reproduce:
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and run it
> # Add label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # When the application is finished, the metrics are wrong (screenshots attached).
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles the NODE_LABELS_UPDATE event. So we should release the container resource from the old partition and add the used resource to the new partition, just as we update queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
[jira] [Comment Edited] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309084#comment-17309084 ] Qi Zhu edited comment on YARN-10517 at 3/26/21, 2:12 AM:
---------------------------------------------------------

Thanks [~epayne] for the reply. Actually I did not see it in the UI; I only saw it in the JMX partition queue metrics. [~sibyl.lv] Could you help reproduce the UI problem?

was (Author: zhuqi): Thanks [~epayne] for the reply. Actually I did not see it in the UI; I only saw it in the JMX queue metrics. [~sibyl.lv] Could you help reproduce the UI problem?

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> --------------------------------------------------------------------------------
>
> Key: YARN-10517
> URL: https://issues.apache.org/jira/browse/YARN-10517
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0, 3.3.0
> Reporter: sibyl.lv
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has incorrect allocated JMX values, such as allocatedMB, allocatedVCores and allocatedContainers, when the node partition is updated from "DEFAULT" to another label while there are running applications.
>
> Steps to reproduce:
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and run it
> # Add label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # When the application is finished, the metrics are wrong (screenshots attached).
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles the NODE_LABELS_UPDATE event. So we should release the container resource from the old partition and add the used resource to the new partition, just as we update queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309080#comment-17309080 ] Ahmed Hussein commented on YARN-10501:
--------------------------------------

findbugs is not supported. We need to pull HADOOP-16870 into branch-2.10. https://issues.apache.org/jira/browse/HADOOP-16870?focusedCommentId=17309077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17309077

> Can't remove all node labels after add node label without nodemanager port
> ----------------------------------------------------------------------------
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch
>
> When a label is added to nodes without a nodemanager port, or with WILDCARD_PORT (0), not all of the label info on those nodes can be removed.
> Reproduce process:
> {code:java}
> 1. yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2. yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4. yarn rmadmin -replaceLabelsOnNode "server001"
> 5. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
> {code}
> You can see that after step 4, which removes the nodemanager labels, the label info is still present in the node info.
> {code:java}
> 641 case REPLACE:
> 642   replaceNodeForLabels(nodeId, host.labels, labels);
> 643   replaceLabelsForNode(nodeId, host.labels, labels);
> 644   host.labels.clear();
> 645   host.labels.addAll(labels);
> 646   for (Node node : host.nms.values()) {
> 647     replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649     node.labels = null;
> 650   }
> 651   break;
> {code}
> The cause is at line 647: when labels are added to a node without a port, both port 0 and the real NM port are added to the node info, and when labels are removed, the node.labels parameter at line 647 is null, so the old label is not removed.
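Based on that analysis, one possible shape of the fix is sketched below: capture the old host-level labels before they are replaced, and fall back to them when node.labels is null. This is a hypothetical sketch mirroring the quoted fragment (it assumes java.util.Set/HashSet are in scope); the committed patch may differ:
{code:java}
case REPLACE:
  // Keep a copy of the old host-level labels before they are replaced,
  // because NMs with node.labels == null inherit the host's labels.
  Set<String> oldHostLabels = new HashSet<>(host.labels);
  replaceNodeForLabels(nodeId, host.labels, labels);
  replaceLabelsForNode(nodeId, host.labels, labels);
  host.labels.clear();
  host.labels.addAll(labels);
  for (Node node : host.nms.values()) {
    // Fall back to the old host labels so the old label mapping is
    // actually removed even when node.labels is null.
    replaceNodeForLabels(node.nodeId,
        (node.labels == null) ? oldHostLabels : node.labels, labels);
    node.labels = null;
  }
  break;
{code}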
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309058#comment-17309058 ] Eric Badger commented on YARN-10503:
------------------------------------

Thanks for the patch, [~zhuqi]! Here are a few comments:
{noformat}
+    if (ResourceUtils.getNumberOfKnownResourceTypes() > 2) {
+      ResourceInformation[] resources =
+          resource.getResources();
+      for (int i = 2; i < resources.length; i++) {
+        ResourceInformation resInfo = resources[i];
+        resourceString.append(","
+            + resInfo.getName() + "=" + resInfo.getValue());
+      }
+    }
{noformat}
This code snippet is repeated many times in this patch. I think it would make sense to extract it into a method so that we don't have so much code repetition.

{{splits[0]}} is used often enough in the code that I think it makes sense to make it a local variable for better readability.

> Support queue capacity in terms of absolute resources with custom resourceType.
> ---------------------------------------------------------------------------------
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, YARN-10503.006.patch
>
> Currently, the absolute resources are only memory and cores:
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> But in our GPU production clusters, we need to support more resourceTypes. It's very important for cluster scaling with different resourceType absolute demands.
>
> This Jira will handle GPU first.
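For illustration, the extraction suggested above might look like the sketch below, reusing the names from the quoted diff (the method name itself is hypothetical):
{code:java}
/**
 * Appends every resource beyond memory (index 0) and vcores (index 1),
 * e.g. ",yarn.io/gpu=4", to the given absolute-resource string.
 */
private static void appendCustomResources(StringBuilder resourceString,
    Resource resource) {
  if (ResourceUtils.getNumberOfKnownResourceTypes() > 2) {
    ResourceInformation[] resources = resource.getResources();
    for (int i = 2; i < resources.length; i++) {
      ResourceInformation resInfo = resources[i];
      resourceString.append("," + resInfo.getName() + "=" + resInfo.getValue());
    }
  }
}
{code}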
[jira] [Comment Edited] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309038#comment-17309038 ] Siddharth Ahuja edited comment on YARN-10705 at 3/25/21, 11:36 PM:
-------------------------------------------------------------------

Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs, which is expected after the patch is applied:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
2021-03-25 17:54:35,558 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
{code}
however, without the patch, this extra line was also getting added:
{code}
2021-03-25 17:54:43,589 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
{code}

No JUnits required: the change is about a "lack" of a log, with no change to functionality; as such, re-running the existing JUnits should suffice.

was (Author: sahuja):
Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs, which is expected after the patch is applied:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned
{code}
[jira] [Comment Edited] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309038#comment-17309038 ] Siddharth Ahuja edited comment on YARN-10705 at 3/25/21, 11:36 PM:
-------------------------------------------------------------------

Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs, which is expected after the patch is applied:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:35,558 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
{code}
however, without the patch, this extra line was also getting added:
{code}
2021-03-25 17:54:43,589 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:
{code}

No JUnits required: the change is about a "lack" of a log, with no change to functionality; as such, re-running the existing JUnits should suffice.

was (Author: sahuja):
Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue:
{code}
[jira] [Comment Edited] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309038#comment-17309038 ] Siddharth Ahuja edited comment on YARN-10705 at 3/25/21, 11:18 PM:
-------------------------------------------------------------------

Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs, which is expected after the patch is applied:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:35,558 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
{code}
however, without the patch, this extra line was also getting added:
{code}
2021-03-25 17:54:43,589 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
{code}

No JUnits required: the change is about a "lack" of a log, with no change to functionality; as such, re-running the existing JUnits should suffice.

was (Author: sahuja):
Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, from the generated distribution after compiling the patch on trunk, using the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB to run containers,
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container worth 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 60
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container worth 2GB and 4GB respectively. The second application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation for the 4GB non-AM container happens:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 60
{code}
* With the patch, only the following 3 lines are present when the reservation occurs:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin
{code}
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309028#comment-17309028 ] Eric Badger commented on YARN-10501:
------------------------------------

[~aajisaka], can you help out here? The Yetus bug is blocking this patch.

> Can't remove all node labels after add node label without nodemanager port
> ----------------------------------------------------------------------------
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch
>
> When a label is added to nodes without a nodemanager port, or with WILDCARD_PORT (0), not all of the label info on those nodes can be removed.
> Reproduce process:
> {code:java}
> 1. yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2. yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4. yarn rmadmin -replaceLabelsOnNode "server001"
> 5. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
> {code}
> You can see that after step 4, which removes the nodemanager labels, the label info is still present in the node info.
> {code:java}
> 641 case REPLACE:
> 642   replaceNodeForLabels(nodeId, host.labels, labels);
> 643   replaceLabelsForNode(nodeId, host.labels, labels);
> 644   host.labels.clear();
> 645   host.labels.addAll(labels);
> 646   for (Node node : host.nms.values()) {
> 647     replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649     node.labels = null;
> 650   }
> 651   break;
> {code}
> The cause is at line 647: when labels are added to a node without a port, both port 0 and the real NM port are added to the node info, and when labels are removed, the node.labels parameter at line 647 is null, so the old label is not removed.
[jira] [Updated] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-10713:
-------------------------------
    Fix Version/s: 3.3.1
                   3.4.0

Thanks for the patch, [~zhuqi]. I tested this out on my local GPU environment and everything looks good. +1. I've committed this to trunk (3.4) and branch-3.3. The cherry-pick comes back clean to branch-3.2, but there is a compilation error that I believe is due to some other prerequisite patches not being pulled back there. If you'd like it to go back to branch-3.2, we'll need to do some additional work. Closing for now, though.

> ClusterMetrics should support custom resource capacity related metrics.
> -------------------------------------------------------------------------
>
> Key: YARN-10713
> URL: https://issues.apache.org/jira/browse/YARN-10713
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10713.001.patch, YARN-10713.002.patch
>
> YARN-10688 only added GPU resource capacity related metrics; I think we should improve it to support custom resources, as [~ebadger] suggested.
[jira] [Updated] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Sharp updated YARN-10493: - Attachment: runc-container-repository-v2-design_updated.pdf > RunC container repository v2 > > > Key: YARN-10493 > URL: https://issues.apache.org/jira/browse/YARN-10493 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, yarn >Affects Versions: 3.3.0 >Reporter: Craig Condit >Assignee: Matthew Sharp >Priority: Major > Labels: pull-request-available > Attachments: runc-container-repository-v2-design.pdf, > runc-container-repository-v2-design_updated.pdf > > Time Spent: 0.5h > Remaining Estimate: 0h > > The current runc container repository design has scalability and usability > issues which will likely limit widespread adoption. We should address this > with a new, V2 layout. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10660) YARN Web UI have problem when show node partitions resource
[ https://issues.apache.org/jira/browse/YARN-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10660: -- Fix Version/s: (was: 3.2.2) (was: 3.2.1) (was: 3.1.1) (was: 3.1.0) [~tuyu], I'm removing the entries in the Fix Version field. Values are only entered in that field by the committer when the JIRA is resolved. > YARN Web UI have problem when show node partitions resource > --- > > Key: YARN-10660 > URL: https://issues.apache.org/jira/browse/YARN-10660 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.0, 3.1.1, 3.2.1, 3.2.2 >Reporter: tuyu >Priority: Minor > Attachments: 2021-03-01 19-56-02 的屏幕截图.png, YARN-10660.patch > > > when enable yarn label function, Yarn UI will show queue resource base on > partitions,but there have some problem when click expand button. The url will > increase very long, like this > {code:java} > 127.0.0.1:20701/cluster/scheduler?openQueues=Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96DEFAULT_PARTITION%20memory:491520,%20vCores:96 > {code} > The root cause is > {code:java} >origin url is: > Partition: >htmlencode is: > Partition: DEFAULT_PARTITION memory:491520, vCores:96 > SchedulerPageUtil have some javascript code > storeExpandedQueue > tmpCurrentParam = tmpCurrentParam.split('&');", >the Partition: DEFAULT_PARTITION memory:491520, vCores:96 > will split and len > 1, the problem logic is here, if click expand button > close, the function will clear params, but it the split array is not match > orgin url > {code} > when click expand button close, lt;DEFAULT_PARTITION memory:491520, > vCores:96 will append, if click expand multi 
times, the URL length will > keep increasing until it becomes far too long. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
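To see why the stored parameter breaks, here is a small self-contained illustration (the strings are made up; the real logic lives in SchedulerPageUtil's JavaScript): once HTML encoding leaks literal '&' characters (from entities such as &lt;) into the openQueues value, splitting on '&' tears the entry apart, so the close branch never matches the stored entry and every expand/collapse appends another copy.

{code:java}
public class OpenQueuesSplitDemo {
  public static void main(String[] args) throws Exception {
    // A value with no embedded '&' round-trips: one entry, one parameter.
    String plain =
        "openQueues=Partition: DEFAULT_PARTITION memory:491520, vCores:96&x=1";
    System.out.println(plain.split("&").length);   // 2 -- entry survives
    // After HTML encoding, '&lt;' and '&gt;' inject literal '&' characters.
    String encoded = "openQueues=Partition: &lt;DEFAULT_PARTITION "
        + "memory:491520, vCores:96&gt;&x=1";
    System.out.println(encoded.split("&").length); // 4 -- entry is torn apart
    // One fix direction: URL-encode the value before storing it in the URL.
    System.out.println(java.net.URLEncoder.encode(
        "Partition: <DEFAULT_PARTITION memory:491520, vCores:96>", "UTF-8"));
  }
}
{code}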
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308884#comment-17308884 ] Ahmed Hussein commented on YARN-10597: -- Thanks [~shuzirra] for the fix. +1 (non-binding) > CSMappingPlacementRule should not create new instance of Groups > --- > > Key: YARN-10597 > URL: https://issues.apache.org/jira/browse/YARN-10597 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: YARN-10597.001.patch, YARN-10597.002.patch > > > As [~ahussein] pointed out in YARN-10425, no new Groups instance should be > created. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308857#comment-17308857 ] Andrew Chung commented on YARN-1187: [~108anup] has updated our patch to be applicable to trunk, uploaded [here|https://issues.apache.org/jira/secure/attachment/13022958/YARN-1187-trunk.001.patch], thanks [~108anup]! > Add discrete event-based simulation to yarn scheduler simulator > --- > > Key: YARN-1187 > URL: https://issues.apache.org/jira/browse/YARN-1187 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Andrew Chung >Priority: Major > Attachments: YARN-1187 design doc.pdf, > YARN-1187-branch-2.1.3.001.patch, YARN-1187-trunk.001.patch > > > Follow the discussion in YARN-1021. > Discrete event simulation decouples the simulation run from any real-world clock. > This allows users to step through the execution, set debug points, and > reliably get a deterministic re-execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Chung updated YARN-1187: --- Attachment: YARN-1187-trunk.001.patch > Add discrete event-based simulation to yarn scheduler simulator > --- > > Key: YARN-1187 > URL: https://issues.apache.org/jira/browse/YARN-1187 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Andrew Chung >Priority: Major > Attachments: YARN-1187 design doc.pdf, > YARN-1187-branch-2.1.3.001.patch, YARN-1187-trunk.001.patch > > > Follow the discussion in YARN-1021. > Discrete event simulation decouples the simulation run from any real-world clock. > This allows users to step through the execution, set debug points, and > reliably get a deterministic re-execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308849#comment-17308849 ] Eric Payne commented on YARN-10517: --- [~sibyl.lv] / [~zhuqi], I can't seem to reproduce this issue. Can you please provide your config property values for the following? {{yarn.scheduler.capacity.root.accessible-node-labels.tpcds.capacity}} {{yarn.scheduler.capacity.root.accessible-node-labels.tpcds.maximum-capacity}} {{yarn.scheduler.capacity.root..accessible-node-labels}} {{yarn.scheduler.capacity.root..accessible-node-labels.tpcds.capacity}} {{yarn.scheduler.capacity.root..accessible-node-labels.tpcds.maximum-capacity}} {{yarn.scheduler.capacity.root..default-node-label-expression}} bq. 3. Add label "tpcds" to cluster and replace label on node1 and node2 to be "tpcds" when the above application is running Also, in step 3, can you provide the exact commands that you ran? I assume they are as follows, but I want to make sure we are on the same page: {code:bash} $ yarn rmadmin -addToClusterNodeLabels "tpcds" $ yarn rmadmin -replaceLabelsOnNode "Node1:Port1=tpcds Node2:Port2=tpcds" {code} > QueueMetrics has incorrect Allocated Resource when labelled partitions updated > -- > > Key: YARN-10517 > URL: https://issues.apache.org/jira/browse/YARN-10517 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0, 3.3.0 >Reporter: sibyl.lv >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, > wrong metrics.png > > > After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still reports > incorrect allocated JMX values, such as {color:#660e7a}allocatedMB, {color}{color:#660e7a}allocatedVCores and {color}{color:#660e7a}allocatedContainers, {color}when the node partition is > updated from "DEFAULT" to another label while there are running applications. > Steps to reproduce > == > # Configure capacity-scheduler.xml with label configuration > # Submit one application to the default partition and let it run > # Add label "tpcds" to the cluster and replace the label on node1 and node2 with > "tpcds" while the above application is running > # Note down "VCores Used" in the Web UI > # When the application is finished, the metrics are wrong (screenshots > attached). > == > > FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles > the {color:#660e7a}NODE_LABELS_UPDATE{color} event. > So we should release the container resource from the old partition and add the used > resource to the new partition, just as queueUsage is updated. 
> {code:java} > // code placeholder > public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition, > String newPartition) { > Resource containerResource = rmContainer.getAllocatedResource(); > this.attemptResourceUsage.decUsed(oldPartition, containerResource); > this.attemptResourceUsage.incUsed(newPartition, containerResource); > getCSLeafQueue().decUsedResource(oldPartition, containerResource, this); > getCSLeafQueue().incUsedResource(newPartition, containerResource, this); > // Update new partition name if container is AM and also update AM resource > if (rmContainer.isAMContainer()) { > setAppAMNodePartitionName(newPartition); > this.attemptResourceUsage.decAMUsed(oldPartition, containerResource); > this.attemptResourceUsage.incAMUsed(newPartition, containerResource); > getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this); > getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
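For context, a hedged sketch of the companion QueueMetrics move that the description argues for (the partition-aware releaseResources/allocateResources overloads are assumed to match what YARN-9596 introduced; this is not the committed patch):

{code:java}
public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
    String newPartition) {
  Resource containerResource = rmContainer.getAllocatedResource();
  // Mirror the queueUsage move in the queue's metrics: release the container
  // from the old partition and charge it to the new one, so allocatedMB,
  // allocatedVCores and allocatedContainers stay correct per partition.
  queue.getMetrics().releaseResources(oldPartition, getUser(), 1,
      containerResource);
  queue.getMetrics().allocateResources(newPartition, getUser(), 1,
      containerResource, false);
}
{code}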
[jira] [Issue Comment Deleted] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anup Agarwal updated YARN-1187: --- Comment: was deleted (was: Migrated the patch over to trunk.) > Add discrete event-based simulation to yarn scheduler simulator > --- > > Key: YARN-1187 > URL: https://issues.apache.org/jira/browse/YARN-1187 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Andrew Chung >Priority: Major > Attachments: YARN-1187 design doc.pdf, > YARN-1187-branch-2.1.3.001.patch > > > Follow the discussion in YARN-1021. > Discrete event simulation decouples the simulation run from any real-world clock. > This allows users to step through the execution, set debug points, and > reliably get a deterministic re-execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
[ https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308835#comment-17308835 ] Anup Agarwal commented on YARN-1187: Migrated the patch over to trunk. > Add discrete event-based simulation to yarn scheduler simulator > --- > > Key: YARN-1187 > URL: https://issues.apache.org/jira/browse/YARN-1187 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Andrew Chung >Priority: Major > Attachments: YARN-1187 design doc.pdf, > YARN-1187-branch-2.1.3.001.patch > > > Follow the discussion in YARN-1021. > Discrete event simulation decouples the simulation run from any real-world clock. > This allows users to step through the execution, set debug points, and > reliably get a deterministic re-execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308817#comment-17308817 ] Eric Badger commented on YARN-10713: [~zhuqi], I very much appreciate the patches and am trying to review as quickly as possible. But the number of different patches going on concurrently is quite overwhelming. I will do my best to review them in a timely manner. > ClusterMetrics should support custom resource capacity related metrics. > --- > > Key: YARN-10713 > URL: https://issues.apache.org/jira/browse/YARN-10713 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10713.001.patch, YARN-10713.002.patch > > > YARN-10688 > only added GPU resource capacity related metrics; I think we should improve this > to support custom resources, as [~ebadger] suggested. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10711) Make CSQueueMetrics configured related field to support nodelabel.
[ https://issues.apache.org/jira/browse/YARN-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308773#comment-17308773 ] Hadoop QA commented on YARN-10711: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 17s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 35s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 24s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 12s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 26m 24s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 9s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 9s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 17s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 17s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 43s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/848/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 36 new + 418 unchanged - 0 fixed = 454 total (was 418) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | |
[jira] [Updated] (YARN-10657) We should make max application per queue to support node label.
[ https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10657: -- Description: https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708 As we discussed in the above comment: We should dig deeper into the label-related max applications per queue. I think that when node labels are enabled on a queue, max applications should consider the max capacity across all labels. was: https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708 As we discussed in the above comment: We should dig deeper into the label-related max applications per queue. > We should make max application per queue to support node label. > --- > > Key: YARN-10657 > URL: https://issues.apache.org/jira/browse/YARN-10657 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708 > As we discussed in the above comment: > We should dig deeper into the label-related max applications per queue. > I think that when node labels are enabled on a queue, max applications should consider > the max capacity across all labels. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
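As a strawman for that discussion, here is a hedged sketch of what "consider the max capacity across all labels" could look like (the helper shape and its use of QueueCapacities are assumptions, not a committed change):

{code:java}
// Derive max applications from the largest absolute maximum capacity the
// queue holds across all accessible labels, not just the default partition.
static int computeMaxApplications(CSQueue queue, int globalMaxApps) {
  float maxAbsCapacity = queue.getQueueCapacities()
      .getAbsoluteMaximumCapacity(RMNodeLabelsManager.NO_LABEL);
  for (String label : queue.getAccessibleNodeLabels()) {
    maxAbsCapacity = Math.max(maxAbsCapacity,
        queue.getQueueCapacities().getAbsoluteMaximumCapacity(label));
  }
  return (int) (globalMaxApps * maxAbsCapacity);
}
{code}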
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308657#comment-17308657 ] Hadoop QA commented on YARN-10501: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 24s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} yetus {color} | {color:red} 0m 14s{color} | {color:red}{color} | {color:red} Unprocessed flag(s): --spotbugs-strict-precheck {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/849/artifact/out/Dockerfile | | JIRA Issue | YARN-10501 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13022949/YARN-10502-branch-2.10.002.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/849/console | | versions | git=2.7.4 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch > > > When adding a label to nodes without a nodemanager port, or when using the WILDCARD_PORT (0) > port, it is not possible to remove all label info from these nodes. > Reproduction steps: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. 
> {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is at line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the node.labels parameter at line 647 is null, so the old > label is not removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caozhiqiang updated YARN-10501: --- Attachment: YARN-10502-branch-2.10.002.patch > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch > > > When adding a label to nodes without a nodemanager port, or when using the WILDCARD_PORT (0) > port, it is not possible to remove all label info from these nodes. > Reproduction steps: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is at line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the node.labels parameter at line 647 is null, so the old > label is not removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caozhiqiang updated YARN-10501: --- Attachment: (was: YARN-10502-branch-2.10.002.patch) > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch > > > When adding a label to nodes without a nodemanager port, or when using the WILDCARD_PORT (0) > port, it is not possible to remove all label info from these nodes. > Reproduction steps: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is at line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the node.labels parameter at line 647 is null, so the old > label is not removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caozhiqiang updated YARN-10501: --- Attachment: (was: YARN-10502-branch-2.10.003.patch) > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch > > > When adding a label to nodes without a nodemanager port, or when using the WILDCARD_PORT (0) > port, it is not possible to remove all label info from these nodes. > Reproduction steps: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is at line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the node.labels parameter at line 647 is null, so the old > label is not removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10711) Make CSQueueMetrics configured related field to support nodelabel.
[ https://issues.apache.org/jira/browse/YARN-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308625#comment-17308625 ] Qi Zhu commented on YARN-10711: --- cc [~pbacsko] [~gandras] [~ebadger] [~Jim_Brennan] [~epayne] Updated a patch for review. Thanks. > Make CSQueueMetrics configured related field to support nodelabel. > -- > > Key: YARN-10711 > URL: https://issues.apache.org/jira/browse/YARN-10711 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10711.001.patch > > > {code:java} > // Update configured capacity/max-capacity for default partition only > CSQueueUtils.updateConfiguredCapacityMetrics(resourceCalculator, > labelManager.getResourceByLabel(null, clusterResource), > RMNodeLabelsManager.NO_LABEL, this); > {code} > Currently, the configured capacity/max-capacity metrics only support the default partition. > We should support node labels. > > cc [~pbacsko] [~gandras] [~ebadger] [~Jim_Brennan] [~epayne] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
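A minimal sketch of the per-label version of the quoted update (the surrounding loop and the source of the label set are assumptions for illustration; only the getResourceByLabel lookup mirrors the existing API):

{code:java}
// Update configured capacity/max-capacity metrics for every accessible
// node label, instead of only the default partition.
for (String partition : getAccessibleNodeLabels()) {
  CSQueueUtils.updateConfiguredCapacityMetrics(resourceCalculator,
      labelManager.getResourceByLabel(partition, clusterResource),
      partition, this);
}
{code}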
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308596#comment-17308596 ] Hadoop QA commented on YARN-10503: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 57s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 41s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 48s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 46s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308472#comment-17308472 ] Qi Zhu commented on YARN-10503: --- Fixed checkstyle in the latest patch. > Support queue capacity in terms of absolute resources with custom > resourceType. > --- > > Key: YARN-10503 > URL: https://issues.apache.org/jira/browse/YARN-10503 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10503.001.patch, YARN-10503.002.patch, > YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, > YARN-10503.006.patch > > > Currently, the supported absolute resources are memory and vcores. > {code:java} > /** > * Different resource types supported. > */ > public enum AbsoluteResourceType { > MEMORY, VCORES; > }{code} > But in our GPU production clusters, we need to support more resourceTypes. > This is very important for cluster scaling when there are absolute-resource demands on different resourceTypes. > > This Jira will handle GPU first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10503: -- Attachment: YARN-10503.006.patch > Support queue capacity in terms of absolute resources with custom > resourceType. > --- > > Key: YARN-10503 > URL: https://issues.apache.org/jira/browse/YARN-10503 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10503.001.patch, YARN-10503.002.patch, > YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, > YARN-10503.006.patch > > > Currently, the supported absolute resources are memory and vcores. > {code:java} > /** > * Different resource types supported. > */ > public enum AbsoluteResourceType { > MEMORY, VCORES; > }{code} > But in our GPU production clusters, we need to support more resourceTypes. > This is very important for cluster scaling when there are absolute-resource demands on different resourceTypes. > > This Jira will handle GPU first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
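As a rough sketch of the generalization (the helper class is invented; ResourceUtils.getResourceTypesArray() is the existing registry lookup), the fixed MEMORY/VCORES enum check could be replaced by a lookup against all registered resource types, so that, e.g., yarn.io/gpu may appear in an absolute-resource capacity value:

{code:java}
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.util.resource.ResourceUtils;

final class AbsoluteResourceNames {
  private AbsoluteResourceNames() {
  }

  // True if the given name (e.g. "memory-mb", "vcores", "yarn.io/gpu") is a
  // registered resource type and may be used in [memory=...,vcores=...,...]
  // style absolute capacity strings.
  static boolean isKnownResourceType(String name) {
    for (ResourceInformation ri : ResourceUtils.getResourceTypesArray()) {
      if (ri.getName().equals(name)) {
        return true;
      }
    }
    return false;
  }
}
{code}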
[jira] [Commented] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308428#comment-17308428 ] Hadoop QA commented on YARN-10713: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 35s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 36s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 23s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 38s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green}{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 15 unchanged - 1 fixed = 15 total (was 16) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 2s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308423#comment-17308423 ] Hadoop QA commented on YARN-10503: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 16s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 48s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 6s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 15s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 52s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/845/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 54 unchanged - 0 fixed = 55 total (was 54) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} |