[jira] [Updated] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Minni Mittal updated YARN-10683:
--------------------------------
    Attachment: YARN-10683.v1.patch

> Add total resource in NodeManager metrics
> -----------------------------------------
>
>                 Key: YARN-10683
>                 URL: https://issues.apache.org/jira/browse/YARN-10683
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Minni Mittal
>            Assignee: Minni Mittal
>            Priority: Minor
>         Attachments: YARN-10683.v1.patch

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307585#comment-17307585 ]

Hadoop QA commented on YARN-10503:
----------------------------------
+1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 1m 24s | Docker mode activated. |
|| || Prechecks || ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || trunk Compile Tests || ||
| +1 | mvninstall | 22m 42s | trunk passed |
| +1 | compile | 1m 0s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 51s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | shadedclient | 16m 53s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  0 | spotbugs | 19m 59s | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 49s | trunk passed |
|| || Patch Compile Tests || ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 54s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 40s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 15m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | the patch passed with JDK Ubuntu-1
[jira] [Comment Edited] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307540#comment-17307540 ]

Qi Zhu edited comment on YARN-10704 at 3/24/21, 3:54 AM:
---------------------------------------------------------
Thanks [~ebadger] for the review. UI v2 does not have the effective absolute capacity yet; I think it should be handled in UI v1 first. Thanks.

> The CS effective capacity for absolute mode in UI should support GPU and
> other custom resources.
> ------------------------------------------------------------------------
>
>                 Key: YARN-10704
>                 URL: https://issues.apache.org/jira/browse/YARN-10704
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10704.001.patch, YARN-10704.002.patch, YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, image-2021-03-19-12-08-35-273.png
>
> Currently there is no information about the effective GPU capacity in the UI for absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> But we have this information in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>
> This is very important for our GPU users in absolute mode; there is currently no way to see absolute GPU information in the CS Queue UI.
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307540#comment-17307540 ]

Qi Zhu commented on YARN-10704:
-------------------------------
Thanks [~ebadger] for the review. UI v2 does not have the effective absolute capacity yet; I think it should be handled in UI v1 first. Thanks.
[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307529#comment-17307529 ]

Qi Zhu edited comment on YARN-10503 at 3/24/21, 3:35 AM:
---------------------------------------------------------
Thanks [~ebadger] for the review. [~pbacsko] Your suggestion is valid; it makes sense to me. I have updated the latest patch to be consistent with the YARN-9936 description:
* In an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4), the amount of all resources in the queues has to be less than or equal to all resources in the cluster. {color:#de350b}Actually, the above is not supported; we only support memory and vcores in absolute mode now, and we should extend this in {color}YARN-10503.

[~pbacsko] [~gandras] [~ebadger] Do you have any other advice about this? Thanks.

> Support queue capacity in terms of absolute resources with custom
> resourceType.
> -----------------------------------------------------------------
>
>                 Key: YARN-10503
>                 URL: https://issues.apache.org/jira/browse/YARN-10503
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10503.001.patch, YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch
>
> Now the absolute resources are memory and vcores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes. It's very important for cluster scaling with different resourceType absolute demands.
>
> This Jira will handle GPU first.
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307529#comment-17307529 ]

Qi Zhu commented on YARN-10503:
-------------------------------
Thanks [~ebadger] for the review. [~pbacsko] Your suggestion is valid; it makes sense to me. I have updated the latest patch to be consistent with the YARN-9936 description:
* In an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4), the amount of all resources in the queues has to be less than or equal to all resources in the cluster. {color:#de350b}Actually, the above is not supported; we only support memory and vcores in absolute mode now, and we should extend this in {color}YARN-10503.

[~pbacsko] [~gandras] [~ebadger] Do you have any other advice about this?
[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10503:
--------------------------
    Attachment: YARN-10503.004.patch
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307509#comment-17307509 ]

Qi Zhu commented on YARN-10517:
-------------------------------
Thanks [~ebadger] for the review. [~pbacsko] [~epayne] Could you help review it when you are free? Thanks.

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10517
>                 URL: https://issues.apache.org/jira/browse/YARN-10517
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: sibyl.lv
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has incorrect allocated JMX values, such as {color:#660e7a}allocatedMB, allocatedVCores and allocatedContainers,{color} when the node partition is updated from "DEFAULT" to another label while there are running applications.
> Steps to reproduce
> ==============
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and run it
> # Add the label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # When the application is finished, the metrics are wrong (screenshots attached)
> ==============
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles the {color:#660e7a}NODE_LABELS_UPDATE{color} event. So we should release container resources from the old partition and add used resources to the new partition, just as we update queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307481#comment-17307481 ]

Eric Badger commented on YARN-10493:
------------------------------------
Additionally, I've run into some issues while testing.

{noformat:title=CLI Invocation}
hadoop jar ./hadoop-tools/hadoop-runc/target/hadoop-runc-3.4.0-SNAPSHOT.jar org.apache.hadoop.runc.tools.ImportDockerImage -r docker.foobar.com: hadoop-images/hadoop/rhel7 hadoop/rhel7
{noformat}

{noformat}
[ebadger@foo hadoop]$ hadoop fs -ls /runc-root/meta/hadoop/rhel7@latest.properties
-rw-------  10 ebadger supergroup        236 2021-03-24 00:15 /runc-root/meta/hadoop/rhel7@latest.properties
{noformat}

Here's the properties file after the CLI tool completes.

{noformat}
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.ImageTagToManifestV2Plugin
yarn.nodemanager.runtime.linux.runc.manifest-to-resources-plugin org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.HdfsManifestToResourcesV2Plugin
{noformat}

Then I set these properties as well as adding {{runc}} to the allowed-runtimes config.

{noformat}
export vars="YARN_CONTAINER_RUNTIME_TYPE=runc,YARN_CONTAINER_RUNTIME_RUNC_IMAGE=hadoop/rhel7"; $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.*-tests.jar sleep -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME=$HADOOP_HOME" -Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME=$HADOOP_HOME" -Dyarn.app.mapreduce.am.env=$vars -Dmapreduce.map.env=$vars -Dmapreduce.reduce.env=$vars -mt 1 -rt 1 -m 1 -r 1
{noformat}

I ran a sleep job using this command.

{noformat}
2021-03-24 00:26:07,823 DEBUG [NM ContainerManager dispatcher] runc.ImageTagToManifestV2Plugin (ImageTagToManifestV2Plugin.java:getHdfsImageToHashReader(144)) - Checking HDFS for image file: /runc-root/meta/library/hadoop/rhel7@latest.properties
2021-03-24 00:26:07,825 WARN [NM ContainerManager dispatcher] runc.ImageTagToManifestV2Plugin (ImageTagToManifestV2Plugin.java:getHdfsImageToHashReader(148)) - Did not load the hdfs image to hash properties file, file doesn't exist
2021-03-24 00:26:07,828 WARN [NM ContainerManager dispatcher] container.ContainerImpl (ContainerImpl.java:transition(1261)) - Failed to parse resource-request
java.io.FileNotFoundException: File does not exist: /runc-root/manifest/ha/hadoop/rhel7
{noformat}

Then I got this error in the NM when it was trying to resolve the tag. The NM added the default {{metaNamespaceDir}} (which is "library") into the path when looking for the properties file, but when the CLI tool ran, it didn't add the {{metaNamespaceDir}}. I didn't have the config set in my configs at all, so the NM was using the conf default. I'm not sure if I did anything wrong here or not, but it seems inconsistent to me. Let me know what you think.

> RunC container repository v2
> ----------------------------
>
>                 Key: YARN-10493
>                 URL: https://issues.apache.org/jira/browse/YARN-10493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, yarn
>    Affects Versions: 3.3.0
>            Reporter: Craig Condit
>            Assignee: Matthew Sharp
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: runc-container-repository-v2-design.pdf
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current runc container repository design has scalability and usability issues which will likely limit widespread adoption. We should address this with a new, V2 layout.
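The mismatch Eric describes can be sketched in a few lines. This is a hypothetical illustration of the two path constructions, not the actual hadoop-runc code: the class, method names, and constants are invented for this sketch; only the resulting paths are taken from the log output above.

```java
// Hypothetical sketch of the path mismatch: the NM inserts the configured
// meta namespace directory (default "library") into the properties path,
// while the import CLI does not. All names here are illustrative.
public class RuncMetaPathSketch {
  static final String RUNC_ROOT = "/runc-root";
  static final String DEFAULT_META_NAMESPACE_DIR = "library";

  // Path the CLI tool writes to (no namespace dir), per the fs -ls output.
  static String cliPath(String image, String tag) {
    return RUNC_ROOT + "/meta/" + image + "@" + tag + ".properties";
  }

  // Path the NM looks up (namespace dir inserted), per the DEBUG log line.
  static String nmPath(String image, String tag) {
    return RUNC_ROOT + "/meta/" + DEFAULT_META_NAMESPACE_DIR + "/" + image
        + "@" + tag + ".properties";
  }

  public static void main(String[] args) {
    // The two sides disagree, which is why the NM reports
    // "Did not load the hdfs image to hash properties file".
    System.out.println(cliPath("hadoop/rhel7", "latest"));
    // -> /runc-root/meta/hadoop/rhel7@latest.properties
    System.out.println(nmPath("hadoop/rhel7", "latest"));
    // -> /runc-root/meta/library/hadoop/rhel7@latest.properties
  }
}
```

Either the CLI should honor the same {{metaNamespaceDir}} default as the NM, or the NM should omit it when unset; the sketch just makes the disagreement concrete.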
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307467#comment-17307467 ]

Eric Badger commented on YARN-10493:
------------------------------------
[~MatthewSharp], thanks for the PR. Just starting to take a look at this now. I am wondering if the document is still up to date, though. Is the PR you put up still a good reflection of what's in the document? Just want to make sure.
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307457#comment-17307457 ]

Eric Badger commented on YARN-10517:
------------------------------------
[~epayne], this change looks reasonable to me, but I'd like to get an extra pair of eyes on it since it has to do with scheduler internals.
[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307427#comment-17307427 ]

Eric Badger commented on YARN-10707:
------------------------------------
Similar to my [comment|https://issues.apache.org/jira/browse/YARN-10503?focusedCommentId=17307421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17307421] on YARN-10503, I believe the approach we take here should allow for arbitrary resources, not be hardcoded for GPUs. It's a lot of work to make GPUs a first-class resource, but it should only be a little more work to make arbitrary resources (which can include GPUs) first-class.

> Support gpu in ResourceUtilization, and update Node GPU Utilization to use.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10707
>                 URL: https://issues.apache.org/jira/browse/YARN-10707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10707.001.patch, YARN-10707.002.patch, YARN-10707.003.patch
>
> Support GPU in ResourceUtilization, and update Node GPU Utilization to use it first. It will be very helpful for other use cases involving GPU utilization.
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307421#comment-17307421 ]

Eric Badger commented on YARN-10503:
------------------------------------
bq. Do we want to treat GPUs and FPGAs like that? In other parts of the code, we have mem/vcore as primary resources, then an array of other resources.

I believe the correct approach is to leave memory and vcores as "first class" resources and then add logic for arbitrary extended resources, such as GPU or FPGA. The arbitrary extended resources should not be hardcoded values. The point is that we're doing the work right now to support GPUs, but if in 2 years some new resource needs to be tracked and used, we don't want to have to redo all of this work again. We should make sure that our work here extends to any future arbitrary resources.
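The extensibility argument above can be sketched with a minimal example: instead of a hardcoded enum of absolute resource types, treat absolute capacities as a map keyed by arbitrary resource names. This is an illustrative sketch only, assuming a simplified `Map<String, Long>` model; it is not the CapacityScheduler's actual data structures or validation code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: arbitrary resource names instead of a hardcoded enum.
public class AbsoluteResourceSketch {

  // Hardcoded approach quoted in the issue description.
  enum AbsoluteResourceType { MEMORY, VCORES }

  // Extensible approach: a queue's absolute capacity is a map of
  // resource name -> amount, so custom types like yarn.io/gpu need no
  // code change. The YARN-9936 constraint ("queue total <= cluster total")
  // is then a check over every resource name, not just memory and vcores.
  static boolean fitsInCluster(Map<String, Long> queue, Map<String, Long> cluster) {
    return queue.entrySet().stream()
        .allMatch(e -> e.getValue() <= cluster.getOrDefault(e.getKey(), 0L));
  }

  public static void main(String[] args) {
    Map<String, Long> cluster = new LinkedHashMap<>();
    cluster.put("memory-mb", 4096L);
    cluster.put("vcores", 20L);
    cluster.put("yarn.io/gpu", 4L);

    Map<String, Long> queue = new LinkedHashMap<>(cluster);
    queue.put("yarn.io/gpu", 2L);
    System.out.println(fitsInCluster(queue, cluster)); // true

    queue.put("yarn.io/gpu", 8L); // more GPUs than the cluster has
    System.out.println(fitsInCluster(queue, cluster)); // false
  }
}
```

A future resource type then only needs a new map entry, which is the "don't redo this work in 2 years" property Eric is asking for.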
[jira] [Comment Edited] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307413#comment-17307413 ]

Eric Badger edited comment on YARN-9618 at 3/23/21, 8:52 PM:
-------------------------------------------------------------
bq. Actually, why we use an other async dispatcher here is try to make the rmDispatcher#eventQueue not boom to affect other event process. The boom will transformed to nodeListManagerDispatcher#eventQueue.

I think [~gandras]'s point is that all of the events are going to go through {{rmDispatcher}} either way. Without the proposed change, {{rmDispatcher}} will get the event in the eventQueue and will also do the processing. With the proposed change, {{rmDispatcher}} will get the event and then copy it over to {{nodeListManagerDispatcher}}, which will do the processing. But in both cases, {{rmDispatcher}} is dealing with {{RMAppNodeUpdateEvent}}s in some way. So the question is whether copying the event or processing the event takes more time. If copying the event takes more time than processing it, then this change only makes things worse. If processing the event takes more time than copying it to the new async dispatcher, then this change makes sense and will remove some load from {{rmDispatcher}}. [~gandras], is that right?

> NodeListManager event improvement
> ---------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin Chundatt
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-9618.001.patch, YARN-9618.002.patch, YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
> In the current implementation, NodeListManager events block the async dispatcher, which can cause RM crashes and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event will create 1K events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked till new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler.
> # Instead of adding events to the dispatcher, directly call the RMApp event handler.
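The copy-versus-process trade-off being debated can be sketched as a minimal two-thread dispatcher: the primary side only pays the cost of an enqueue, and a secondary thread drains its own queue and does the processing. This is a self-contained illustration using plain `java.util.concurrent`, not YARN's `AsyncDispatcher`; class and method names are invented for the sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of handing events off to a secondary dispatcher thread.
// The handoff only helps if enqueueing is cheaper than processing, which
// is exactly the question raised in the comment above.
public class ForwardingDispatcherSketch {

  // Enqueue n events for a secondary dispatcher thread and wait for it to
  // process them all; returns the number processed.
  public static int dispatchAll(int n) throws InterruptedException {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    AtomicInteger processed = new AtomicInteger();

    // Secondary dispatcher: drains its own queue, like the proposed
    // nodeListManagerDispatcher would.
    Thread secondary = new Thread(() -> {
      try {
        while (true) {
          queue.take().run();
        }
      } catch (InterruptedException ignored) {
        // shutdown signal
      }
    });
    secondary.setDaemon(true);
    secondary.start();

    // Primary dispatcher side: instead of running the (possibly expensive)
    // handler inline, it only pays the cost of an enqueue per event.
    for (int i = 0; i < n; i++) {
      queue.put(processed::incrementAndGet);
    }

    // Wait for the secondary thread to drain the queue, then stop it.
    while (processed.get() < n) {
      Thread.sleep(1);
    }
    secondary.interrupt();
    return processed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(dispatchAll(1000)); // 1000
  }
}
```

In this toy the handler is a trivial counter increment, so the enqueue is not obviously cheaper; the change pays off only when the real handler (e.g. fanning one node event out to 1K apps) dominates the enqueue cost.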
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307413#comment-17307413 ] Eric Badger commented on YARN-9618: --- bq. Actually, the reason we use another async dispatcher here is to keep the rmDispatcher#eventQueue from booming and affecting other event processing. The boom will be transferred to nodeListManagerDispatcher#eventQueue. I think [~gandras]'s point is that all of the events are going to go through {{rmDispatcher}} either way. Without the proposed change, {{rmDispatcher}} will get the event in the eventQueue and will also do the processing. With this proposed change, {{rmDispatcher}} will get the event and then it will copy it over to {{nodeListManagerDispatcher}}. Then {{nodeListManagerDispatcher}} will do the processing. But in both cases, {{rmDispatcher}} is dealing with {{RMAppNodeUpdateEvent}}s in some way. So the question is whether copying the event or processing the event takes more time. If copying the event takes more time than processing the event, then this change only makes things worse. If processing the event takes more time than copying the event to the new async dispatcher, then this change makes sense and will remove some load on the {{rmDispatcher}}. [~gandras], is that right? > NodeListManager event improvement > - > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch > > > In the current implementation, NodeListManager events block the async > dispatcher, which can crash the RM and slow down event processing. > # On a cluster restart with 1K running apps, each node-usable event creates 1K > events; overall this can be 5K*1K events for a 5K-node cluster. > # Event processing is blocked until new events are added to the queue.
> Solution: > # Add another async event handler, similar to the scheduler's. > # Instead of adding events to the dispatcher, directly call the RMApp event handler.
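The copy-versus-process tradeoff debated in this thread can be sketched outside of Hadoop with two queue-backed dispatchers. This is a toy model with hypothetical names, not the actual AsyncDispatcher/NodesListManager code: the primary dispatcher's only work per event is the cheap hand-off, while the heavy processing happens off the secondary queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the proposal: rmQueue stands in for rmDispatcher's eventQueue,
// nodeListQueue for the new nodeListManagerDispatcher. The rm-side work per
// event is only the hand-off; the actual processing runs off the second queue.
public class DispatcherSketch {
    static final BlockingQueue<Runnable> rmQueue = new LinkedBlockingQueue<>();
    static final BlockingQueue<Runnable> nodeListQueue = new LinkedBlockingQueue<>();
    static final AtomicInteger processed = new AtomicInteger();

    // Cheap step done on the rm-dispatcher side: just copy the event over.
    static void forwardToNodeListDispatcher(Runnable event) {
        nodeListQueue.add(event);
    }

    public static int run(int events) {
        for (int i = 0; i < events; i++) {
            rmQueue.add(() -> forwardToNodeListDispatcher(processed::incrementAndGet));
        }
        // Drain both queues on this thread for determinism; real dispatchers
        // each consume from their own thread.
        Runnable r;
        while ((r = rmQueue.poll()) != null) r.run();
        while ((r = nodeListQueue.poll()) != null) r.run();
        return processed.get();
    }
}
```

Whether the hand-off is cheaper than in-place processing is exactly the open question in the thread; the sketch only shows that every event still passes through the primary queue once either way.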
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307379#comment-17307379 ] Eric Badger commented on YARN-10704: I'm not very familiar with the new YARN UI v2. Will this change automatically apply to both UIs? Or do we need to add extra stuff for it to be supported in both? > The CS effective capacity for absolute mode in UI should support GPU and > other custom resources. > > > Key: YARN-10704 > URL: https://issues.apache.org/jira/browse/YARN-10704 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10704.001.patch, YARN-10704.002.patch, > YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, > image-2021-03-19-12-08-35-273.png > > > Actually there is no information about the effective GPU capacity in the UI > for absolute resource mode. > !image-2021-03-19-12-05-28-412.png|width=873,height=136! > But we do have this information in QueueMetrics: > !image-2021-03-19-12-08-35-273.png|width=613,height=268! > > This is very important for our GPU users in absolute mode; there is still no > way to see absolute GPU information in the CS Queue UI.
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307209#comment-17307209 ] Hadoop QA commented on YARN-10674: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 40s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 23s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 3s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 43s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 10s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.0
[jira] [Comment Edited] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191 ] Michael Zeoli edited comment on YARN-6538 at 3/23/21, 4:03 PM: --- Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. 
YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config (this is our default spark config, though some of the spark jobs in the pipelines we're testing set executor mem and overhead mem higher to support more memory intensive work. Our work is memory constrained, and additional cores per executor have never yielded more optimal throughput). * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true was (Author: novaboy): Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. 
If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true > Inter Queue preemption is not happening when DRF is configured > -- > > Key: YARN-6538 > URL: https://issues.apache.org/jira/browse/YARN-6538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, scheduler preemption >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > Cluster capacity of . Here memory is more and vcores > are less. If applications have more demand, vcores might be exhausted
[jira] [Commented] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191 ] Michael Zeoli commented on YARN-6538: - Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. 
YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true > Inter Queue preemption is not happening when DRF is configured > -- > > Key: YARN-6538 > URL: https://issues.apache.org/jira/browse/YARN-6538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, scheduler preemption >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > Cluster capacity of . Here memory is more and vcores > are less. If applications have more demand, vcores might be exhausted. > Inter queue preemption ideally has to be kicked in once vcores is over > utilized. However preemption is not happening. > Analysis: > In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}}, > {code} > // assign all cluster resources until no more demand, or no resources are > // left > while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant, > unassigned, Resources.none())) { > {code} > will loop even when vcores are 0 (because memory is still +ve). Hence we are > having more vcores in idealAssigned which cause no-preemption cases. 
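The loop condition quoted in the analysis above can be modeled with a toy calculator (plain longs instead of Hadoop's Resource objects; the method names are hypothetical): looping while *any* component of the unassigned resource is positive keeps handing out vcores after they hit zero, while a componentwise stop does not.

```java
// Toy model of the computeFixpointAllocation condition: resources are
// {memory, vcores} pairs. The buggy condition keeps looping while ANY
// component of "unassigned" is positive, so vcores can go negative
// (over-assigned into idealAssigned) while memory is still being handed out.
public class FixpointSketch {
    // Returns the final unassigned vcores after repeatedly granting demand.
    public static long leftoverVcores(long mem, long vcores,
                                      long demandMemPerStep, long demandVcoresPerStep,
                                      boolean componentwiseStop) {
        long unassignedMem = mem, unassignedVcores = vcores;
        while (unassignedMem > 0 || unassignedVcores > 0) {   // "greater than none"
            if (componentwiseStop
                && (unassignedMem < demandMemPerStep
                    || unassignedVcores < demandVcoresPerStep)) {
                break;  // stop as soon as either resource is exhausted
            }
            unassignedMem -= demandMemPerStep;
            unassignedVcores -= demandVcoresPerStep;
        }
        return unassignedVcores;
    }
}
```

In this toy model the buggy condition drives unassigned vcores negative, i.e. vcores are over-counted in the ideal assignment, which matches the reported no-preemption symptom.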
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307114#comment-17307114 ] Jim Brennan commented on YARN-10697: Thanks for the update [~BilwaST]! I am +1 on patch 003. [~epayne], [~jhung], if there are no objections I will commit this later today. > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects MB as memory, whereas MetricsOverviewTable > passes resources in bytes. Also, we should display memory in GB for better > readability for the user.
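The bug described in YARN-10697 is a unit-conversion mismatch (bytes handed to an API that expects MB). A minimal sketch of the two conversions involved, with hypothetical helper names rather than the actual MetricsOverviewTable code:

```java
import java.util.Locale;

public class MemoryDisplay {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long MB_PER_GB = 1024L;

    // Convert raw bytes to MB before handing them to an MB-based API.
    public static long bytesToMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Render MB as GB with one decimal for readability in the UI.
    public static String mbAsGbLabel(long mb) {
        return String.format(Locale.ROOT, "%.1f GB", mb / (double) MB_PER_GB);
    }
}
```

Passing bytes where MB are expected inflates the displayed value by a factor of 2^20, which is why the UI numbers looked wrong for the non-capacity schedulers.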
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307061#comment-17307061 ] Qi Zhu commented on YARN-10674: --- Thanks [~gandras] for the update and the discussion with [~pbacsko]. It makes sense to me. I have updated the latest patch. :D > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch, YARN-10674.017.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
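The FS mechanism quoted in the issue description — a reload loop that also removes empty dynamic queues every ALLOC_RELOAD_INTERVAL_MS — can be sketched with a scheduled executor. This is a simplified stand-in, not the real QueueManager:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AutoDeletionSketch {
    // FS default: check every 10 seconds, as in the quoted snippet.
    public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;

    public final Set<String> dynamicQueues = ConcurrentHashMap.newKeySet();
    public final Set<String> nonEmptyQueues = ConcurrentHashMap.newKeySet();

    // The periodic check: drop dynamic queues that no longer hold any apps.
    public void removeEmptyDynamicQueues() {
        dynamicQueues.removeIf(q -> !nonEmptyQueues.contains(q));
    }

    public void start(ScheduledExecutorService ses) {
        ses.scheduleWithFixedDelay(this::removeEmptyDynamicQueues,
            ALLOC_RELOAD_INTERVAL_MS, ALLOC_RELOAD_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }
}
```

fs2cs has to map this timed deletion behavior onto Capacity Scheduler's auto-created-queue-deletion settings, which is what the patch series on this issue addresses.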
[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10674: -- Attachment: YARN-10674.017.patch > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch, YARN-10674.017.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306981#comment-17306981 ] Andras Gyori commented on YARN-10674: - Thank you [~zhuqi] for the patch. Sorry for coming up with this, but I think I did not explain what I had in mind well. My suggestion is the following: * Change the description of DISABLE_PREEMPTION to state that enabled is the default: {code:java} DISABLE_PREEMPTION("disable preemption", "dp", "disable-preemption", "Disable the preemption with nopolicy or observeonly mode. " + "Preemption is enabled by default. " + "nopolicy removes ProportionalCapacityPreemptionPolicy from " + "the list of monitor policies, " + "observeonly sets " + "yarn.resourcemanager.monitor.capacity.preemption.observe_only " + "to true.", true), {code} * Change PreemptionMode to include the ENABLED variant (the fromString could throw an exception on illegal string). You do not need the private boolean enabled field, because we have the ENABLED variant for this. {code:java} public enum PreemptionMode { ENABLED("enabled"), NO_POLICY("nopolicy"), OBSERVE_ONLY("observeonly"); private String cliOption; PreemptionMode(String cliOption) { this.cliOption = cliOption; } public String getCliOption() { return cliOption; } public static PreemptionMode fromString(String cliOption) { if (cliOption.trim(). equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) { return PreemptionMode.OBSERVE_ONLY; } else if (cliOption.trim(). 
equals(PreemptionMode.NO_POLICY.getCliOption())) { return PreemptionMode.NO_POLICY; } else { return PreemptionMode.ENABLED; } } } {code} * You could then simplify emitDisablePreemptionForObserveOnlyMode: because PreemptionMode has an ENABLED variant, the OBSERVE_ONLY variant already means that preemption is not enabled: {code:java} private void emitDisablePreemptionForObserveOnlyMode() { if (preemptionMode == FSConfigToCSConfigConverterParams .PreemptionMode.OBSERVE_ONLY) { capacitySchedulerConfig. setBoolean(CapacitySchedulerConfiguration. PREEMPTION_OBSERVE_ONLY, true); } } {code} * The same applies to convertSiteProperties: {code:java} if (preemptionMode == FSConfigToCSConfigConverterParams.PreemptionMode.NO_POLICY) { yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES, ""); } {code} > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ...
> Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
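The suggested enum above can be condensed into a self-contained form to check the parsing behavior (same shape as the snippet in the comment, with a lookup loop in place of the if/else chain; unrecognized strings fall back to the ENABLED default):

```java
public enum PreemptionMode {
    ENABLED("enabled"), NO_POLICY("nopolicy"), OBSERVE_ONLY("observeonly");

    private final String cliOption;

    PreemptionMode(String cliOption) {
        this.cliOption = cliOption;
    }

    public String getCliOption() {
        return cliOption;
    }

    // Unknown strings fall back to ENABLED, matching the suggested default
    // (an alternative, noted in the comment, is to throw on illegal input).
    public static PreemptionMode fromString(String cliOption) {
        for (PreemptionMode mode : values()) {
            if (mode.cliOption.equals(cliOption.trim())) {
                return mode;
            }
        }
        return ENABLED;
    }
}
```

With the ENABLED variant in the enum, the separate boolean flag becomes unnecessary, as the comment points out.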
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306970#comment-17306970 ] Hadoop QA commented on YARN-10697: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 25s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 13s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 26m 10s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 4m 3s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 49s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 35s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 58
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated and available resources. (was: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources.) > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated and available resources.
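Per-resource-type accounting of the kind this description asks for (allocated and available values keyed by resource name) can be sketched with a plain map-based stand-in; the actual patch would wire these counters into Hadoop's metrics2 gauges in NodeManagerMetrics, so everything below is illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CustomResourceMetrics {
    private final Map<String, Long> allocated = new ConcurrentHashMap<>();
    private final Map<String, Long> available = new ConcurrentHashMap<>();

    // Initialize a resource type (e.g. "yarn.io/gpu") with its capacity.
    public void setTotal(String resource, long units) {
        available.put(resource, units);
        allocated.put(resource, 0L);
    }

    // Move units from available to allocated when a container is granted.
    public void allocate(String resource, long units) {
        available.merge(resource, -units, Long::sum);
        allocated.merge(resource, units, Long::sum);
    }

    // Return units to available when a container completes.
    public void release(String resource, long units) {
        allocated.merge(resource, -units, Long::sum);
        available.merge(resource, units, Long::sum);
    }

    public long getAllocated(String resource) {
        return allocated.getOrDefault(resource, 0L);
    }

    public long getAvailable(String resource) {
        return available.getOrDefault(resource, 0L);
    }
}
```

The invariant worth testing is that allocated + available stays equal to the configured total across allocate/release cycles.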
[jira] [Commented] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306968#comment-17306968 ] Hadoop QA commented on YARN-10518: -- (x) -1 overall
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 44s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| 0 | mvndep | 1m 51s | | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 6s | | trunk passed |
| +1 | compile | 10m 18s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 46s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 42s | | trunk passed |
| +1 | mvnsite | 1m 48s | | trunk passed |
| +1 | shadedclient | 19m 24s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 27s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 26m 13s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 3m 43s | | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 15s | | the patch passed |
| +1 | compile | 9m 23s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 9m 23s | | the patch passed |
| +1 | compile | 8m 36s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 36s | | the patch passed |
| +1 | checkstyle | 1m 37s | | the patch passed |
| +1 | mvnsite | 1m 38s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 8
[jira] [Commented] (YARN-10708) Remove NULL check before instanceof
[ https://issues.apache.org/jira/browse/YARN-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306962#comment-17306962 ] Steve Loughran commented on YARN-10708: --- FWIW there are some really good instanceof enhancements in Java; it'll be time to do another refresh then too
> Remove NULL check before instanceof
> ---
>
> Key: YARN-10708
> URL: https://issues.apache.org/jira/browse/YARN-10708
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Jiajun Jiang
> Priority: Minor
> Labels: pull-request-available
> Attachments: YARN-10708.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Submitted patch to remove the NULL check before the instanceof check in several classes. Same issue as YARN-9340.
> Classes involved:
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java
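The rationale for YARN-10708 can be shown in a minimal, self-contained sketch (class names are hypothetical stand-ins for the patched classes, not the actual Hadoop code): the Java Language Specification defines `null instanceof T` to evaluate to false for any reference type `T`, so a preceding null check is always redundant.

```java
// Demonstrates why an explicit null check before instanceof is redundant:
// `null instanceof T` is always false, so the instanceof test alone is
// behaviourally identical to the two-condition form.
public class InstanceofNullCheck {

    // Hypothetical stand-in for classes like GpuDevice in the patch.
    public static class GpuDevice {}

    // Before the patch: the null check duplicates what instanceof already does.
    public static boolean isGpuDeviceWithNullCheck(Object obj) {
        return obj != null && obj instanceof GpuDevice;
    }

    // After the patch: one condition shorter, same result for every input.
    public static boolean isGpuDevice(Object obj) {
        return obj instanceof GpuDevice;
    }

    public static void main(String[] args) {
        Object device = new GpuDevice();
        // Both variants agree for null and non-null inputs.
        System.out.println(isGpuDeviceWithNullCheck(null) == isGpuDevice(null));     // true
        System.out.println(isGpuDeviceWithNullCheck(device) == isGpuDevice(device)); // true
        System.out.println(null instanceof GpuDevice);                               // false
    }
}
```

This is the same simplification an `equals(Object)` implementation typically gets: `if (obj == null || getClass() != obj.getClass())` style checks collapse to a single `instanceof` test.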
[jira] [Created] (YARN-10711) Make CSQueueMetrics configured related field to support nodelabel.
Qi Zhu created YARN-10711: - Summary: Make CSQueueMetrics configured related field to support nodelabel. Key: YARN-10711 URL: https://issues.apache.org/jira/browse/YARN-10711 Project: Hadoop YARN Issue Type: Improvement Reporter: Qi Zhu Assignee: Qi Zhu
{code:java}
// Update configured capacity/max-capacity for default partition only
CSQueueUtils.updateConfiguredCapacityMetrics(resourceCalculator,
    labelManager.getResourceByLabel(null, clusterResource),
    RMNodeLabelsManager.NO_LABEL, this);
{code}
Currently, the configured capacity/max-capacity metrics only support the default partition. We should support node labels as well. cc [~pbacsko] [~gandras] [~ebadger] [~Jim_Brennan] [~epayne]
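The shape of the proposed change can be sketched in a self-contained toy (all names here are invented for illustration; this is not the CSQueueMetrics API): instead of recording configured capacity only under the default partition (`NO_LABEL`), the metrics object keys capacity by partition label.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-node-label configured-capacity metrics.
// The current code path effectively only ever records the NO_LABEL entry.
public class PerLabelCapacityMetrics {
    // Empty string denotes the default partition, as in RMNodeLabelsManager.NO_LABEL.
    public static final String NO_LABEL = "";

    private final Map<String, Float> configuredCapacity = new HashMap<>();

    // Record configured capacity (as a percentage) for a specific partition.
    public void setConfiguredCapacity(String partition, float capacity) {
        configuredCapacity.put(partition, capacity);
    }

    // Unconfigured partitions report 0, mirroring an absent metric.
    public float getConfiguredCapacity(String partition) {
        return configuredCapacity.getOrDefault(partition, 0f);
    }

    public static void main(String[] args) {
        PerLabelCapacityMetrics m = new PerLabelCapacityMetrics();
        m.setConfiguredCapacity(NO_LABEL, 50f); // default partition, as today
        m.setConfiguredCapacity("gpu", 30f);    // a labelled partition, as proposed
        System.out.println(m.getConfiguredCapacity("gpu")); // 30.0
    }
}
```

The real change would presumably loop over the labels known to the node-labels manager rather than passing only `null`/`NO_LABEL` into `updateConfiguredCapacityMetrics`.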
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Attachment: YARN-10518.v1.patch > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources.
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306824#comment-17306824 ] Bilwa S T commented on YARN-10697: -- [~Jim_Brennan] I have changed the method name. Please check the updated patch. Thanks > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects memory in MB, whereas MetricsOverviewTable passes resources in bytes. Also, we should display memory in GB for better readability for the user.
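The unit mismatch described above can be illustrated with a small self-contained sketch (the helper names are hypothetical, not the actual patch code): a byte count handed to an API that interprets it as MB inflates the displayed value by a factor of 1024*1024, and converting MB to GB for display fixes readability.

```java
import java.util.Locale;

// Illustrates the MB-vs-bytes mismatch from YARN-10697 and the GB display fix.
public class MemoryUnits {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long MB_PER_GB = 1024L;

    // Convert a raw byte count to MB before passing it to an MB-based API.
    public static long bytesToMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Format an MB value as GB for the UI; Locale.ROOT keeps '.' as the
    // decimal separator regardless of the JVM's default locale.
    public static String formatGb(long mb) {
        return String.format(Locale.ROOT, "%.2f GB", mb / (double) MB_PER_GB);
    }

    public static void main(String[] args) {
        long clusterMemoryBytes = 64L * 1024 * 1024 * 1024; // 64 GiB reported in bytes
        long mb = bytesToMb(clusterMemoryBytes);            // 65536 MB
        System.out.println(formatGb(mb));                   // prints "64.00 GB"
        // Passing the raw byte count where MB is expected would display
        // the cluster as 68719476736 "MB" -- the bug this issue fixes.
    }
}
```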
[jira] [Updated] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-10697: - Attachment: YARN-10697.003.patch > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects memory in MB, whereas MetricsOverviewTable passes resources in bytes. Also, we should display memory in GB for better readability for the user.