[jira] [Updated] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Minni Mittal updated YARN-10683:
--------------------------------
    Attachment: YARN-10683.v1.patch

> Add total resource in NodeManager metrics
> -----------------------------------------
>
>                 Key: YARN-10683
>                 URL: https://issues.apache.org/jira/browse/YARN-10683
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Minni Mittal
>            Assignee: Minni Mittal
>            Priority: Minor
>         Attachments: YARN-10683.v1.patch

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307585#comment-17307585 ]

Hadoop QA commented on YARN-10503:
----------------------------------
+1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 1m 24s | Docker mode activated. |
|| || Prechecks || ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || trunk Compile Tests || ||
| +1 | mvninstall | 22m 42s | trunk passed |
| +1 | compile | 1m 0s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 51s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | shadedclient | 16m 53s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  0 | spotbugs | 19m 59s | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 49s | trunk passed |
|| || Patch Compile Tests || ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 54s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 40s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 15m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | the patch passed with JDK Ubuntu-1
[jira] [Comment Edited] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307540#comment-17307540 ]

Qi Zhu edited comment on YARN-10704 at 3/24/21, 3:54 AM:
---------------------------------------------------------
Thanks [~ebadger] for the review. UI v2 does not have the effective absolute capacity yet; I think it should be handled in UI v1 first. Thanks.

> The CS effective capacity for absolute mode in UI should support GPU and
> other custom resources.
> ------------------------------------------------------------------------
>
>                 Key: YARN-10704
>                 URL: https://issues.apache.org/jira/browse/YARN-10704
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10704.001.patch, YARN-10704.002.patch, YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, image-2021-03-19-12-08-35-273.png
>
> Currently there is no information about the effective GPU capacity in the UI for absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> But we have this information in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>
> This is very important for our GPU users in absolute mode; there is currently no way to see absolute GPU information in the CS Queue UI.
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307540#comment-17307540 ]

Qi Zhu commented on YARN-10704:
-------------------------------
Thanks [~ebadger] for the review. UI v2 does not have the effective absolute capacity yet; I think it should be handled in UI v1 first. Thanks.
[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307529#comment-17307529 ]

Qi Zhu edited comment on YARN-10503 at 3/24/21, 3:35 AM:
---------------------------------------------------------
Thanks [~ebadger] for the review. [~pbacsko] Your suggestion is valid; it makes sense to me. I have updated the latest patch to be consistent with the YARN-9936 description:
* In an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4), the amount of all resources in the queues has to be less than or equal to all resources in the cluster. {color:#de350b}Actually, the above is not supported; we only support memory and vcores in absolute mode now, and we should extend this in {color}YARN-10503.

[~pbacsko] [~gandras] [~ebadger] Do you have any other advice about this? Thanks.

> Support queue capacity in terms of absolute resources with custom
> resourceType.
> -----------------------------------------------------------------
>
>                 Key: YARN-10503
>                 URL: https://issues.apache.org/jira/browse/YARN-10503
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10503.001.patch, YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch
>
> Now the absolute resources are memory and vcores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes. It's very important for cluster scaling with different resourceType absolute demands.
>
> This Jira will handle GPU first.
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307529#comment-17307529 ]

Qi Zhu commented on YARN-10503:
-------------------------------
Thanks [~ebadger] for the review. [~pbacsko] Your suggestion is valid; it makes sense to me. I have updated the latest patch to be consistent with the YARN-9936 description:
* In an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4), the amount of all resources in the queues has to be less than or equal to all resources in the cluster. {color:#de350b}Actually, the above is not supported; we only support memory and vcores in absolute mode now, and we should extend this in {color}YARN-10503.

[~pbacsko] [~gandras] [~ebadger] Do you have any other advice about this?
[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10503:
--------------------------
    Attachment: YARN-10503.004.patch
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307509#comment-17307509 ]

Qi Zhu commented on YARN-10517:
-------------------------------
Thanks [~ebadger] for the review. [~pbacsko] [~epayne] Could you help review it when you are free? Thanks.

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10517
>                 URL: https://issues.apache.org/jira/browse/YARN-10517
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: sibyl.lv
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still has incorrect allocated JMX values, such as {color:#660e7a}allocatedMB, allocatedVCores and allocatedContainers,{color} when the node partition is updated from "DEFAULT" to another label while there are running applications.
> Steps to reproduce
> ==============
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and run it
> # Add the label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # When the application is finished, the metrics are wrong (screenshots attached)
> ==============
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles the {color:#660e7a}NODE_LABELS_UPDATE{color} event. So we should release container resources from the old partition and add used resources to the new partition, just as we update queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307481#comment-17307481 ]

Eric Badger commented on YARN-10493:
------------------------------------
Additionally, I've run into some issues while testing.

{noformat:title=CLI Invocation}
hadoop jar ./hadoop-tools/hadoop-runc/target/hadoop-runc-3.4.0-SNAPSHOT.jar org.apache.hadoop.runc.tools.ImportDockerImage -r docker.foobar.com: hadoop-images/hadoop/rhel7 hadoop/rhel7
{noformat}

{noformat}
[ebadger@foo hadoop]$ hadoop fs -ls /runc-root/meta/hadoop/rhel7@latest.properties
-rw-------  10 ebadger supergroup        236 2021-03-24 00:15 /runc-root/meta/hadoop/rhel7@latest.properties
{noformat}

Here's the properties file after the CLI tool completes.

{noformat}
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.ImageTagToManifestV2Plugin
yarn.nodemanager.runtime.linux.runc.manifest-to-resources-plugin org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.HdfsManifestToResourcesV2Plugin
{noformat}

Then I set these properties as well as adding {{runc}} to the allowed-runtimes config.

{noformat}
export vars="YARN_CONTAINER_RUNTIME_TYPE=runc,YARN_CONTAINER_RUNTIME_RUNC_IMAGE=hadoop/rhel7"; $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.*-tests.jar sleep -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME=$HADOOP_HOME" -Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME=$HADOOP_HOME" -Dyarn.app.mapreduce.am.env=$vars -Dmapreduce.map.env=$vars -Dmapreduce.reduce.env=$vars -mt 1 -rt 1 -m 1 -r 1
{noformat}

I ran a sleep job using this command.

{noformat}
2021-03-24 00:26:07,823 DEBUG [NM ContainerManager dispatcher] runc.ImageTagToManifestV2Plugin (ImageTagToManifestV2Plugin.java:getHdfsImageToHashReader(144)) - Checking HDFS for image file: /runc-root/meta/library/hadoop/rhel7@latest.properties
2021-03-24 00:26:07,825 WARN [NM ContainerManager dispatcher] runc.ImageTagToManifestV2Plugin (ImageTagToManifestV2Plugin.java:getHdfsImageToHashReader(148)) - Did not load the hdfs image to hash properties file, file doesn't exist
2021-03-24 00:26:07,828 WARN [NM ContainerManager dispatcher] container.ContainerImpl (ContainerImpl.java:transition(1261)) - Failed to parse resource-request
java.io.FileNotFoundException: File does not exist: /runc-root/manifest/ha/hadoop/rhel7
{noformat}

Then I got this error in the NM when it was trying to resolve the tag. The NM added the default {{metaNamespaceDir}} (which is "library") into the path when looking for the properties file, but when the CLI tool ran, it didn't add the {{metaNamespaceDir}}. I didn't have the config set in my configs at all, so the NM was using the conf default. I'm not sure if I did anything wrong here or not, but it seems inconsistent to me. Let me know what you think.

> RunC container repository v2
> ----------------------------
>
>                 Key: YARN-10493
>                 URL: https://issues.apache.org/jira/browse/YARN-10493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, yarn
>    Affects Versions: 3.3.0
>            Reporter: Craig Condit
>            Assignee: Matthew Sharp
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: runc-container-repository-v2-design.pdf
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current runc container repository design has scalability and usability issues which will likely limit widespread adoption. We should address this with a new, V2 layout.
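The mismatch Eric describes can be sketched in a few lines. This is a hypothetical illustration of the two path constructions, not the actual hadoop-runc code: the class, method names, and constants are invented for this sketch; only the resulting paths are taken from the log output above.

```java
// Hypothetical sketch of the path mismatch: the NM inserts the configured
// meta namespace directory (default "library") into the properties path,
// while the import CLI does not. All names here are illustrative.
public class RuncMetaPathSketch {
  static final String RUNC_ROOT = "/runc-root";
  static final String DEFAULT_META_NAMESPACE_DIR = "library";

  // Path the CLI tool writes to (no namespace dir), per the fs -ls output.
  static String cliPath(String image, String tag) {
    return RUNC_ROOT + "/meta/" + image + "@" + tag + ".properties";
  }

  // Path the NM looks up (namespace dir inserted), per the DEBUG log line.
  static String nmPath(String image, String tag) {
    return RUNC_ROOT + "/meta/" + DEFAULT_META_NAMESPACE_DIR + "/" + image
        + "@" + tag + ".properties";
  }

  public static void main(String[] args) {
    // The two sides disagree, which is why the NM reports
    // "Did not load the hdfs image to hash properties file".
    System.out.println(cliPath("hadoop/rhel7", "latest"));
    // -> /runc-root/meta/hadoop/rhel7@latest.properties
    System.out.println(nmPath("hadoop/rhel7", "latest"));
    // -> /runc-root/meta/library/hadoop/rhel7@latest.properties
  }
}
```

Either the CLI should honor the same {{metaNamespaceDir}} default as the NM, or the NM should omit it when unset; the sketch just makes the disagreement concrete.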
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307467#comment-17307467 ]

Eric Badger commented on YARN-10493:
------------------------------------
[~MatthewSharp], thanks for the PR. Just starting to take a look at this now. I am wondering if the document is still up to date, though. Is the PR you put up still a good reflection of what's in the document? Just want to make sure.
[jira] [Commented] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307457#comment-17307457 ]

Eric Badger commented on YARN-10517:
------------------------------------
[~epayne], this change looks reasonable to me, but I'd like to get an extra pair of eyes on it since it has to do with scheduler internals.
[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.
[ https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307427#comment-17307427 ]

Eric Badger commented on YARN-10707:
------------------------------------
Similar to my [comment|https://issues.apache.org/jira/browse/YARN-10503?focusedCommentId=17307421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17307421] on YARN-10503, I believe the approach we take here should allow for arbitrary resources, not be hardcoded for GPUs. It's a lot of work to make GPUs a first-class resource, but it should only be a little more work to make arbitrary resources (which can include GPUs) first-class.

> Support gpu in ResourceUtilization, and update Node GPU Utilization to use.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10707
>                 URL: https://issues.apache.org/jira/browse/YARN-10707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10707.001.patch, YARN-10707.002.patch, YARN-10707.003.patch
>
> Support GPU in ResourceUtilization, and update Node GPU Utilization to use it first. It will be very helpful for other use cases involving GPU utilization.
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307421#comment-17307421 ]

Eric Badger commented on YARN-10503:
------------------------------------
bq. Do we want to treat GPUs and FPGAs like that? In other parts of the code, we have mem/vcore as primary resources, then an array of other resources.

I believe the correct approach is to leave memory and vcores as "first class" resources and then add logic for arbitrary extended resources, such as GPU or FPGA. The arbitrary extended resources should not be hardcoded values. The point is that we're doing the work right now to support GPUs, but if in 2 years some new resource needs to be tracked and used, we don't want to have to redo all of this work again. We should make sure that our work here extends to any future arbitrary resources.
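The extensibility argument above can be sketched with a minimal example: instead of a hardcoded enum of absolute resource types, treat absolute capacities as a map keyed by arbitrary resource names. This is an illustrative sketch only, assuming a simplified `Map<String, Long>` model; it is not the CapacityScheduler's actual data structures or validation code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: arbitrary resource names instead of a hardcoded enum.
public class AbsoluteResourceSketch {

  // Hardcoded approach quoted in the issue description.
  enum AbsoluteResourceType { MEMORY, VCORES }

  // Extensible approach: a queue's absolute capacity is a map of
  // resource name -> amount, so custom types like yarn.io/gpu need no
  // code change. The YARN-9936 constraint ("queue total <= cluster total")
  // is then a check over every resource name, not just memory and vcores.
  static boolean fitsInCluster(Map<String, Long> queue, Map<String, Long> cluster) {
    return queue.entrySet().stream()
        .allMatch(e -> e.getValue() <= cluster.getOrDefault(e.getKey(), 0L));
  }

  public static void main(String[] args) {
    Map<String, Long> cluster = new LinkedHashMap<>();
    cluster.put("memory-mb", 4096L);
    cluster.put("vcores", 20L);
    cluster.put("yarn.io/gpu", 4L);

    Map<String, Long> queue = new LinkedHashMap<>(cluster);
    queue.put("yarn.io/gpu", 2L);
    System.out.println(fitsInCluster(queue, cluster)); // true

    queue.put("yarn.io/gpu", 8L); // more GPUs than the cluster has
    System.out.println(fitsInCluster(queue, cluster)); // false
  }
}
```

A future resource type then only needs a new map entry, which is the "don't redo this work in 2 years" property Eric is asking for.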
[jira] [Comment Edited] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307413#comment-17307413 ]

Eric Badger edited comment on YARN-9618 at 3/23/21, 8:52 PM:
-------------------------------------------------------------
bq. Actually, why we use an other async dispatcher here is try to make the rmDispatcher#eventQueue not boom to affect other event process. The boom will transformed to nodeListManagerDispatcher#eventQueue.

I think [~gandras]'s point is that all of the events are going to go through {{rmDispatcher}} either way. Without the proposed change, {{rmDispatcher}} will get the event in the eventQueue and will also do the processing. With the proposed change, {{rmDispatcher}} will get the event and then copy it over to {{nodeListManagerDispatcher}}, which will do the processing. But in both cases, {{rmDispatcher}} is dealing with {{RMAppNodeUpdateEvent}}s in some way. So the question is whether copying the event or processing the event takes more time. If copying the event takes more time than processing it, then this change only makes things worse. If processing the event takes more time than copying it to the new async dispatcher, then this change makes sense and will remove some load from {{rmDispatcher}}. [~gandras], is that right?

> NodeListManager event improvement
> ---------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin Chundatt
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-9618.001.patch, YARN-9618.002.patch, YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
> In the current implementation, NodeListManager events block the async dispatcher, which can cause RM crashes and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event will create 1K events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked till new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler.
> # Instead of adding events to the dispatcher, directly call the RMApp event handler.
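The copy-versus-process trade-off being debated can be sketched as a minimal two-thread dispatcher: the primary side only pays the cost of an enqueue, and a secondary thread drains its own queue and does the processing. This is a self-contained illustration using plain `java.util.concurrent`, not YARN's `AsyncDispatcher`; class and method names are invented for the sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of handing events off to a secondary dispatcher thread.
// The handoff only helps if enqueueing is cheaper than processing, which
// is exactly the question raised in the comment above.
public class ForwardingDispatcherSketch {

  // Enqueue n events for a secondary dispatcher thread and wait for it to
  // process them all; returns the number processed.
  public static int dispatchAll(int n) throws InterruptedException {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    AtomicInteger processed = new AtomicInteger();

    // Secondary dispatcher: drains its own queue, like the proposed
    // nodeListManagerDispatcher would.
    Thread secondary = new Thread(() -> {
      try {
        while (true) {
          queue.take().run();
        }
      } catch (InterruptedException ignored) {
        // shutdown signal
      }
    });
    secondary.setDaemon(true);
    secondary.start();

    // Primary dispatcher side: instead of running the (possibly expensive)
    // handler inline, it only pays the cost of an enqueue per event.
    for (int i = 0; i < n; i++) {
      queue.put(processed::incrementAndGet);
    }

    // Wait for the secondary thread to drain the queue, then stop it.
    while (processed.get() < n) {
      Thread.sleep(1);
    }
    secondary.interrupt();
    return processed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(dispatchAll(1000)); // 1000
  }
}
```

In this toy the handler is a trivial counter increment, so the enqueue is not obviously cheaper; the change pays off only when the real handler (e.g. fanning one node event out to 1K apps) dominates the enqueue cost.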
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307413#comment-17307413 ] Eric Badger commented on YARN-9618: --- bq. Actually, the reason we use another async dispatcher here is to keep the rmDispatcher#eventQueue from booming and affecting other event processing. The boom will be transferred to nodeListManagerDispatcher#eventQueue. I think [~gandras]'s point is that all of the events are going to go through {{rmDispatcher}} either way. Without the proposed change, {{rmDispatcher}} will get the event in the eventQueue and will also do the processing. With this proposed change, {{rmDispatcher}} will get the event and then it will copy it over to {{nodeListManagerDispatcher}}. Then {{nodeListManagerDispatcher}} will do the processing. But in both cases, {{rmDispatcher}} is dealing with {{RMAppNodeUpdateEvent}}s in some way. So the question is whether copying the event or processing the event takes more time. If copying the event takes more time than processing the event, then this change only makes things worse. If processing the event takes more time than copying the event to the new async dispatcher, then this change makes sense and will remove some load on the {{rmDispatcher}}. [~gandras], is that right? > NodeListManager event improvement > - > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch > > > In the current implementation, NodeListManager events block the async > dispatcher, which can crash the RM and slow down event processing. > # On a cluster restart with 1K running apps, each node-usable event creates 1K > events; overall this can be 5K*1K events for a 5K-node cluster. > # Event processing is blocked until new events are added to the queue.
> Solution: > # Add another async event handler, similar to the scheduler's. > # Instead of adding events to the dispatcher, directly call the RMApp event handler.
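The copy-versus-process tradeoff debated in this thread can be sketched outside of Hadoop with two queue-backed dispatchers. This is a toy model with hypothetical names, not the actual AsyncDispatcher/NodesListManager code: the primary dispatcher's only work per event is the cheap hand-off, while the heavy processing happens off the secondary queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the proposal: rmQueue stands in for rmDispatcher's eventQueue,
// nodeListQueue for the new nodeListManagerDispatcher. The rm-side work per
// event is only the hand-off; the actual processing runs off the second queue.
public class DispatcherSketch {
    static final BlockingQueue<Runnable> rmQueue = new LinkedBlockingQueue<>();
    static final BlockingQueue<Runnable> nodeListQueue = new LinkedBlockingQueue<>();
    static final AtomicInteger processed = new AtomicInteger();

    // Cheap step done on the rm-dispatcher side: just copy the event over.
    static void forwardToNodeListDispatcher(Runnable event) {
        nodeListQueue.add(event);
    }

    public static int run(int events) {
        for (int i = 0; i < events; i++) {
            rmQueue.add(() -> forwardToNodeListDispatcher(processed::incrementAndGet));
        }
        // Drain both queues on this thread for determinism; real dispatchers
        // each consume from their own thread.
        Runnable r;
        while ((r = rmQueue.poll()) != null) r.run();
        while ((r = nodeListQueue.poll()) != null) r.run();
        return processed.get();
    }
}
```

Whether the hand-off is cheaper than in-place processing is exactly the open question in the thread; the sketch only shows that every event still passes through the primary queue once either way.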
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307379#comment-17307379 ] Eric Badger commented on YARN-10704: I'm not very familiar with the new YARN UI v2. Will this change automatically apply to both UIs? Or do we need to add extra stuff for it to be supported in both? > The CS effective capacity for absolute mode in UI should support GPU and > other custom resources. > > > Key: YARN-10704 > URL: https://issues.apache.org/jira/browse/YARN-10704 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10704.001.patch, YARN-10704.002.patch, > YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, > image-2021-03-19-12-08-35-273.png > > > Actually there is no information about the effective GPU capacity in the UI > for absolute resource mode. > !image-2021-03-19-12-05-28-412.png|width=873,height=136! > But we do have this information in QueueMetrics: > !image-2021-03-19-12-08-35-273.png|width=613,height=268! > > This is very important for our GPU users in absolute mode; there is still no > way to see absolute GPU information in the CS Queue UI.
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307209#comment-17307209 ] Hadoop QA commented on YARN-10674: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 40s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 23s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 3s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 43s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 10s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.0
[jira] [Comment Edited] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191 ] Michael Zeoli edited comment on YARN-6538 at 3/23/21, 4:03 PM: --- Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. 
YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config (this is our default spark config, though some of the spark jobs in the pipelines we're testing set executor mem and overhead mem higher to support more memory intensive work. Our work is memory constrained, and additional cores per executor have never yielded more optimal throughput). * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true was (Author: novaboy): Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. 
If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true > Inter Queue preemption is not happening when DRF is configured > -- > > Key: YARN-6538 > URL: https://issues.apache.org/jira/browse/YARN-6538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, scheduler preemption >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > Cluster capacity of . Here memory is more and vcores > are less. If applications have more demand, vcores might be exhausted
[jira] [Commented] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191 ] Michael Zeoli commented on YARN-6538: - Eric - thanks for the response and apologies for the absence. Currently we have not been able to reproduce outside of our particular pipeline, though we stopped in earnest once our platform vendor indicated they were able to reproduce with a purpose-built MR job (we are currently working the issue with them). I will try to get details. Essentially what we see is a single job (in lq1) with several thousand pending containers taking the entire cluster (expected, via dynamic allocation). When a second job enters lq2, it fails to receive executors despite having a guaranteed minimum capacity of 17% (approx 4 cores.. 28 * 0.95 * 0.17). On occasion it also fails to receive an AM. If a third job enters lq3 at this point, it also fails to receive executors. The jobs continue to starve until the first job begins attriting resources as pending containers fall to zero. 
YARN Resources (4 NM's, so 280 GiB / 28c total YARN resources) * yarn.nodemanager.resource.cpu-vcores = 7 * yarn.scheduler.maximum-allocation-vcores = 7 * yarn.nodemanager.resource.memory-mb = 70 GiB * yarn.scheduler.maximum-allocation-mb = 40 GiB Queue configuration (note that only lq1, lq2 and lq3 are used in the current tests) * root.default cap = 5% * root.tek cap = 95% * root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each * root.tek.lq5 .lq6 cap = 16% each For all lqN (leaf queues): * Minimum User Limit = 25% * User Limit Factor = 100 (intentionally set high to allow user to exceed queue capacity when idle capacity exists) * max cap = 100% * max AM res limit = 20% * inter / intra queue preemption: Enabled * ordering policy = Fair Spark config * spark.executor.cores=1 * spark.executor.memory=5G * spark.driver.memory=4G * spark.driver.maxResultSize=2G * spark.executor.memoryOverhead=1024 * spark.dynamicAllocation.enabled = true > Inter Queue preemption is not happening when DRF is configured > -- > > Key: YARN-6538 > URL: https://issues.apache.org/jira/browse/YARN-6538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, scheduler preemption >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > Cluster capacity of . Here memory is more and vcores > are less. If applications have more demand, vcores might be exhausted. > Inter queue preemption ideally has to be kicked in once vcores is over > utilized. However preemption is not happening. > Analysis: > In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}}, > {code} > // assign all cluster resources until no more demand, or no resources are > // left > while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant, > unassigned, Resources.none())) { > {code} > will loop even when vcores are 0 (because memory is still +ve). Hence we are > having more vcores in idealAssigned which cause no-preemption cases. 
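The loop condition quoted in the analysis above can be modeled with a toy calculator (plain longs instead of Hadoop's Resource objects; the method names are hypothetical): looping while *any* component of the unassigned resource is positive keeps handing out vcores after they hit zero, while a componentwise stop does not.

```java
// Toy model of the computeFixpointAllocation condition: resources are
// {memory, vcores} pairs. The buggy condition keeps looping while ANY
// component of "unassigned" is positive, so vcores can go negative
// (over-assigned into idealAssigned) while memory is still being handed out.
public class FixpointSketch {
    // Returns the final unassigned vcores after repeatedly granting demand.
    public static long leftoverVcores(long mem, long vcores,
                                      long demandMemPerStep, long demandVcoresPerStep,
                                      boolean componentwiseStop) {
        long unassignedMem = mem, unassignedVcores = vcores;
        while (unassignedMem > 0 || unassignedVcores > 0) {   // "greater than none"
            if (componentwiseStop
                && (unassignedMem < demandMemPerStep
                    || unassignedVcores < demandVcoresPerStep)) {
                break;  // stop as soon as either resource is exhausted
            }
            unassignedMem -= demandMemPerStep;
            unassignedVcores -= demandVcoresPerStep;
        }
        return unassignedVcores;
    }
}
```

In this toy model the buggy condition drives unassigned vcores negative, i.e. vcores are over-counted in the ideal assignment, which matches the reported no-preemption symptom.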
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307114#comment-17307114 ] Jim Brennan commented on YARN-10697: Thanks for the update [~BilwaST]! I am +1 on patch 003. [~epayne], [~jhung], if there are no objections I will commit this later today. > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects MB as memory, whereas MetricsOverviewTable > passes resources in bytes. Also, we should display memory in GB for better > readability for the user.
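The bug described in YARN-10697 is a unit-conversion mismatch (bytes handed to an API that expects MB). A minimal sketch of the two conversions involved, with hypothetical helper names rather than the actual MetricsOverviewTable code:

```java
import java.util.Locale;

public class MemoryDisplay {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long MB_PER_GB = 1024L;

    // Convert raw bytes to MB before handing them to an MB-based API.
    public static long bytesToMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Render MB as GB with one decimal for readability in the UI.
    public static String mbAsGbLabel(long mb) {
        return String.format(Locale.ROOT, "%.1f GB", mb / (double) MB_PER_GB);
    }
}
```

Passing bytes where MB are expected inflates the displayed value by a factor of 2^20, which is why the UI numbers looked wrong for the non-capacity schedulers.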
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307061#comment-17307061 ] Qi Zhu commented on YARN-10674: --- Thanks [~gandras] for the update and the discussion with [~pbacsko]. It makes sense to me. I have updated the latest patch. :D > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch, YARN-10674.017.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
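The FS mechanism quoted in the issue description — a reload loop that also removes empty dynamic queues every ALLOC_RELOAD_INTERVAL_MS — can be sketched with a scheduled executor. This is a simplified stand-in, not the real QueueManager:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AutoDeletionSketch {
    // FS default: check every 10 seconds, as in the quoted snippet.
    public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;

    public final Set<String> dynamicQueues = ConcurrentHashMap.newKeySet();
    public final Set<String> nonEmptyQueues = ConcurrentHashMap.newKeySet();

    // The periodic check: drop dynamic queues that no longer hold any apps.
    public void removeEmptyDynamicQueues() {
        dynamicQueues.removeIf(q -> !nonEmptyQueues.contains(q));
    }

    public void start(ScheduledExecutorService ses) {
        ses.scheduleWithFixedDelay(this::removeEmptyDynamicQueues,
            ALLOC_RELOAD_INTERVAL_MS, ALLOC_RELOAD_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }
}
```

fs2cs has to map this timed deletion behavior onto Capacity Scheduler's auto-created-queue-deletion settings, which is what the patch series on this issue addresses.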
[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10674: -- Attachment: YARN-10674.017.patch > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch, YARN-10674.017.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306981#comment-17306981 ] Andras Gyori commented on YARN-10674: - Thank you [~zhuqi] for the patch. Sorry for coming up with this, but I think I did not explain what I had in mind well. My suggestion is the following: * Change the description of DISABLE_PREEMPTION to state that enabled is the default: {code:java} DISABLE_PREEMPTION("disable preemption", "dp", "disable-preemption", "Disable the preemption with nopolicy or observeonly mode. " + "Preemption is enabled by default. " + "nopolicy removes ProportionalCapacityPreemptionPolicy from " + "the list of monitor policies, " + "observeonly sets " + "yarn.resourcemanager.monitor.capacity.preemption.observe_only " + "to true.", true), {code} * Change PreemptionMode to include the ENABLED variant (the fromString could throw an exception on illegal string). You do not need the private boolean enabled field, because we have the ENABLED variant for this. {code:java} public enum PreemptionMode { ENABLED("enabled"), NO_POLICY("nopolicy"), OBSERVE_ONLY("observeonly"); private String cliOption; PreemptionMode(String cliOption) { this.cliOption = cliOption; } public String getCliOption() { return cliOption; } public static PreemptionMode fromString(String cliOption) { if (cliOption.trim(). equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) { return PreemptionMode.OBSERVE_ONLY; } else if (cliOption.trim(). 
equals(PreemptionMode.NO_POLICY.getCliOption())) { return PreemptionMode.NO_POLICY; } else { return PreemptionMode.ENABLED; } } } {code} * You could then simplify emitDisablePreemptionForObserveOnlyMode: because PreemptionMode has an ENABLED variant, the OBSERVE_ONLY variant already means that preemption is not enabled: {code:java} private void emitDisablePreemptionForObserveOnlyMode() { if (preemptionMode == FSConfigToCSConfigConverterParams .PreemptionMode.OBSERVE_ONLY) { capacitySchedulerConfig. setBoolean(CapacitySchedulerConfiguration. PREEMPTION_OBSERVE_ONLY, true); } } {code} * The same applies to convertSiteProperties: {code:java} if (preemptionMode == FSConfigToCSConfigConverterParams.PreemptionMode.NO_POLICY) { yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES, ""); } {code} > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch, YARN-10674.016.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ...
> Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
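The suggested enum above can be condensed into a self-contained form to check the parsing behavior (same shape as the snippet in the comment, with a lookup loop in place of the if/else chain; unrecognized strings fall back to the ENABLED default):

```java
public enum PreemptionMode {
    ENABLED("enabled"), NO_POLICY("nopolicy"), OBSERVE_ONLY("observeonly");

    private final String cliOption;

    PreemptionMode(String cliOption) {
        this.cliOption = cliOption;
    }

    public String getCliOption() {
        return cliOption;
    }

    // Unknown strings fall back to ENABLED, matching the suggested default
    // (an alternative, noted in the comment, is to throw on illegal input).
    public static PreemptionMode fromString(String cliOption) {
        for (PreemptionMode mode : values()) {
            if (mode.cliOption.equals(cliOption.trim())) {
                return mode;
            }
        }
        return ENABLED;
    }
}
```

With the ENABLED variant in the enum, the separate boolean flag becomes unnecessary, as the comment points out.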
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306970#comment-17306970 ] Hadoop QA commented on YARN-10697: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 25s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 13s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 26m 10s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 4m 3s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 49s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 35s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 58
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated and available resources. (was: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources.) > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated and available resources.
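Per-resource-type accounting of the kind this description asks for (allocated and available values keyed by resource name) can be sketched with a plain map-based stand-in; the actual patch would wire these counters into Hadoop's metrics2 gauges in NodeManagerMetrics, so everything below is illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CustomResourceMetrics {
    private final Map<String, Long> allocated = new ConcurrentHashMap<>();
    private final Map<String, Long> available = new ConcurrentHashMap<>();

    // Initialize a resource type (e.g. "yarn.io/gpu") with its capacity.
    public void setTotal(String resource, long units) {
        available.put(resource, units);
        allocated.put(resource, 0L);
    }

    // Move units from available to allocated when a container is granted.
    public void allocate(String resource, long units) {
        available.merge(resource, -units, Long::sum);
        allocated.merge(resource, units, Long::sum);
    }

    // Return units to available when a container completes.
    public void release(String resource, long units) {
        allocated.merge(resource, -units, Long::sum);
        available.merge(resource, units, Long::sum);
    }

    public long getAllocated(String resource) {
        return allocated.getOrDefault(resource, 0L);
    }

    public long getAvailable(String resource) {
        return available.getOrDefault(resource, 0L);
    }
}
```

The invariant worth testing is that allocated + available stays equal to the configured total across allocate/release cycles.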
[jira] [Commented] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306968#comment-17306968 ] Hadoop QA commented on YARN-10518: -- (x) -1 overall
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 44s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| 0 | mvndep | 1m 51s | | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 6s | | trunk passed |
| +1 | compile | 10m 18s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 46s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 42s | | trunk passed |
| +1 | mvnsite | 1m 48s | | trunk passed |
| +1 | shadedclient | 19m 24s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 27s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 26m 13s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 3m 43s | | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 15s | | the patch passed |
| +1 | compile | 9m 23s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 9m 23s | | the patch passed |
| +1 | compile | 8m 36s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 36s | | the patch passed |
| +1 | checkstyle | 1m 37s | | the patch passed |
| +1 | mvnsite | 1m 38s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 8
[jira] [Commented] (YARN-10708) Remove NULL check before instanceof
[ https://issues.apache.org/jira/browse/YARN-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306962#comment-17306962 ] Steve Loughran commented on YARN-10708: --- FWIW there are some really good instanceof enhancements in Java; it'll be time to do another refresh then too
> Remove NULL check before instanceof
> ---
>
> Key: YARN-10708
> URL: https://issues.apache.org/jira/browse/YARN-10708
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Jiajun Jiang
> Priority: Minor
> Labels: pull-request-available
> Attachments: YARN-10708.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Submitted patch to remove the NULL check before the instanceof check in several classes. Same issue as YARN-9340.
> Classes involved:
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
> * M hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java
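The rationale for YARN-10708 can be shown in a minimal, self-contained sketch (class names are hypothetical stand-ins for the patched classes, not the actual Hadoop code): the Java Language Specification defines `null instanceof T` to evaluate to false for any reference type `T`, so a preceding null check is always redundant.

```java
// Demonstrates why an explicit null check before instanceof is redundant:
// `null instanceof T` is always false, so the instanceof test alone is
// behaviourally identical to the two-condition form.
public class InstanceofNullCheck {

    // Hypothetical stand-in for classes like GpuDevice in the patch.
    public static class GpuDevice {}

    // Before the patch: the null check duplicates what instanceof already does.
    public static boolean isGpuDeviceWithNullCheck(Object obj) {
        return obj != null && obj instanceof GpuDevice;
    }

    // After the patch: one condition shorter, same result for every input.
    public static boolean isGpuDevice(Object obj) {
        return obj instanceof GpuDevice;
    }

    public static void main(String[] args) {
        Object device = new GpuDevice();
        // Both variants agree for null and non-null inputs.
        System.out.println(isGpuDeviceWithNullCheck(null) == isGpuDevice(null));     // true
        System.out.println(isGpuDeviceWithNullCheck(device) == isGpuDevice(device)); // true
        System.out.println(null instanceof GpuDevice);                               // false
    }
}
```

This is the same simplification an `equals(Object)` implementation typically gets: `if (obj == null || getClass() != obj.getClass())` style checks collapse to a single `instanceof` test.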
[jira] [Created] (YARN-10711) Make CSQueueMetrics configured related field to support nodelabel.
Qi Zhu created YARN-10711: - Summary: Make CSQueueMetrics configured related field to support nodelabel. Key: YARN-10711 URL: https://issues.apache.org/jira/browse/YARN-10711 Project: Hadoop YARN Issue Type: Improvement Reporter: Qi Zhu Assignee: Qi Zhu
{code:java}
// Update configured capacity/max-capacity for default partition only
CSQueueUtils.updateConfiguredCapacityMetrics(resourceCalculator,
    labelManager.getResourceByLabel(null, clusterResource),
    RMNodeLabelsManager.NO_LABEL, this);
{code}
Currently, the configured capacity/max-capacity metrics only support the default partition. We should support node labels as well. cc [~pbacsko] [~gandras] [~ebadger] [~Jim_Brennan] [~epayne]
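The shape of the proposed change can be sketched in a self-contained toy (all names here are invented for illustration; this is not the CSQueueMetrics API): instead of recording configured capacity only under the default partition (`NO_LABEL`), the metrics object keys capacity by partition label.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-node-label configured-capacity metrics.
// The current code path effectively only ever records the NO_LABEL entry.
public class PerLabelCapacityMetrics {
    // Empty string denotes the default partition, as in RMNodeLabelsManager.NO_LABEL.
    public static final String NO_LABEL = "";

    private final Map<String, Float> configuredCapacity = new HashMap<>();

    // Record configured capacity (as a percentage) for a specific partition.
    public void setConfiguredCapacity(String partition, float capacity) {
        configuredCapacity.put(partition, capacity);
    }

    // Unconfigured partitions report 0, mirroring an absent metric.
    public float getConfiguredCapacity(String partition) {
        return configuredCapacity.getOrDefault(partition, 0f);
    }

    public static void main(String[] args) {
        PerLabelCapacityMetrics m = new PerLabelCapacityMetrics();
        m.setConfiguredCapacity(NO_LABEL, 50f); // default partition, as today
        m.setConfiguredCapacity("gpu", 30f);    // a labelled partition, as proposed
        System.out.println(m.getConfiguredCapacity("gpu")); // 30.0
    }
}
```

The real change would presumably loop over the labels known to the node-labels manager rather than passing only `null`/`NO_LABEL` into `updateConfiguredCapacityMetrics`.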
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Attachment: YARN-10518.v1.patch > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources.
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306824#comment-17306824 ] Bilwa S T commented on YARN-10697: -- [~Jim_Brennan] I have changed the method name. Please check the updated patch. Thanks > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects memory in MB, whereas MetricsOverviewTable passes resources in bytes. Also, we should display memory in GB for better readability for the user.
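The unit mismatch described above can be illustrated with a small self-contained sketch (the helper names are hypothetical, not the actual patch code): a byte count handed to an API that interprets it as MB inflates the displayed value by a factor of 1024*1024, and converting MB to GB for display fixes readability.

```java
import java.util.Locale;

// Illustrates the MB-vs-bytes mismatch from YARN-10697 and the GB display fix.
public class MemoryUnits {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long MB_PER_GB = 1024L;

    // Convert a raw byte count to MB before passing it to an MB-based API.
    public static long bytesToMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Format an MB value as GB for the UI; Locale.ROOT keeps '.' as the
    // decimal separator regardless of the JVM's default locale.
    public static String formatGb(long mb) {
        return String.format(Locale.ROOT, "%.2f GB", mb / (double) MB_PER_GB);
    }

    public static void main(String[] args) {
        long clusterMemoryBytes = 64L * 1024 * 1024 * 1024; // 64 GiB reported in bytes
        long mb = bytesToMb(clusterMemoryBytes);            // 65536 MB
        System.out.println(formatGb(mb));                   // prints "64.00 GB"
        // Passing the raw byte count where MB is expected would display
        // the cluster as 68719476736 "MB" -- the bug this issue fixes.
    }
}
```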
[jira] [Updated] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-10697: - Attachment: YARN-10697.003.patch > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, YARN-10697.002.patch, > YARN-10697.003.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects memory in MB, whereas MetricsOverviewTable passes resources in bytes. Also, we should display memory in GB for better readability for the user.