[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261984#comment-17261984
 ] 

Hadoop QA commented on YARN-10504:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
17s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 9 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
32s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 33s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
48s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/450/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 78 new + 758 unchanged - 13 fixed = 836 total (was 771) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 27s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |

[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261962#comment-17261962
 ] 

Hadoop QA commented on YARN-10504:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 9 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
13s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 30s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
48s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/449/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 78 new + 758 unchanged - 13 fixed = 836 total (was 771) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 28s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |

[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261961#comment-17261961
 ] 

Benjamin Teke commented on YARN-10504:
--

[~wangda], [~zhuqi]

 

Added a small new patch. ver.6 had a small issue in 
ParentQueue.getCapacityConfigurationTypeForQueues: if the passed queues 
collection was empty, the iterator backing the mixed mode check threw a 
NoSuchElementException, causing all of the AutoCreatedTests to fail. 
setChildQueues has a similar mixed mode check, but there the root queue is also 
included in the check, causing an unnecessary exception, so I extended that 
condition as well. Additionally, I applied [~zhuqi]'s suggestion to 
LeafQueue.updateClusterResource.
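
To make the failure mode concrete, here is a minimal, self-contained sketch 
(illustrative only, not the actual ParentQueue code; the method and mode names 
are made up):
{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EmptyIteratorGuardExample {
  // iterator().next() on an empty collection throws NoSuchElementException,
  // so the mixed mode check has to bail out before touching the iterator.
  static String capacityMode(List<String> childModes) {
    if (childModes.isEmpty()) {
      return "NONE"; // nothing to classify, skip the mixed mode check
    }
    Iterator<String> it = childModes.iterator();
    String first = it.next();
    while (it.hasNext()) {
      if (!it.next().equals(first)) {
        throw new IllegalArgumentException("mixed capacity modes");
      }
    }
    return first;
  }

  public static void main(String[] args) {
    System.out.println(capacityMode(Collections.emptyList()));           // NONE
    System.out.println(capacityMode(Arrays.asList("WEIGHT", "WEIGHT"))); // WEIGHT
  }
}
{code}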

There is one more issue: now 2 of the TestAbsoluteResourceWithAutoQueue tests 
are failing, because the apps are stuck in the SUBMITTED state. [~wangda], do 
you have an idea about this one?

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.ver-1.patch, 
> YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes
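
A minimal sketch of the last bullet above (turning sibling weights into 
relative capacities); this is illustrative only, not the patch itself, and the 
queue names and the weight-over-sum normalization rule are assumptions:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightToCapacityExample {
  // Relative capacity of each child = its weight / sum of the sibling weights.
  static Map<String, Float> toRelativeCapacities(Map<String, Float> weights) {
    float sum = 0f;
    for (float w : weights.values()) {
      sum += w;
    }
    Map<String, Float> capacities = new LinkedHashMap<>();
    for (Map.Entry<String, Float> e : weights.entrySet()) {
      capacities.put(e.getKey(), sum == 0f ? 0f : e.getValue() / sum);
    }
    return capacities;
  }

  public static void main(String[] args) {
    Map<String, Float> weights = new LinkedHashMap<>();
    weights.put("root.users", 3.0f);   // e.g. root.users.capacity = 3.0w
    weights.put("root.default", 1.0f); // e.g. root.default.capacity = 1.0w
    // prints {root.users=0.75, root.default=0.25}
    System.out.println(toRelativeCapacities(weights));
  }
}
{code}
This recalculation would have to run at startup and whenever the queue 
structure changes, as the last bullet notes.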






[jira] [Updated] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10504:
-
Attachment: YARN-10504.007.patch

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.ver-1.patch, 
> YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes






[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261954#comment-17261954
 ] 

Benjamin Teke commented on YARN-10504:
--

[~wangda]/[~zhuqi]/[~gandras],

 

Let me take care of the suggestions in [~zhuqi]'s comment above. I'll base it 
on the ver.6 patch.

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, 
> YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes






[jira] [Updated] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-10504:
--
Attachment: YARN-10504.006.patch

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, 
> YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes






[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261943#comment-17261943
 ] 

Wangda Tan commented on YARN-10504:
---

Updated ver.6 patch, which includes the following: 

1) Unit tests covering the end-to-end capability and the mixed 
percentage/weight mode.

2) Rewrote ParentQueue.setChildQueues. It was a bit messy before; it now 
enforces stricter checks, and the check statements were rewritten for better 
readability.

[~zhuqi]/[~bteke]/[~gandras], I haven't addressed your comments yet, so it 
would be nice if you could help make the changes. (And again, please add a 
comment if you plan to do that, to avoid editing the same code.)

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes






[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261921#comment-17261921
 ] 

Wangda Tan commented on YARN-10504:
---

Makes sense, [~zhuqi].

[~bteke]/[~gandras], if you're not making changes to the patch, [~zhuqi], can 
you take care of the issue?

I plan to add a few more test cases covering the weight mode today/tomorrow. I 
will not touch any existing logic.

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes






[jira] [Comment Edited] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261872#comment-17261872
 ] 

zhuqi edited comment on YARN-10504 at 1/9/21, 2:07 PM:
---

[~wangda]  [~bteke] [~gandras]

1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, 
but this breaks in some cases, for example in 
{{TestCapacitySchedulerAutoQueueCreation#testAutoCreatedQueueActivationDeactivation}}:

 
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f,
1.0f, 1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
(AutoCreatedLeafQueue)
user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case there is no capacity in the user_3 AutoCreatedLeafQueue, so in 
{{updateAbsoluteCapacitiesAndRelatedFields}}:

 

 
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();

  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
int maxSystemApps = schedulerConf.
getMaximumSystemApplications();
maxApplications =
(int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
  (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
  * usersManager.getUserLimitFactor()));
}
// because the capacities will be updated to 0 by this branch:
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else {
  updateToZeroCapacity(capacities, nodeLabel);
}

// and because the update only happens after reinitializeFromTemplate:
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
new ResourceLimits(this.csContext.getClusterResource()));{code}
The maxApplications and maxApplicationsPerUser will be 0. 

 

So we should handle this in the new logic at

//TODO recalculate max applications because they can depend on capacity 

The TODO should be removed; either just let the AutoCreatedLeafQueue case pass 
for now, or add logic that sets maxApplications to a fixed default number for 
this case.
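
A rough sketch of the second option, falling back to a fixed default when the 
derived value would be 0 (illustrative only; the method name, parameters and 
fallback value are assumptions, not the committed fix):
{code:java}
public class MaxApplicationsFallbackExample {
  // If maxApplications is not configured, derive it from the absolute
  // capacity, but never let a 0 capacity (e.g. a deactivated auto created
  // leaf queue) collapse it to 0.
  static int recalcMaxApplications(int configuredMax, int maxSystemApps,
      float absoluteCapacity, int defaultForZeroCapacity) {
    if (configuredMax > 0) {
      return configuredMax; // an explicitly configured value wins
    }
    int derived = (int) (maxSystemApps * absoluteCapacity);
    return derived > 0 ? derived : defaultForZeroCapacity;
  }

  public static void main(String[] args) {
    // deactivated queue: absolute capacity 0 falls back to the default
    System.out.println(recalcMaxApplications(0, 10000, 0.0f, 1000));  // 1000
    System.out.println(recalcMaxApplications(0, 10000, 0.25f, 1000)); // 2500
  }
}
{code}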

 

2. As mentioned by [~bteke]:

"Sharing my latest findings on the TestAbsoluteResourceWithAutoQueue failure: 
{{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored; now getting 
and merging the QueueCapacities happens *before* calling 
{{ParentQueue#updateClusterResource}} (and {{LeafQueue#updateClusterResource}}). 
In {{LeafQueue#updateClusterResource}}, {{AbstractCSQueue#updateEffectiveResources}} 
is called, where the effectiveMinResource of the created queue is overridden 
with the template's effectiveMinResources, which is exactly the same value the 
test is getting in the asserts."

We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...

    // Skip the effective resource override for auto created leaf queues, so
    // the template's effectiveMinResource does not clobber the queue's own
    // value.
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }

    ...
  } finally {
    writeLock.unlock();
  }
}{code}
It will fix the absolute resource case in TestAbsoluteResourceWithAutoQueue.

Do you have any other advice?

Thanks.



[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261872#comment-17261872
 ] 

zhuqi commented on YARN-10504:
--

[~wangda]  [~bteke]

1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, 
but this breaks in some cases, for example in 
{{TestCapacitySchedulerAutoQueueCreation#testAutoCreatedQueueActivationDeactivation}}:

 
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f,
1.0f, 1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
(AutoCreatedLeafQueue)
user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case there is no capacity in the user_3 AutoCreatedLeafQueue, so in 
{{updateAbsoluteCapacitiesAndRelatedFields}}:

 

 
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();

  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
int maxSystemApps = schedulerConf.
getMaximumSystemApplications();
maxApplications =
(int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
  (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
  * usersManager.getUserLimitFactor()));
}
// because the capacities will be updated to 0 by this branch:
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else {
  updateToZeroCapacity(capacities, nodeLabel);
}

// and because the update only happens after reinitializeFromTemplate:
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
new ResourceLimits(this.csContext.getClusterResource()));{code}
The maxApplications and maxApplicationsPerUser will be 0. 

 

So we should handle this in the new logic at

//TODO recalculate max applications because they can depend on capacity 

The TODO should be removed; either just let the AutoCreatedLeafQueue case pass 
for now, or add logic that sets maxApplications to a fixed default number for 
this case.

 

2. As mentioned by [~bteke]:

"Sharing my latest findings on the TestAbsoluteResourceWithAutoQueue failure: 
{{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored; now getting 
and merging the QueueCapacities happens *before* calling 
{{ParentQueue#updateClusterResource}} (and {{LeafQueue#updateClusterResource}}). 
In {{LeafQueue#updateClusterResource}}, {{AbstractCSQueue#updateEffectiveResources}} 
is called, where the effectiveMinResource of the created queue is overridden 
with the template's effectiveMinResources, which is exactly the same value the 
test is getting in the asserts."

We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...

    // Skip the effective resource override for auto created leaf queues, so
    // the template's effectiveMinResource does not clobber the queue's own
    // value.
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }

    ...
  } finally {
    writeLock.unlock();
  }
}{code}
It will fix the absolute resource case in TestAbsoluteResourceWithAutoQueue.

Do you have any other advice?

Thanks.

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a weight mode 
> should be introduced. The existing {{capacity}} property should be used with 
> a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes

[jira] [Updated] (YARN-10558) Fix failure of TestDistributedShell#testDSShellWithOpportunisticContainers

2021-01-09 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10558:

Fix Version/s: 3.3.1

> Fix failure of TestDistributedShell#testDSShellWithOpportunisticContainers
> --
>
> Key: YARN-10558
> URL: https://issues.apache.org/jira/browse/YARN-10558
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The TestDistributedShell#testDSShellWithOpportunisticContainers always fails 
> due to insufficient test configuration.






[jira] [Updated] (YARN-10334) TestDistributedShell leaks resources on timeout/failure

2021-01-09 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10334:

Fix Version/s: 3.3.1

> TestDistributedShell leaks resources on timeout/failure
> ---
>
> Key: YARN-10334
> URL: https://issues.apache.org/jira/browse/YARN-10334
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell, test, yarn
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: newbie, pull-request-available, test
> Fix For: 3.4.0, 3.3.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestDistributedShell}} times out on trunk. I found that the application, 
> and containers will stay running in the background long after the unit test 
> has failed.
> This causes failures of other test cases and several false-positive failures 
> as a result of:
> * Ports will stay busy, so other tests cases fail to launch.
> * Unit tests fail because of memory restrictions.
> Although the unit test is already broken on trunk, we do not want its 
> failures to affect other unit tests.
> {{TestDistributedShell}} needs to be revisited to make sure that all 
> {{YarnClients}} and {{YarnApplications}} are closed properly at the end of 
> each unit test (including exceptions and timeouts).
> Steps to reproduce:
> {code:bash}
> mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
> ## this will timeout as
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 90.234 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
> [ERROR] 
> testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 90.018 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 9 
> milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » 
> TestTimedOut
> [INFO] 
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> {code}
> Using the {{ps}} command, you can see that the yarn processes are still 
> running in the background:
> {code:bash}
> /bin/bash -c $JRE_HOME/bin/java -Xmx512m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 
> --num_containers 2 --priority 0 --appname DistributedShell --homedir 
> file:/Users/ahussein 
> 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stdout
>  
> 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stderr
> $JRE_HOME/bin/java -Xmx512m 
> 

[jira] [Updated] (YARN-10536) Client in distributedShell swallows interrupt exceptions

2021-01-09 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10536:

Fix Version/s: 3.3.1

> Client in distributedShell swallows interrupt exceptions
> 
>
> Key: YARN-10536
> URL: https://issues.apache.org/jira/browse/YARN-10536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, distributed-shell
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In {{applications.distributedshell.Client}}, the method 
> {{monitorApplication}} loops waiting for one of the following conditions:
> * Application fails: reaches {{YarnApplicationState.KILLED}}, or 
> {{YarnApplicationState.FAILED}}
> * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or 
> {{YarnApplicationState.FINISHED}}
> * the time spent waiting is longer than {{clientTimeout}} (if it exists in 
> the parameters).
> When the Client thread is interrupted, it ignores the exception:
> {code:java}
>   // Check app status every 1 second.
>   try {
> Thread.sleep(1000);
>   } catch (InterruptedException e) {
> LOG.debug("Thread sleep in monitoring loop interrupted");
>   }
> {code}
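
A minimal sketch of the usual remedy, restoring the interrupt flag and 
stopping the wait instead of swallowing the exception (illustrative only, not 
the committed fix):
{code:java}
public class InterruptAwareMonitorExample {
  // Monitoring-style loop: on interrupt, preserve the flag and give up
  // promptly so the caller can react.
  static void monitor() {
    while (true) {
      // ... poll the application report here ...
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt status visible
        return;                             // stop monitoring
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Thread t = new Thread(InterruptAwareMonitorExample::monitor);
    t.start();
    Thread.sleep(100);
    t.interrupt(); // the loop now exits instead of sleeping forever
    t.join();
  }
}
{code}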


