[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251542#comment-17251542
 ] 

Hadoop QA commented on YARN-10463:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 34m 
31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 16s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
49s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/397/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router: 
The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  9s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Comment Edited] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251518#comment-17251518
 ] 

zhuqi edited comment on YARN-10463 at 12/18/20, 6:45 AM:
-

[~ztang]

I rebased it and triggered a new CI now.

I also added a corresponding routerMetrics test to confirm it.

I added a GitHub PR to trigger CI.

Thanks.


was (Author: zhuqi):
[~ztang]

I rebased it and triggered a new CI now.

I also added a corresponding routerMetrics test to confirm it.

Thanks.

> For Federation, we should support getApplicationAttemptReport.
> --
>
> Key: YARN-10463
> URL: https://issues.apache.org/jira/browse/YARN-10463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10463.001.patch, YARN-10463.002.patch, 
> YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10463:
--
Labels: pull-request-available  (was: )

> For Federation, we should support getApplicationAttemptReport.
> --
>
> Key: YARN-10463
> URL: https://issues.apache.org/jira/browse/YARN-10463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10463.001.patch, YARN-10463.002.patch, 
> YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package

2020-12-17 Thread Minni Mittal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251517#comment-17251517
 ] 

Minni Mittal commented on YARN-10519:
-

I've addressed the comments about the newline and the visibility change in the 
new patch.

For the UTs, the reference in QueueMetrics is required.

> Refactor QueueMetricsForCustomResources class to move to yarn-common package
> 
>
> Key: YARN-10519
> URL: https://issues.apache.org/jira/browse/YARN-10519
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
> Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, 
> YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch
>
>
> Refactor the code for QueueMetricsForCustomResources to move the base classes 
> to yarn-common package. This helps in reusing the class in adding custom 
> resource types at NM level also. 






[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251518#comment-17251518
 ] 

zhuqi commented on YARN-10463:
--

[~ztang]

I rebased it and triggered a new CI now.

I also added a corresponding routerMetrics test to confirm it.

Thanks.

> For Federation, we should support getApplicationAttemptReport.
> --
>
> Key: YARN-10463
> URL: https://issues.apache.org/jira/browse/YARN-10463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10463.001.patch, YARN-10463.002.patch, 
> YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch
>
>







[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package

2020-12-17 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-10519:

Attachment: YARN-10519.v5.patch

> Refactor QueueMetricsForCustomResources class to move to yarn-common package
> 
>
> Key: YARN-10519
> URL: https://issues.apache.org/jira/browse/YARN-10519
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
> Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, 
> YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch
>
>
> Refactor the code for QueueMetricsForCustomResources to move the base classes 
> to yarn-common package. This helps in reusing the class in adding custom 
> resource types at NM level also. 






[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251497#comment-17251497
 ] 

Zhankun Tang commented on YARN-10463:
-

[~zhuqi], I triggered a new CI and it failed. I guess it needs a rebase onto 
the newest trunk. Could you please rebase it and trigger the CI again?

> For Federation, we should support getApplicationAttemptReport.
> --
>
> Key: YARN-10463
> URL: https://issues.apache.org/jira/browse/YARN-10463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10463.001.patch, YARN-10463.002.patch, 
> YARN-10463.003.patch, YARN-10463.004.patch
>
>







[jira] [Assigned] (YARN-10537) Change type of LogAggregationService threadPool

2020-12-17 Thread Ankit Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Kumar reassigned YARN-10537:
--

Assignee: Ankit Kumar

> Change type of LogAggregationService threadPool
> ---
>
> Key: YARN-10537
> URL: https://issues.apache.org/jira/browse/YARN-10537
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xie YiFan
>Assignee: Ankit Kumar
>Priority: Minor
>
> Currently, the LogAggregationService threadPool is a FixedThreadPool whose 
> default size is 100. LogAggregationService constructs an AppLogAggregator 
> for each newly arrived application and submits it to the threadPool. The 
> AppLogAggregator runs a while loop until the application finishes. Some 
> applications may run for a very long time, for example because there are 
> not enough resources, and each one occupies a thread of the pool. When the 
> number of such applications exceeds the threadPoolSize, later short-lived 
> applications can't upload logs until a previous long-lived application 
> finishes. So I think we should replace the FixedThreadPool with a 
> CachedThreadPool.






[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251493#comment-17251493
 ] 

Hadoop QA commented on YARN-10463:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  9m 
10s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-mvninstall-root.txt{color}
 | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
21s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 10s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt{color}
 | {color:orange} The patch fails to run checkstyle in 
hadoop-yarn-server-router {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
38s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  1m 
27s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-shadedclient.txt{color}
 | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
28s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
28s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01. {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
51s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
26s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt{color}
 | {color:red} hadoop-yarn-server-router in trunk failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
23s{color} | 

[jira] [Created] (YARN-10537) Change type of LogAggregationService threadPool

2020-12-17 Thread Xie YiFan (Jira)
Xie YiFan created YARN-10537:


 Summary: Change type of LogAggregationService threadPool
 Key: YARN-10537
 URL: https://issues.apache.org/jira/browse/YARN-10537
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xie YiFan


Currently, the LogAggregationService threadPool is a FixedThreadPool whose 
default size is 100. LogAggregationService constructs an AppLogAggregator for 
each newly arrived application and submits it to the threadPool. The 
AppLogAggregator runs a while loop until the application finishes. Some 
applications may run for a very long time, for example because there are not 
enough resources, and each one occupies a thread of the pool. When the number 
of such applications exceeds the threadPoolSize, later short-lived 
applications can't upload logs until a previous long-lived application 
finishes. So I think we should replace the FixedThreadPool with a 
CachedThreadPool.
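
A minimal sketch of the proposed swap, assuming the pool is built with 
{{Executors}} (class and field names below are illustrative, not the actual 
LogAggregationService code):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch of the FixedThreadPool -> CachedThreadPool swap
// proposed above; not the actual LogAggregationService code.
public class LogAggregatorPoolSketch {
  private final ExecutorService threadPool;

  public LogAggregatorPoolSketch() {
    // Before: at most 100 AppLogAggregators run concurrently, so
    // long-running applications can starve later, short-lived ones.
    // this.threadPool = Executors.newFixedThreadPool(100);

    // After: the pool grows on demand and reclaims idle threads, so a
    // short-lived application never waits behind a long-lived one.
    this.threadPool = Executors.newCachedThreadPool();
  }

  public void submitAggregator(Runnable appLogAggregator) {
    threadPool.submit(appLogAggregator);
  }
}
{code}
The trade-off is that a cached pool is unbounded, so the thread count can grow 
with the number of concurrently running applications.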






[jira] [Updated] (YARN-10537) Change type of LogAggregationService threadPool

2020-12-17 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-10537:
-
Priority: Minor  (was: Major)

> Change type of LogAggregationService threadPool
> ---
>
> Key: YARN-10537
> URL: https://issues.apache.org/jira/browse/YARN-10537
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xie YiFan
>Priority: Minor
>
> Currently, the LogAggregationService threadPool is a FixedThreadPool whose 
> default size is 100. LogAggregationService constructs an AppLogAggregator 
> for each newly arrived application and submits it to the threadPool. The 
> AppLogAggregator runs a while loop until the application finishes. Some 
> applications may run for a very long time, for example because there are 
> not enough resources, and each one occupies a thread of the pool. When the 
> number of such applications exceeds the threadPoolSize, later short-lived 
> applications can't upload logs until a previous long-lived application 
> finishes. So I think we should replace the FixedThreadPool with a 
> CachedThreadPool.






[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251466#comment-17251466
 ] 

zhuqi commented on YARN-10463:
--

[~BilwaST] 

Do you have any other advice?

[~ztang] will help to merge it.

> For Federation, we should support getApplicationAttemptReport.
> --
>
> Key: YARN-10463
> URL: https://issues.apache.org/jira/browse/YARN-10463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10463.001.patch, YARN-10463.002.patch, 
> YARN-10463.003.patch, YARN-10463.004.patch
>
>







[jira] [Commented] (YARN-10165) Effective Capacities goes beyond 100% when queues are configured with mixed values - Percentage and Absolute Resource

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251446#comment-17251446
 ] 

zhuqi commented on YARN-10165:
--

[~tanu.ajmera] 

The mixed mode is not supported now; I will fix it later in 
[YARN-10169|https://issues.apache.org/jira/browse/YARN-10169].

> Effective Capacities goes beyond 100% when queues are configured with mixed 
> values - Percentage and Absolute Resource
> -
>
> Key: YARN-10165
> URL: https://issues.apache.org/jira/browse/YARN-10165
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: Screenshot 2020-02-26 at 12.39.49 PM.png, Screenshot 
> 2020-02-26 at 12.40.01 PM.png
>
>
> There are two queues - default and batch whose capacities have been 
> configured with mixed values. Resource available is 9GB.
> Default queue has been configured with Absolute Resource [memory=6000] and 
> Batch queue has been configured with Capacity Percentage 50%. In the Resource 
> Manager UI, Effective Capacities go beyond 100%: for the Default queue it's 
> 65.1% and for the Batch queue it's 50%.
>  
> !Screenshot 2020-02-26 at 12.39.49 PM.png|height=200|width=20!
>  !Screenshot 2020-02-26 at 12.40.01 PM.png|height=200|width=20!
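
For reference, the numbers are consistent with the absolute setting, assuming 
the 9GB cluster amounts to 9216 MB: 6000 MB / 9216 MB is roughly 65.1% for the 
Default queue, while Batch keeps its 50% percentage capacity, so the two 
effective capacities sum to well over 100%.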






[jira] [Comment Edited] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251110#comment-17251110
 ] 

zhuqi edited comment on YARN-10169 at 12/18/20, 1:49 AM:
-

[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke]

When I wrote a unit test to confirm the case where a percentage-based queue 
sits below an absolute-resource parent, the queue always got the wrong 
resource: 100% of the parent.

Because of this logic:
{code:java}
//Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
float capacity = queueCapacities.getCapacity(label);
if (capacity > 0f) {
  queueCapacities.setAbsoluteCapacity(
  label,
  capacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteCapacity(label)));
}

float maxCapacity = queueCapacities.getMaximumCapacity(label);
if (maxCapacity > 0f) {
  queueCapacities.setAbsoluteMaximumCapacity(
  label,
  maxCapacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteMaximumCapacity(label)));
}
  }
}
{code}
This happens if we use an absolute resource in the parent and the capacity 
also uses a single absolute resource. Here is the maxCapacity-related code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
  && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
// Return capacity in percentage as 0 for non-root queues and 100 for
// root.From AbstractCSQueue, absolute resource will be parsed and
// updated. Once nodes are added/removed in cluster, capacity in
// percentage will also be re-calculated.
return 100.0f;
  }

  float maxCapacity = (configuredCapacity == null)
  ? MAXIMUM_CAPACITY_VALUE
  : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
  ? MAXIMUM_CAPACITY_VALUE
  : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so the maxCapacity 
will be 100.

We should change it to support mixed mode; if we use mixed mode in maxCapacity 
now, it will not throw an exception, but it will be wrong.

We should also fix this for auto-created queues.


was (Author: zhuqi):
[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke]

When I wrote a unit test to confirm the case where a percentage-based queue 
sits below an absolute-resource parent, the queue always got the wrong 
resource: 100% of the parent.

Because of this logic:
{code:java}
//Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
float capacity = queueCapacities.getCapacity(label);
if (capacity > 0f) {
  queueCapacities.setAbsoluteCapacity(
  label,
  capacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteCapacity(label)));
}

float maxCapacity = queueCapacities.getMaximumCapacity(label);
if (maxCapacity > 0f) {
  queueCapacities.setAbsoluteMaximumCapacity(
  label,
  maxCapacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteMaximumCapacity(label)));
}
  }
}
{code}
This happens if we use an absolute resource in the parent and the capacity 
also uses a single absolute resource. The parent's parentQueueCapacities will 
be null, so the value will be maxCapacity; here is the maxCapacity-related 
code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
  && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
// Return capacity in percentage as 0 for non-root queues and 100 for
// root.From AbstractCSQueue, absolute resource will be parsed and
// updated. Once nodes are added/removed in cluster, capacity in
// percentage will also be re-calculated.
return 100.0f;
  }

  float maxCapacity = (configuredCapacity == null)
  ? MAXIMUM_CAPACITY_VALUE
  : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
  ? MAXIMUM_CAPACITY_VALUE
  : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so the maxCapacity 
will be 100.

We should change it to support mixed mode, and if we now use mixed mode in 

[jira] [Resolved] (YARN-10536) Client in distributedShell swallows interrupt exceptions

2020-12-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved YARN-10536.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Client in distributedShell swallows interrupt exceptions
> 
>
> Key: YARN-10536
> URL: https://issues.apache.org/jira/browse/YARN-10536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, distributed-shell
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In {{applications.distributedshell.Client}}, the method 
> {{monitorApplication}} loops waiting for one of the following conditions:
> * The application fails: it reaches {{YarnApplicationState.KILLED}} or 
> {{YarnApplicationState.FAILED}}.
> * The application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or 
> {{YarnApplicationState.FINISHED}}.
> * The time spent waiting is longer than {{clientTimeout}} (if it exists in 
> the parameters).
> When the Client thread is interrupted, it ignores the exception:
> {code:java}
>   // Check app status every 1 second.
>   try {
> Thread.sleep(1000);
>   } catch (InterruptedException e) {
> LOG.debug("Thread sleep in monitoring loop interrupted");
>   }
> {code}
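
A minimal sketch of the conventional remedy, assuming the loop can simply stop 
when interrupted (hypothetical code, not the patch that was merged): restore 
the interrupt status and return.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of the conventional fix, not the merged patch:
// restore the interrupt flag and stop monitoring instead of swallowing
// the exception.
public class MonitorLoopSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(MonitorLoopSketch.class);

  public boolean monitorApplication() {
    while (true) {
      // Check app status every 1 second.
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        LOG.debug("Thread sleep in monitoring loop interrupted");
        Thread.currentThread().interrupt(); // restore the interrupt status
        return false;                       // report monitoring as aborted
      }
      // ... fetch the ApplicationReport here and return on the
      // success/failure/timeout conditions listed above ...
    }
  }
}
{code}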






[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions

2020-12-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251384#comment-17251384
 ] 

Íñigo Goiri commented on YARN-10536:


Thanks [~ahussein] for the fix; I merged the PR to trunk.

> Client in distributedShell swallows interrupt exceptions
> 
>
> Key: YARN-10536
> URL: https://issues.apache.org/jira/browse/YARN-10536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, distributed-shell
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In {{applications.distributedshell.Client}}, the method 
> {{monitorApplication}} loops waiting for one of the following conditions:
> * The application fails: it reaches {{YarnApplicationState.KILLED}} or 
> {{YarnApplicationState.FAILED}}.
> * The application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or 
> {{YarnApplicationState.FINISHED}}.
> * The time spent waiting is longer than {{clientTimeout}} (if it exists in 
> the parameters).
> When the Client thread is interrupted, it ignores the exception:
> {code:java}
>   // Check app status every 1 second.
>   try {
> Thread.sleep(1000);
>   } catch (InterruptedException e) {
> LOG.debug("Thread sleep in monitoring loop interrupted");
>   }
> {code}






[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package

2020-12-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251325#comment-17251325
 ] 

Hadoop QA commented on YARN-10519:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 13m 
53s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
21s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
35s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
8s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 45s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
57s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
55s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green}{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 71 unchanged - 2 
fixed = 71 total (was 73) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 12s{color} | {color:green}{color} | 

[jira] [Commented] (YARN-10334) TestDistributedShell leaks resources on timeout/failure

2020-12-17 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251298#comment-17251298
 ] 

Ahmed Hussein commented on YARN-10334:
--

These are the steps to fix the problem:
* YARN-10536 is going to make the thread responsive in handling exceptions.
* Pass a {{timeout}} argument to the {{DistributedShell.Client}}. This timeout 
has to be smaller than the {{TestDistributedShell.timeout}} rule.
* Optional: Client and YarnClient have no interfaces to shutdown/close. Adding 
such methods, to be accessed by the unit tests, would be a good way to clean 
up the code; a sketch follows this list.
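
A rough sketch of what that optional step could enable on the test side, 
assuming a close()/AutoCloseable hook is added to the distributed shell 
Client (all names here are hypothetical):
{code:java}
import org.junit.After;

// Hypothetical test-side sketch for the optional step above: if the
// distributed shell Client exposed an AutoCloseable hook, the test could
// guarantee cleanup even when a test method fails or times out.
// Names are made up, not the real TestDistributedShell code.
public class TestCleanupSketch {
  private AutoCloseable dshellClient; // stands in for distributedshell.Client

  @After
  public void tearDown() throws Exception {
    if (dshellClient != null) {
      dshellClient.close(); // would shut down the wrapped YarnClient and app
      dshellClient = null;
    }
  }
}
{code}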

> TestDistributedShell leaks resources on timeout/failure
> ---
>
> Key: YARN-10334
> URL: https://issues.apache.org/jira/browse/YARN-10334
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell, test, yarn
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: newbie, test
>
> {{TestDistributedShell}} times out on trunk. I found that the application 
> and containers will stay running in the background long after the unit test 
> has failed.
> This causes failures of other test cases and several false-positive failures 
> as a result of:
> * Ports will stay busy, so other test cases fail to launch.
> * Unit tests fail because of memory restrictions.
> Although the unit test is already broken on trunk, we do not want its 
> failures to propagate to other unit tests.
> {{TestDistributedShell}} needs to be revisited to make sure that all 
> {{YarnClients}} and {{YarnApplications}} are closed properly at the end of 
> each unit test (including exceptions and timeouts).
> Steps to reproduce:
> {code:bash}
> mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
> ## this will timeout as
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 90.234 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
> [ERROR] 
> testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 90.018 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 9 
> milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » 
> TestTimedOut
> [INFO] 
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> {code}
> Using {{ps}} command, you can find the yarn processes are still in the 
> background
> {code:bash}
> /bin/bash -c $JRE_HOME/bin/java -Xmx512m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 
> --num_containers 2 --priority 0 --appname DistributedShell --homedir 
> file:/Users/ahussein 
> 

[jira] [Commented] (YARN-10499) TestRouterWebServicesREST fails

2020-12-17 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251292#comment-17251292
 ] 

Ahmed Hussein commented on YARN-10499:
--

[~aajisaka] .. You are the man :)

It feels great to see the failing list down to:

 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/358/#showFailuresLink
{code:bash}
Test Result (6 failures / -202)
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testReadLockCanBeDisabledByConfig
org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.testAMSimulatorWithNodeLabels[1]
org.apache.hadoop.tools.dynamometer.TestDynamometerInfra.org.apache.hadoop.tools.dynamometer.TestDynamometerInfra
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
{code}


 

> TestRouterWebServicesREST fails
> ---
>
> Key: YARN-10499
> URL: https://issues.apache.org/jira/browse/YARN-10499
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 
> patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2488/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt]
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestRouterWebServicesREST.testAppAttemptXML:720->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   
> TestRouterWebServicesREST.testAppPriorityXML:796->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   TestRouterWebServicesREST.testAppQueueXML:846->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   TestRouterWebServicesREST.testAppStateXML:744->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   
> TestRouterWebServicesREST.testAppTimeoutXML:920->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   
> TestRouterWebServicesREST.testAppTimeoutsXML:896->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   TestRouterWebServicesREST.testAppXML:696->performGetCalls:274 
> expected:<200> but was:<204>
> [ERROR]   TestRouterWebServicesREST.testUpdateAppPriorityXML:832 
> expected:<200> but was:<500>
> [ERROR]   TestRouterWebServicesREST.testUpdateAppQueueXML:882 expected:<200> 
> but was:<500>
> [ERROR]   TestRouterWebServicesREST.testUpdateAppStateXML:782 expected:<202> 
> but was:<500>
> [ERROR] Errors: 
> [ERROR]   
> TestRouterWebServicesREST.testGetAppAttemptXML:1292->getAppAttempt:1464 » 
> ClientHandler
> [ERROR]   
> TestRouterWebServicesREST.testGetAppsMultiThread:1337->testGetContainersXML:1317->getAppAttempt:1464
>  » ClientHandler
> [ERROR]   
> TestRouterWebServicesREST.testGetContainersXML:1317->getAppAttempt:1464 » 
> ClientHandler {noformat}






[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions

2020-12-17 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251173#comment-17251173
 ] 

Ahmed Hussein commented on YARN-10536:
--

[~ayushsaxena], [~inigoiri], [~epayne]
Can you please take a look at that small change?
After it gets merged I will work on YARN-10536 to reduce the overhead of 
running those tests.

> Client in distributedShell swallows interrupt exceptions
> 
>
> Key: YARN-10536
> URL: https://issues.apache.org/jira/browse/YARN-10536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, distributed-shell
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{applications.distributedshell.Client}}, the method 
> {{monitorApplication}} loops waiting for one of the following conditions:
> * The application fails: it reaches {{YarnApplicationState.KILLED}} or 
> {{YarnApplicationState.FAILED}}.
> * The application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or 
> {{YarnApplicationState.FINISHED}}.
> * The time spent waiting is longer than {{clientTimeout}} (if it exists in 
> the parameters).
> When the Client thread is interrupted, it ignores the exception:
> {code:java}
>   // Check app status every 1 second.
>   try {
> Thread.sleep(1000);
>   } catch (InterruptedException e) {
> LOG.debug("Thread sleep in monitoring loop interrupted");
>   }
> {code}






[jira] [Comment Edited] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251110#comment-17251110
 ] 

zhuqi edited comment on YARN-10169 at 12/17/20, 2:00 PM:
-

[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke]

When I wrote a unit test to confirm the case where a percentage-based queue 
sits below an absolute-resource parent, the queue always got the wrong 
resource: 100% of the parent.

Because of this logic:
{code:java}
//Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
float capacity = queueCapacities.getCapacity(label);
if (capacity > 0f) {
  queueCapacities.setAbsoluteCapacity(
  label,
  capacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteCapacity(label)));
}

float maxCapacity = queueCapacities.getMaximumCapacity(label);
if (maxCapacity > 0f) {
  queueCapacities.setAbsoluteMaximumCapacity(
  label,
  maxCapacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteMaximumCapacity(label)));
}
  }
}
{code}
This happens if we use an absolute resource in the parent and the capacity 
also uses a single absolute resource. The parent's parentQueueCapacities will 
be null, so the value will be maxCapacity; here is the maxCapacity-related 
code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
  && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
// Return capacity in percentage as 0 for non-root queues and 100 for
// root.From AbstractCSQueue, absolute resource will be parsed and
// updated. Once nodes are added/removed in cluster, capacity in
// percentage will also be re-calculated.
return 100.0f;
  }

  float maxCapacity = (configuredCapacity == null)
  ? MAXIMUM_CAPACITY_VALUE
  : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
  ? MAXIMUM_CAPACITY_VALUE
  : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so the maxCapacity 
will be 100.

We should change it to support mixed mode; if we use mixed mode in maxCapacity 
now, it will not throw an exception, but it will be wrong.

We should also fix this for auto-created queues.


was (Author: zhuqi):
[~leftnoteasy] 

When I wrote a unit test to confirm the case where a percentage-based queue 
sits below an absolute-resource parent, the queue always got the wrong 
resource: 100% of the parent.

Because of this logic:
{code:java}
//Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
float capacity = queueCapacities.getCapacity(label);
if (capacity > 0f) {
  queueCapacities.setAbsoluteCapacity(
  label,
  capacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteCapacity(label)));
}

float maxCapacity = queueCapacities.getMaximumCapacity(label);
if (maxCapacity > 0f) {
  queueCapacities.setAbsoluteMaximumCapacity(
  label,
  maxCapacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteMaximumCapacity(label)));
}
  }
}
{code}
This happens if we use an absolute resource in the parent and the capacity 
also uses a single absolute resource. The parent's parentQueueCapacities will 
be null, so the value will be maxCapacity; here is the maxCapacity-related 
code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
  && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
// Return capacity in percentage as 0 for non-root queues and 100 for
// root.From AbstractCSQueue, absolute resource will be parsed and
// updated. Once nodes are added/removed in cluster, capacity in
// percentage will also be re-calculated.
return 100.0f;
  }

  float maxCapacity = (configuredCapacity == null)
  ? MAXIMUM_CAPACITY_VALUE
  : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
  ? MAXIMUM_CAPACITY_VALUE
  : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so the maxCapacity 
will be 100.

We should change it to support mixed 

[jira] [Commented] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail

2020-12-17 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251110#comment-17251110
 ] 

zhuqi commented on YARN-10169:
--

[~leftnoteasy] 

When I wrote a unit test to confirm the case where a percentage-based queue 
sits below an absolute-resource parent, the queue always got the wrong 
resource: 100% of the parent.

Because of this logic:
{code:java}
//Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
float capacity = queueCapacities.getCapacity(label);
if (capacity > 0f) {
  queueCapacities.setAbsoluteCapacity(
  label,
  capacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteCapacity(label)));
}

float maxCapacity = queueCapacities.getMaximumCapacity(label);
if (maxCapacity > 0f) {
  queueCapacities.setAbsoluteMaximumCapacity(
  label,
  maxCapacity
  * (parentQueueCapacities == null ? 1 : parentQueueCapacities
  .getAbsoluteMaximumCapacity(label)));
}
  }
}
{code}
This happens if we use an absolute resource in the parent and the capacity 
also uses a single absolute resource. The parent's parentQueueCapacities will 
be null, so the value will be maxCapacity; here is the maxCapacity-related 
code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
  && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
// Return capacity in percentage as 0 for non-root queues and 100 for
// root.From AbstractCSQueue, absolute resource will be parsed and
// updated. Once nodes are added/removed in cluster, capacity in
// percentage will also be re-calculated.
return 100.0f;
  }

  float maxCapacity = (configuredCapacity == null)
  ? MAXIMUM_CAPACITY_VALUE
  : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
  ? MAXIMUM_CAPACITY_VALUE
  : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so the maxCapacity 
will be 100.

We should change it to support mixed mode; if we use mixed mode in maxCapacity 
now, it will not throw an exception, but it will be wrong.
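
For illustration, a mixed configuration of the kind discussed here could be 
set up like this (hypothetical queue name and values, using the standard 
capacity-scheduler keys):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration of the mixed mode discussed above: a
// percentage-based capacity combined with an absolute-resource
// maximum-capacity. Queue name and values are made up.
public class MixedModeConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Percentage-based capacity for queue root.a.
    conf.set("yarn.scheduler.capacity.root.a.capacity", "80");
    // Absolute-resource maximum-capacity for the same queue. Per the
    // comment above, getNonLabeledQueueMaximumCapacity() returns 100.0f
    // for such a value instead of rejecting the mixed configuration.
    conf.set("yarn.scheduler.capacity.root.a.maximum-capacity",
        "[memory=4096,vcores=4]");
  }
}
{code}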

 

> Mixed absolute resource value and percentage-based resource value in 
> CapacityScheduler should fail
> --
>
> Key: YARN-10169
> URL: https://issues.apache.org/jira/browse/YARN-10169
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Blocker
> Attachments: YARN-10169.001.patch, YARN-10169.002.patch, 
> YARN-10169.003.patch
>
>
> To me this is a bug: if a queue has capacity set to a float and 
> maximum-capacity set to an absolute value, the existing logic allows the 
> behavior.
> For example:
> {code:java}
> queue.capacity = 0.8 
> queue.maximum-capacity = [mem=x, vcore=y] {code}
> We should throw an exception when it is configured like this.






[jira] [Commented] (YARN-10528) maxAMShare should only be accepted for leaf queues, not parent queues

2020-12-17 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251020#comment-17251020
 ] 

Siddharth Ahuja commented on YARN-10528:


Thank you [~snemeth]! Please take your time.

> maxAMShare should only be accepted for leaf queues, not parent queues
> -
>
> Key: YARN-10528
> URL: https://issues.apache.org/jira/browse/YARN-10528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10528.001.patch, maxAMShare for root.users (parent 
> queue) has no effect as child queue does not inherit it.png
>
>
> Based on [Hadoop 
> documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html],
>  it is clear that the {{maxAMShare}} property can only be used for *leaf queues*. 
> This is similar to the {{reservation}} setting.
> However, existing code only ensures that the reservation setting is not 
> accepted for "parent" queues (see 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L226
>  and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L233)
>  but it is missing the same checks for {{maxAMShare}}. Due to this, it is 
> currently possible to have an allocation similar to the one below:
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <weight>1.0</weight>
>     <schedulingPolicy>drf</schedulingPolicy>
>     <aclSubmitApps>*</aclSubmitApps>
>     <aclAdministerApps>*</aclAdministerApps>
>     <queue name="default">
>       <weight>1.0</weight>
>       <schedulingPolicy>drf</schedulingPolicy>
>     </queue>
>     <queue name="users" type="parent">
>       <weight>1.0</weight>
>       <schedulingPolicy>drf</schedulingPolicy>
>       <maxAMShare>1.0</maxAMShare>
>     </queue>
>   </queue>
>   <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
>   <queuePlacementPolicy>
>     <rule name="specified" create="true"/>
>     <rule name="nestedUserQueue">
>       <rule name="default" queue="users"/>
>     </rule>
>     <rule name="default"/>
>   </queuePlacementPolicy>
> </allocations>
> {code}
> where {{maxAMShare}} is 1.0f, meaning it is possible to allocate 100% of the 
> queue's resources to Application Masters. Notice above that root.users is a 
> parent queue, yet it still gladly accepts {{maxAMShare}}. This is contrary 
> to the documentation and in fact very misleading, because child queues like 
> root.users.<username> actually do not inherit this setting at all; they 
> still go on and use the default of 0.5 instead of 1.0, see the attached 
> screenshot as an example.
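
A rough sketch of the kind of check being asked for, mirroring how the 
reservation setting is rejected for parent queues (hypothetical code, not the 
actual patch):
{code:java}
// Hypothetical sketch of the missing validation, not the actual
// YARN-10528 patch: reject maxAMShare on a parent queue the same way the
// reservation setting is rejected in AllocationFileQueueParser.
public final class MaxAmShareCheckSketch {
  public static void checkLeafOnly(boolean isLeafQueue, String elementName,
      String queueName) {
    if (!isLeafQueue && "maxAMShare".equals(elementName)) {
      throw new IllegalArgumentException("The configuration for queue "
          + queueName + " is invalid: maxAMShare is only supported on leaf"
          + " queues and cannot be set on a parent queue.");
    }
  }
}
{code}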






[jira] [Commented] (YARN-10528) maxAMShare should only be accepted for leaf queues, not parent queues

2020-12-17 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250993#comment-17250993
 ] 

Szilard Nemeth commented on YARN-10528:
---

Thanks [~sahuja],

Very detailed description and testing steps.

Will take a look at your patch soon.

> maxAMShare should only be accepted for leaf queues, not parent queues
> -
>
> Key: YARN-10528
> URL: https://issues.apache.org/jira/browse/YARN-10528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10528.001.patch, maxAMShare for root.users (parent 
> queue) has no effect as child queue does not inherit it.png
>
>
> Based on [Hadoop 
> documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html],
>  it is clear that the {{maxAMShare}} property can only be used for *leaf queues*. 
> This is similar to the {{reservation}} setting.
> However, existing code only ensures that the reservation setting is not 
> accepted for "parent" queues (see 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L226
>  and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L233)
>  but it is missing the same checks for {{maxAMShare}}. Due to this, it is 
> currently possible to have an allocation similar to the one below:
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <weight>1.0</weight>
>     <schedulingPolicy>drf</schedulingPolicy>
>     <aclSubmitApps>*</aclSubmitApps>
>     <aclAdministerApps>*</aclAdministerApps>
>     <queue name="default">
>       <weight>1.0</weight>
>       <schedulingPolicy>drf</schedulingPolicy>
>     </queue>
>     <queue name="users" type="parent">
>       <weight>1.0</weight>
>       <schedulingPolicy>drf</schedulingPolicy>
>       <maxAMShare>1.0</maxAMShare>
>     </queue>
>   </queue>
>   <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
>   <queuePlacementPolicy>
>     <rule name="specified" create="true"/>
>     <rule name="nestedUserQueue">
>       <rule name="default" queue="users"/>
>     </rule>
>     <rule name="default"/>
>   </queuePlacementPolicy>
> </allocations>
> {code}
> where {{maxAMShare}} is 1.0f, meaning it is possible to allocate 100% of the 
> queue's resources to Application Masters. Notice above that root.users is a 
> parent queue, yet it still gladly accepts {{maxAMShare}}. This is contrary 
> to the documentation and in fact very misleading, because child queues like 
> root.users.<username> actually do not inherit this setting at all; they 
> still go on and use the default of 0.5 instead of 1.0, see the attached 
> screenshot as an example.


