[jira] [Updated] (YARN-10769) Reconnect transition in RMNodeImpl all containers are considered as G

2021-05-31 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson updated YARN-10769:
-
Attachment: YARN-10769.002.patch

> Reconnect transition in RMNodeImpl all containers are considered as G 
> --
>
> Key: YARN-10769
> URL: https://issues.apache.org/jira/browse/YARN-10769
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Cyrus Jackson
>Priority: Minor
> Attachments: YARN-10769.001.patch, YARN-10769.002.patch
>
>
> In RMNodeImpl#handleNMContainerStatus, *createContainerStatus* does not take 
> the container's execution type into account when creating the ContainerStatus.
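
For context, a minimal sketch of the kind of change this is about (illustrative 
only, not the attached patch; it assumes the ExecutionType-aware 
ContainerStatus.newInstance overload and NMContainerStatus#getExecutionType 
available on recent branches):

{code:java}
// Illustrative sketch only -- not YARN-10769.002.patch. Uses the usual
// org.apache.hadoop.yarn.api.records / yarn.server.api.protocolrecords types and
// assumes the ExecutionType-aware newInstance overload exists on the target branch.
private static ContainerStatus createContainerStatus(NMContainerStatus remoteContainer) {
  // Carry over the execution type reported by the NM instead of dropping it.
  return ContainerStatus.newInstance(
      remoteContainer.getContainerId(),
      remoteContainer.getExecutionType(),
      remoteContainer.getContainerState(),
      remoteContainer.getDiagnostics(),
      remoteContainer.getContainerExitStatus());
}
{code}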






[jira] [Commented] (YARN-10795) Improve Capacity Scheduler reinitialisation performance

2021-05-31 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354809#comment-17354809
 ] 

Qi Zhu commented on YARN-10795:
---

Thanks [~gandras] for this work.

It will be very helpful to clusters with many queues. :D

> Improve Capacity Scheduler reinitialisation performance
> ---
>
> Key: YARN-10795
> URL: https://issues.apache.org/jira/browse/YARN-10795
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Andras Gyori
>Priority: Major
>
> Mostly due to CapacitySchedulerConfiguration#getPropsWithPrefix and similar 
> methods, CapacityScheduler#reinit has parts whose complexity is quadratic in 
> the number of queues. With 1000+ queues, reinitialisation takes minutes, which 
> is far too slow to be viable when used in the mutation API.






[jira] [Commented] (YARN-10787) Queue submit ACL check is wrong when CS queue is ambiguous

2021-05-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354797#comment-17354797
 ] 

Hadoop QA commented on YARN-10787:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 15s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
29s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1030/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 8 new + 21 unchanged - 2 fixed = 29 total (was 23) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  8s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:g

[jira] [Updated] (YARN-10796) Capacity Scheduler: dynamic queue cannot scale out properly if its capacity is 0%

2021-05-31 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10796:

Summary: Capacity Scheduler: dynamic queue cannot scale out properly if its 
capacity is 0%  (was: Capacity Scheduler: dynamic queue cannot scale out 
properly if it's capacity is 0%)

> Capacity Scheduler: dynamic queue cannot scale out properly if its capacity 
> is 0%
> -
>
> Key: YARN-10796
> URL: https://issues.apache.org/jira/browse/YARN-10796
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
> cannot scale out properly even if its max-capacity and the parent's 
> max-capacity would allow it.
> Example:
> {noformat}
> Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
> Container allocation size: 1G / 1 vcore
> root.dynamic 
> Effective Capacity:   ( 50.0%)
> Effective Max Capacity:   (100.0%) 
> Template:
> Capacity:   40%
> Max Capacity:   100%
> User Limit Factor:  4
>  {noformat}
> leaf-queue-template.capacity = 40%
>  leaf-queue-template.maximum-capacity = 100%
>  leaf-queue-template.maximum-am-resource-percent = 50%
>  leaf-queue-template.minimum-user-limit-percent =100%
>  leaf-queue-template.user-limit-factor = 4
> "root.dynamic" has a maximum capacity of 100% and a capacity of 50%.
> Let's assume there are running containers in these dynamic queues (MR sleep 
> jobs):
>  root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)
> This scenario results in an underutilized cluster, leaving roughly 18% of the 
> capacity unused. On the other hand, it is still possible to submit a new 
> application to root.dynamic.user1 or root.dynamic.user2 and reach 100% 
> utilization that way.






[jira] [Updated] (YARN-10796) Capacity Scheduler: dynamic queue cannot scale out properly if it's capacity is 0%

2021-05-31 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10796:

Description: 
If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
cannot scale out properly even if its max-capacity and the parent's max-capacity 
would allow it.

Example:
{noformat}
Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
Container allocation size: 1G / 1 vcore

root.dynamic 
Effective Capacity:   ( 50.0%)
Effective Max Capacity:   (100.0%) 

Template:
Capacity:   40%
Max Capacity:   100%
User Limit Factor:  4
 {noformat}
leaf-queue-template.capacity = 40%
 leaf-queue-template.maximum-capacity = 100%
 leaf-queue-template.maximum-am-resource-percent = 50%
 leaf-queue-template.minimum-user-limit-percent =100%
 leaf-queue-template.user-limit-factor = 4

"root.dynamic" has a maximum capacity of 100% and a capacity of 50%.

Let's assume there are running containers in these dynamic queues (MR sleep 
jobs):
 root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)

This scenario results in an underutilized cluster, leaving roughly 18% of the 
capacity unused. On the other hand, it is still possible to submit a new 
application to root.dynamic.user1 or root.dynamic.user2 and reach 100% 
utilization that way.

  was:
If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
cannot properly scale even if it's max-capacity and the parent's max-capacity 
would allow it.

Example:
{noformat}
Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
Container allocation size: 1G / 1 vcore

Root.dynamic 
Effective Capacity:   ( 50.0%)
Effective Max Capacity:   (100.0%) 

Template:
Capacity:   40%
Max Capacity:   100%
User Limit Factor:  4
 {noformat}
leaf-queue-template.capacity = 40%
 leaf-queue-template.maximum-capacity = 100%
 leaf-queue-template.maximum-am-resource-percent = 50%
 leaf-queue-template.minimum-user-limit-percent =100%
 leaf-queue-template.user-limit-factor = 4

"root.dynamic" has a maximum capacity of 100% and a capacity of 50%.

Let's assume there are running containers in these dynamic queues (MR sleep 
jobs):
 root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)

This scenario will result in an underutilized cluster. There will be approx 18% 
unused capacity. On the other hand, it's still possible to submit a new 
application to root.dynamic.user1 or root.dynamic.user2 and reaching a 100% 
utilization is possible.


> Capacity Scheduler: dynamic queue cannot scale out properly if it's capacity 
> is 0%
> --
>
> Key: YARN-10796
> URL: https://issues.apache.org/jira/browse/YARN-10796
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
> cannot scale out properly even if its max-capacity and the parent's 
> max-capacity would allow it.
> Example:
> {noformat}
> Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
> Container allocation size: 1G / 1 vcore
> root.dynamic 
> Effective Capacity:   ( 50.0%)
> Effective Max Capacity:   (100.0%) 
> Template:
> Capacity:   40%
> Max Capacity:   100%
> User Limit Factor:  4
>  {noformat}
> leaf-queue-template.capacity = 40%
>  leaf-queue-template.maximum-capacity = 100%
>  leaf-queue-template.maximum-am-resource-percent = 50%
>  leaf-queue-template.minimum-user-limit-percent =100%
>  leaf-queue-template.user-limit-factor = 4
> "root.dynamic" has a maximum capacity of 100% and a capacity of 50%.
> Let's assume there are running containers in these dynamic queues (MR sleep 
> jobs):
>  root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)
> This scenario results in an underutilized cluster, leaving roughly 18% of the 
> capacity unused. On the other hand, it is still possible to submit a new 
> application to root.dynamic.user1 or root.dynamic.user2 and reach 100% 
> utilization that way.




[jira] [Created] (YARN-10796) Capacity Scheduler: dynamic queue cannot scale out properly if it's capacity is 0%

2021-05-31 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10796:
---

 Summary: Capacity Scheduler: dynamic queue cannot scale out 
properly if it's capacity is 0%
 Key: YARN-10796
 URL: https://issues.apache.org/jira/browse/YARN-10796
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacity scheduler, capacityscheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
cannot properly scale even if it's max-capacity and the parent's max-capacity 
would allow it.

Example:
{noformat}
Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
Container allocation size: 1G / 1 vcore

Root.dynamic 
Effective Capacity:   ( 50.0%)
Effective Max Capacity:   (100.0%) 

Template:
Capacity:   40%
Max Capacity:   100%
User Limit Factor:  4
 {noformat}
leaf-queue-template.capacity = 40%
 leaf-queue-template.maximum-capacity = 100%
 leaf-queue-template.maximum-am-resource-percent = 50%
 leaf-queue-template.minimum-user-limit-percent =100%
 leaf-queue-template.user-limit-factor = 4

"root.dynamic" has a maximum capacity of 100% and a capacity of 50%.

Let's assume there are running containers in these dynamic queues (MR sleep 
jobs):
 root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
 root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)

This scenario will result in an underutilized cluster. There will be approx 18% 
unused capacity. On the other hand, it's still possible to submit a new 
application to root.dynamic.user1 or root.dynamic.user2 and reaching a 100% 
utilization is possible.






[jira] [Commented] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2021-05-31 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354401#comment-17354401
 ] 

Wei-Chiu Chuang commented on YARN-10233:


I don't see this commit landed in 3.3.0 or 3.3.1. Resetting the target version.

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: YARN-10233.001.patch, 
> YARN_UI2_Tools_Daemon_Logs_Page_Fixed.png
>
>







[jira] [Updated] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2021-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10233:
---
Fix Version/s: (was: 3.3.0)

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: YARN-10233.001.patch, 
> YARN_UI2_Tools_Daemon_Logs_Page_Fixed.png
>
>







[jira] [Updated] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2021-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10233:
---
Target Version/s: 3.3.2  (was: 3.3.0)

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: YARN-10233.001.patch, 
> YARN_UI2_Tools_Daemon_Logs_Page_Fixed.png
>
>







[jira] [Commented] (YARN-10780) Optimise retrieval of configured node labels in CS queues

2021-05-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354394#comment-17354394
 ] 

Hadoop QA commented on YARN-10780:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
28s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 45s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 17m 
58s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1028/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 206 unchanged - 2 fixed = 208 total (was 208) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {colo

[jira] [Commented] (YARN-10124) Remove restriction of ParentQueue capacity zero when childCapacities > 0

2021-05-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354393#comment-17354393
 ] 

Hadoop QA commented on YARN-10124:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 13s{color} 
| {color:red}{color} | {color:red} YARN-10124 does not apply to trunk. Rebase 
required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for 
help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10124 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12994612/YARN-10124-002.patch |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1029/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Remove restriction of ParentQueue capacity zero when childCapacities > 0
> 
>
> Key: YARN-10124
> URL: https://issues.apache.org/jira/browse/YARN-10124
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10124-001.patch, YARN-10124-002.patch
>
>
> ParentQueue capacity cannot be set to 0 when child capacities > 0. To disable 
> a parent queue temporarily, the user can only STOP the queue, and the queue's 
> capacity cannot be reused by other queues. Allowing 0 capacity for a parent 
> queue would let the user reassign that capacity to other queues while 
> retaining the child queue capacity values (otherwise the user has to set all 
> child queue capacities to 0).






[jira] [Updated] (YARN-10124) Remove restriction of ParentQueue capacity zero when childCapacities > 0

2021-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10124:
---
Target Version/s: 3.3.2

> Remove restriction of ParentQueue capacity zero when childCapacities > 0
> 
>
> Key: YARN-10124
> URL: https://issues.apache.org/jira/browse/YARN-10124
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10124-001.patch, YARN-10124-002.patch
>
>
> ParentQueue capacity cannot be set to 0 when child capacities > 0. To disable 
> a parent queue temporarily, the user can only STOP the queue, and the queue's 
> capacity cannot be reused by other queues. Allowing 0 capacity for a parent 
> queue would let the user reassign that capacity to other queues while 
> retaining the child queue capacity values (otherwise the user has to set all 
> child queue capacities to 0).






[jira] [Updated] (YARN-10794) Submitting jobs to a single subcluster will fail while AMRMProxy is enabled

2021-05-31 Thread Song Jiacheng (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Song Jiacheng updated YARN-10794:
-
Description: 
Sorry for not knowing how to quote an issue...

https://issues.apache.org/jira/browse/YARN-9693

That issue already raised this problem, but it seems that I can't submit a job 
through the federation client while using its patch.

The root cause is that the NM sets a local AMRMToken for the AM when AMRMProxy 
is enabled, so the AM fails if it contacts the RM directly.

This makes a rolling upgrade to federation impossible, because we can't upgrade 
all the NMs and clients at the same moment.

So I developed another patch; with it I can submit jobs both ways.

My solution is to hold two tokens at the same time and choose the right one 
while building the RPC client.

I tested this patch in situations such as AM recovery and NM recovery, and 
found no errors.

Still, I can't be sure this patch is good, so I wonder whether there is a 
better solution.

 

  was:
Sorry for not knowing how to quote a issue...

https://issues.apache.org/jira/browse/YARN-9693

This issue has already raised this problem, but it seems that I can't submit 
job by the federation client while using the patch.

This problem makes it impossible to rolling upgrade to federation, cause we 
can't upgrade all the NMs and clients at one moment

So I developed another patch, using this patch I can submit jobs via the both 
ways.

I tested this patch in some situations like AM recover, NM recover, no error 
found.

But still, I can't ensure this patch is good, so i wonder if there is a better 
solution.

 


> Submitting jobs to a single subcluster will fail while AMRMProxy is enabled
> ---
>
> Key: YARN-10794
> URL: https://issues.apache.org/jira/browse/YARN-10794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
> Attachments: YARN-10794.v1.patch, YARN-10794.v2.patch
>
>
> Sorry for not knowing how to quote an issue...
> https://issues.apache.org/jira/browse/YARN-9693
> That issue already raised this problem, but it seems that I can't submit a 
> job through the federation client while using its patch.
> The root cause is that the NM sets a local AMRMToken for the AM when 
> AMRMProxy is enabled, so the AM fails if it contacts the RM directly.
> This makes a rolling upgrade to federation impossible, because we can't 
> upgrade all the NMs and clients at the same moment.
> So I developed another patch; with it I can submit jobs both ways.
> My solution is to hold two tokens at the same time and choose the right one 
> while building the RPC client.
> I tested this patch in situations such as AM recovery and NM recovery, and 
> found no errors.
> Still, I can't be sure this patch is good, so I wonder whether there is a 
> better solution.
>  
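
To make the two-token idea above concrete, a minimal sketch (illustrative only, 
not the attached patch; the class and method names below are hypothetical): keep 
both the local AMRMProxy token and the RM token in the UGI, and pick the one 
whose service matches the address the RPC client is about to connect to.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

public final class AmrmTokenSelectorSketch {
  /** Return the AMRMToken whose service matches the given target address, if any. */
  static Token<?> selectAmrmTokenFor(InetSocketAddress target) throws IOException {
    Text service = SecurityUtil.buildTokenService(target);
    for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
      if (AMRMTokenIdentifier.KIND_NAME.equals(token.getKind())
          && service.equals(token.getService())) {
        return token; // the local (AMRMProxy) token or the RM token, whichever matches
      }
    }
    return null; // fall back to existing behaviour when no matching token is found
  }
}
{code}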






[jira] [Issue Comment Deleted] (YARN-10775) Federation: Yarn running app web can't connect, because AppMaster can't redirect to the right address.

2021-05-31 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10775:
---
Comment: was deleted

(was: I think maybe we need to construct a proxy server in the NM to proxy the 
AM's web requests, though it is tedious.)

> Federation: Yarn running app web can't connect, because 
> AppMaster can't redirect to the right address. 
> 
>
> Key: YARN-10775
> URL: https://issues.apache.org/jira/browse/YARN-10775
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> I set up a YARN federation cluster. I can't connect to the running app's web 
> UI, but the completed app's web UI works.
> With some debugging I found the reason: AmIpFilter.PROXY_HOSTS is set from the 
> configuration (this is done in 
> AmFilterInitializer.getProxyHostsAndPortsForAmFilter). In the client 
> configuration there is no setting for the RM web port, so the URL cannot be 
> redirected.
> I think the client should not need to know the RM's web port, especially in 
> YARN federation mode.
> Note:
> Here is my client-specific configuration. I use HA to select a router 
> randomly.
> || config || value ||
> | yarn.resourcemanager.ha.enabled   | true |
> | yarn.resourcemanager.ha.rm-ids | r1 |
> |yarn.resourcemanager.address.r1 | router_host:8050 |
> |yarn.resourcemanager.scheduler.address.r1 | localhost:8049 |






[jira] [Commented] (YARN-10794) Submitting jobs to a single subcluster will fail while AMRMProxy is enabled

2021-05-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354374#comment-17354374
 ] 

Hadoop QA commented on YARN-10794:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
35s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 24s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 27m 
50s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  3m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
9s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m  
9s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  1

[jira] [Commented] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

2021-05-31 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354360#comment-17354360
 ] 

Tarun Parimi commented on YARN-10789:
-

Thanks [~snemeth]. Please also take a look at this when you get time.

> RM HA startup can fail due to race conditions in ZKConfigurationStore
> -
>
> Key: YARN-10789
> URL: https://issues.apache.org/jira/browse/YARN-10789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-10789.001.patch, YARN-10789.002.patch
>
>
> We are observing the error below randomly during Hadoop installation and the 
> initial RM startup when HA is enabled and 
> yarn.scheduler.configuration.store.class=zk is configured. This causes one of 
> the RMs to not start up.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /confstore/CONF_STORE
> {code}
> We try to create the znode /confstore/CONF_STORE when we initialize the 
> ZKConfigurationStore. The problem is that the ZKConfigurationStore is 
> initialized when CapacityScheduler does a serviceInit, and serviceInit is done 
> by both the Active and the Standby RM. So we can run into a race condition 
> where both the Active and the Standby try to create the same znode when both 
> RMs are started at the same time.
> ZKRMStateStore, on the other hand, avoids such race conditions by creating its 
> znodes only after serviceStart. serviceStart only happens for the active RM 
> that won the leader election, unlike serviceInit, which happens irrespective 
> of the leader election.
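
For illustration, the shape of an idempotent znode creation that avoids this 
race (a sketch only; it uses the raw ZooKeeper client, while the actual store 
goes through Hadoop's Curator-based helpers):

{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public final class SafeCreateSketch {
  /** Create the znode if it does not exist yet; tolerate a concurrent creation. */
  static void createIfAbsent(ZooKeeper zk, String path, byte[] data, List<ACL> acls)
      throws KeeperException, InterruptedException {
    try {
      zk.create(path, data, acls, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // Another RM (or an earlier run) already created the node -- nothing to do.
    }
  }
}
{code}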






[jira] [Updated] (YARN-9542) Fix LogsCLI guessAppOwner ignores custom file format suffix

2021-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-9542:
--
Target Version/s:   (was: 3.3.1)

> Fix LogsCLI guessAppOwner ignores custom file format suffix
> ---
>
> Key: YARN-9542
> URL: https://issues.apache.org/jira/browse/YARN-9542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.2.1, 3.1.3
>
> Attachments: YARN-9542-001.patch, YARN-9542-branch-3.2.001.patch, 
> YARN-9542-branch-3.2.002.patch
>
>
> LogsCLI guessAppOwner ignores custom file format suffix 
> yarn.log-aggregation.%s.remote-app-log-dir-suffix / Default 
> IndexedFileController Suffix 
> ({yarn.nodemanager.remote-app-log-dir-suffix}-ifile or logs-ifile). It 
> considers only yarn.nodemanager.remote-app-log-dir-suffix or default logs.
> *Repro:*
> {code}
> yarn-site.xml
> yarn.log-aggregation.file-formats ifile
> yarn.log-aggregation.file-controller.ifile.class 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController
> yarn.log-aggregation.ifile.remote-app-log-dir app-logs
> yarn.resourcemanager.connect.max-wait.ms 1000
> core-site.xml:
> ipc.client.connect.max.retries 3
> ipc.client.connect.retry.interval 10
> Run a Job with above configs and Stop the RM.
> [ambari-qa@yarn-ats-1 ~]$ yarn logs -applicationId 
> application_1557482389195_0001
> 2019-05-10 10:03:58,215 INFO client.RMProxy: Connecting to ResourceManager at 
> yarn-ats-1/172.26.81.91:8050
> Unable to get ApplicationState. Attempting to fetch logs directly from the 
> filesystem.
> Can not find the appOwner. Please specify the correct appOwner
> Could not locate application logs for application_1557482389195_0001
> [ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001
> Found 1 items
> -rw-r-   3 ambari-qa supergroup  18058 2019-05-10 10:01 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001/yarn-ats-1_45454
> {code}
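
As a sketch of the suffixes guessAppOwner would need to try (illustrative only, 
not the committed fix; the configuration keys below are the ones quoted in this 
issue):

{code:java}
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

public final class LogDirSuffixSketch {
  /** All remote-app-log-dir suffixes that LogsCLI should try when guessing the app owner. */
  static Set<String> candidateSuffixes(Configuration conf) {
    Set<String> suffixes = new LinkedHashSet<>();
    String base = conf.get("yarn.nodemanager.remote-app-log-dir-suffix", "logs");
    suffixes.add(base);
    for (String format : conf.getStrings("yarn.log-aggregation.file-formats",
        new String[0])) {
      // Per-format override, e.g. yarn.log-aggregation.ifile.remote-app-log-dir-suffix;
      // the IndexedFile controller otherwise defaults to "<base>-<format>" (e.g. logs-ifile).
      String override = conf.get(String.format(
          "yarn.log-aggregation.%s.remote-app-log-dir-suffix", format));
      suffixes.add(override != null ? override : base + "-" + format);
    }
    return suffixes;
  }
}
{code}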






[jira] [Updated] (YARN-9542) Fix LogsCLI guessAppOwner ignores custom file format suffix

2021-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-9542:
--
Target Version/s: 3.3.1

> Fix LogsCLI guessAppOwner ignores custom file format suffix
> ---
>
> Key: YARN-9542
> URL: https://issues.apache.org/jira/browse/YARN-9542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.2.1, 3.1.3
>
> Attachments: YARN-9542-001.patch, YARN-9542-branch-3.2.001.patch, 
> YARN-9542-branch-3.2.002.patch
>
>
> LogsCLI guessAppOwner ignores custom file format suffix 
> yarn.log-aggregation.%s.remote-app-log-dir-suffix / Default 
> IndexedFileController Suffix 
> ({yarn.nodemanager.remote-app-log-dir-suffix}-ifile or logs-ifile). It 
> considers only yarn.nodemanager.remote-app-log-dir-suffix or default logs.
> *Repro:*
> {code}
> yarn-site.xml
> yarn.log-aggregation.file-formats ifile
> yarn.log-aggregation.file-controller.ifile.class 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController
> yarn.log-aggregation.ifile.remote-app-log-dir app-logs
> yarn.resourcemanager.connect.max-wait.ms 1000
> core-site.xml:
> ipc.client.connect.max.retries 3
> ipc.client.connect.retry.interval 10
> Run a Job with above configs and Stop the RM.
> [ambari-qa@yarn-ats-1 ~]$ yarn logs -applicationId 
> application_1557482389195_0001
> 2019-05-10 10:03:58,215 INFO client.RMProxy: Connecting to ResourceManager at 
> yarn-ats-1/172.26.81.91:8050
> Unable to get ApplicationState. Attempting to fetch logs directly from the 
> filesystem.
> Can not find the appOwner. Please specify the correct appOwner
> Could not locate application logs for application_1557482389195_0001
> [ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001
> Found 1 items
> -rw-r-   3 ambari-qa supergroup  18058 2019-05-10 10:01 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001/yarn-ats-1_45454
> {code}






[jira] [Comment Edited] (YARN-10786) Federation:We can't access the AM page while using federation

2021-05-31 Thread Song Jiacheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354305#comment-17354305
 ] 

Song Jiacheng edited comment on YARN-10786 at 5/31/21, 8:45 AM:


[~zhengchenyu] Thanks for the comment.

I set yarn.web-proxy.address to all the subcluster webapp addresses, so that 
the AM pages can be accessed from all the subclusters.

I have thought about other solutions, but all of them require extensive changes 
and may break other rules.


was (Author: song jiacheng):
[~zhengchenyu] Thanks for the comment.
{panel:title=My title}
In the other way, If we have more than one subcluster, this way may be not good.
{panel}
I set yarn.web-proxy.address to all the subcluster webapp addresses, so that 
all the subcluster can access the AM pages.

I have thought about other solutions, but all of them change a lot and may 
break some other rules.

> Federation:We can't access the AM page while using federation
> -
>
> Key: YARN-10786
> URL: https://issues.apache.org/jira/browse/YARN-10786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
>  Labels: federation
> Fix For: 3.2.1
>
> Attachments: YARN-10786.v1.patch, 
> n_v25156273211c049f8b396dcf15fcd9a84.png, 
> v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png
>
>
> The reason for this is that the AM gets the proxy URI from the config 
> yarn.web-proxy.address, and if that is not set, it falls back to 
> yarn.resourcemanager.webapp.address.
> But in federation we don't know which RM will be the home cluster of an 
> application, so I made this fix:
> 1. Add this config to the yarn-site.xml on the client:
> <property>
>   <name>yarn.web-proxy.address</name>
>   <value>rm1:9088,rm2:9088</value>
> </property>
> 2. Change the config lookup from Configuration#get to Configuration#getStrings 
> in WebAppUtils#getProxyHostsAndPortsForAmFilter.
> With that I can access the AM page now.
> This config only needs to be added on the client side, so it affects 
> applications only.
> Before the fix, clicking the AM link in the RM or Router gives:
> !v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png!
>  After the fix, we can access the AM page as normal...
>  
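
A minimal sketch of step 2 above (illustrative only, not YARN-10786.v1.patch; 
the helper class name is hypothetical): read yarn.web-proxy.address with 
Configuration#getStrings so that a comma-separated list of proxies, one per 
subcluster, is honoured.

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class ProxyHostsSketch {
  /** All configured proxy host:port pairs, honouring a comma-separated list. */
  static List<String> getProxyHostsAndPorts(Configuration conf) {
    List<String> hostsAndPorts = new ArrayList<>();
    String[] proxies = conf.getStrings(YarnConfiguration.PROXY_ADDRESS);
    if (proxies != null) {
      for (String hostAndPort : proxies) {
        hostsAndPorts.add(hostAndPort.trim()); // e.g. "rm1:9088", "rm2:9088"
      }
    }
    // The real WebAppUtils code falls back to the RM web-app addresses when no proxy is set.
    return hostsAndPorts;
  }
}
{code}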






[jira] [Commented] (YARN-10786) Federation:We can't access the AM page while using federation

2021-05-31 Thread Song Jiacheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354305#comment-17354305
 ] 

Song Jiacheng commented on YARN-10786:
--

[~zhengchenyu] Thanks for the comment.
{panel:title=My title}
On the other hand, if we have more than one subcluster, this way may not be good.
{panel}
I set yarn.web-proxy.address to all the subcluster webapp addresses, so that 
the AM pages can be accessed from all the subclusters.

I have thought about other solutions, but all of them require extensive changes 
and may break other rules.

> Federation:We can't access the AM page while using federation
> -
>
> Key: YARN-10786
> URL: https://issues.apache.org/jira/browse/YARN-10786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
>  Labels: federation
> Fix For: 3.2.1
>
> Attachments: YARN-10786.v1.patch, 
> n_v25156273211c049f8b396dcf15fcd9a84.png, 
> v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png
>
>
> The reason of this is that AM gets the proxy URI from config 
> yarn.web-proxy.address, and if it does not exist, it will get the URI from 
> yarn.resourcemanager.webapp.address.
> But in federation, we don't know which RM will be the home cluster of an 
> application, so I do this fix:
> 1. Add this config in the yarn-site.xml on client.
> 
>  yarn.web-proxy.address
>  rm1:9088,rm2:9088
>  
> 2. Change the way to get the config from Configuration#get to 
> Configuration#getStrings in WebAppUtils#getProxyHostsAndPortsForAmFilter.
> So that I can access the AM page now.
> This config needs to be added in the client side, so it will affect 
> application only.
> Before fixing, click the AM link in RM or Router:
> !v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png!
>  And after the fix, we can access the AM page as normal...
>  






[jira] [Commented] (YARN-10786) Federation:We can't access the AM page while using federation

2021-05-31 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354292#comment-17354292
 ] 

zhengchenyu commented on YARN-10786:


I don't think this is a good way to solve the problem. 
I think the YARN client shouldn't need to know the ResourceManager's address in 
federation mode. We may adjust the backing YARN sub-clusters, so the 
ResourceManager's address may not be constant. 
On the other hand, if we have more than one subcluster, this approach may not 
work well.
In fact, we solved this problem with a solution similar to Song Jiacheng's, but 
I don't think it's a good solution. I created YARN-10775 earlier, hoping to 
discuss this problem. 

> Federation:We can't access the AM page while using federation
> -
>
> Key: YARN-10786
> URL: https://issues.apache.org/jira/browse/YARN-10786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
>  Labels: federation
> Fix For: 3.2.1
>
> Attachments: YARN-10786.v1.patch, 
> n_v25156273211c049f8b396dcf15fcd9a84.png, 
> v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png
>
>
> The reason of this is that AM gets the proxy URI from config 
> yarn.web-proxy.address, and if it does not exist, it will get the URI from 
> yarn.resourcemanager.webapp.address.
> But in federation, we don't know which RM will be the home cluster of an 
> application, so I do this fix:
> 1. Add this config in the yarn-site.xml on client.
> 
>  yarn.web-proxy.address
>  rm1:9088,rm2:9088
>  
> 2. Change the way to get the config from Configuration#get to 
> Configuration#getStrings in WebAppUtils#getProxyHostsAndPortsForAmFilter.
> So that I can access the AM page now.
> This config needs to be added in the client side, so it will affect 
> application only.
> Before fixing, click the AM link in RM or Router:
> !v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png!
>  And after the fix, we can access the AM page as normal...
>  






[jira] [Commented] (YARN-10775) Federation: Yarn running app web can't connect, because AppMaster can't redirect to the right address.

2021-05-31 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354290#comment-17354290
 ] 

zhengchenyu commented on YARN-10775:


I think maybe we need to construct a proxy server in the NM to proxy the AM's 
web requests, though it is tedious.

> Federation: Yarn running app web can't connect, because 
> AppMaster can't redirect to the right address. 
> 
>
> Key: YARN-10775
> URL: https://issues.apache.org/jira/browse/YARN-10775
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> I setup a yarn federation cluster, I can't connect the running app web, but 
> the completed app's web works.
> By some debug info, I found the reason. Because AmIpFilter.PROXY_HOSTS is set 
> by configuration (do it in 
> AmFilterInitializer.getProxyHostsAndPortsForAmFilter). In client 
> configuration, there are no configuation about 'rm web port', so the url 
> can't not be redirect. 
> I think client should not know rm's web port, especially in yarn federation 
> mode.
> Note:
> Here is my specially configuration for client. I use ha for select router 
> randomly.
> || config || value ||
> | yarn.resourcemanager.ha.enabled   | true |
> | yarn.resourcemanager.ha.rm-ids | r1 |
> |yarn.resourcemanager.address.r1 | router_host:8050 |
> |yarn.resourcemanager.scheduler.address.r1 | localhost:8049 |






[jira] [Updated] (YARN-10794) Submitting jobs to a single subcluster will fail while AMRMProxy is enabled

2021-05-31 Thread Song Jiacheng (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Song Jiacheng updated YARN-10794:
-
Description: 
Sorry for not knowing how to quote a issue...

https://issues.apache.org/jira/browse/YARN-9693

This issue has already raised this problem, but it seems that I can't submit 
job by the federation client while using the patch.

This problem makes it impossible to rolling upgrade to federation, cause we 
can't upgrade all the NMs and clients at one moment

So I developed another patch, using this patch I can submit jobs via the both 
ways.

I tested this patch in some situations like AM recover, NM recover, no error 
found.

But still, I can't ensure this patch is good, so i wonder if there is a better 
solution.

 

  was:
Sorry for not knowing how to quote a issue...

https://issues.apache.org/jira/browse/YARN-9693

This issue has already raised this problem, but it seems that I can't submit 
job by the federation client while using the patch.

This problem makes it impossible to rolling upgrade to federation, cause we 
can't upgrade all the NMs and clients at one moment

So I developed another patch, using this I can submit jobs via the both ways.

I tested this in some situations like AM recover, NM recover, no error found.

But still, I can't ensure this patch is good, so i wonder if there is a better 
solution.

 


> Submitting jobs to a single subcluster will fail while AMRMProxy is enabled
> ---
>
> Key: YARN-10794
> URL: https://issues.apache.org/jira/browse/YARN-10794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
> Attachments: YARN-10794.v1.patch, YARN-10794.v2.patch
>
>
> Sorry for not knowing how to quote a issue...
> https://issues.apache.org/jira/browse/YARN-9693
> This issue has already raised this problem, but it seems that I can't submit 
> job by the federation client while using the patch.
> This problem makes it impossible to rolling upgrade to federation, cause we 
> can't upgrade all the NMs and clients at one moment
> So I developed another patch, using this patch I can submit jobs via the both 
> ways.
> I tested this patch in some situations like AM recover, NM recover, no error 
> found.
> But still, I can't ensure this patch is good, so i wonder if there is a 
> better solution.
>  






[jira] [Commented] (YARN-10794) Submitting jobs to a single subcluster will fail while AMRMProxy is enabled

2021-05-31 Thread Song Jiacheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354280#comment-17354280
 ] 

Song Jiacheng commented on YARN-10794:
--

I have submitted a patch based on trunk.

> Submitting jobs to a single subcluster will fail while AMRMProxy is enabled
> ---
>
> Key: YARN-10794
> URL: https://issues.apache.org/jira/browse/YARN-10794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
> Attachments: YARN-10794.v1.patch, YARN-10794.v2.patch
>
>
> Sorry for not knowing how to quote a issue...
> https://issues.apache.org/jira/browse/YARN-9693
> This issue has already raised this problem, but it seems that I can't submit 
> job by the federation client while using the patch.
> This problem makes it impossible to rolling upgrade to federation, cause we 
> can't upgrade all the NMs and clients at one moment
> So I developed another patch, using this I can submit jobs via the both ways.
> I tested this in some situations like AM recover, NM recover, no error found.
> But still, I can't ensure this patch is good, so i wonder if there is a 
> better solution.
>  






[jira] [Updated] (YARN-10780) Optimise retrieval of configured node labels in CS queues

2021-05-31 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10780:

Parent: YARN-10795
Issue Type: Sub-task  (was: Improvement)

> Optimise retrieval of configured node labels in CS queues
> -
>
> Key: YARN-10780
> URL: https://issues.apache.org/jira/browse/YARN-10780
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with 
> respect to the number of queues (it is O(n*m), where n is the number of queues 
> and m is the total number of properties in the configuration). During CS 
> reinit the configured node labels are queried frequently, yet looking at the code:
> {code:java}
> for (Entry<String, String> stringStringEntry : this) {
>   e = stringStringEntry;
>   String key = e.getKey();
>   if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
>       + DOT)) {
>     // Find <label-name> in
>     // <queue-path>.accessible-node-labels.<label-name>.property
>     int labelStartIdx =
>         key.indexOf(ACCESSIBLE_NODE_LABELS)
>             + ACCESSIBLE_NODE_LABELS.length() + 1;
>     int labelEndIndx = key.indexOf('.', labelStartIdx);
>     String labelName = key.substring(labelStartIdx, labelEndIndx);
>     configuredNodeLabels.add(labelName);
>   }
> }
> {code}
>  This method iterates through ALL of the properties set in the configuration. 
> For example, when initialising 2500 queues, each having at least 2 properties, 
> that is 2500 * 5000 ~= over 12 million iterations, plus additional properties.
> There are some ways to resolve this issue while keeping backward 
> compatibility:
>  # Create a property like the original accessible-node-labels, which contains 
> predefined labels. If it is set, getConfiguredNodeLabels gets the value of 
> this property; otherwise it falls back to the old logic. I think 
> accessible-node-labels is not used for this purpose (though I have a feeling 
> that it should have been).
>  # Collect the node labels for all queues at the beginning of parseQueue and 
> iterate through the properties only once. This increases the space complexity 
> in exchange for not requiring any intervention from the user (a sketch of this 
> approach follows below). 
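
A minimal sketch of option 2 (illustrative only; the real prefix and key 
constants live in CapacitySchedulerConfiguration): a single pass over the 
configuration that groups the configured node labels by queue path, so 
parseQueue can then look them up directly. This trades a little memory for 
removing the per-queue full scan.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

public final class NodeLabelIndexSketch {
  private static final String PREFIX = "yarn.scheduler.capacity.";
  private static final String ACCESSIBLE_NODE_LABELS = "accessible-node-labels";

  /** One pass over the configuration: queue path -> configured node labels. */
  static Map<String, Set<String>> collectConfiguredNodeLabels(Configuration conf) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      int idx = key.indexOf("." + ACCESSIBLE_NODE_LABELS + ".");
      if (!key.startsWith(PREFIX) || idx < 0) {
        continue;
      }
      // Key shape: yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.<property>
      String queuePath = key.substring(PREFIX.length(), idx);
      int labelStart = idx + ACCESSIBLE_NODE_LABELS.length() + 2;
      int labelEnd = key.indexOf('.', labelStart);
      if (labelEnd > labelStart) {
        labelsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>())
            .add(key.substring(labelStart, labelEnd));
      }
    }
    return labelsByQueue;
  }
}
{code}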






[jira] [Created] (YARN-10795) Improve Capacity Scheduler reinitialisation performance

2021-05-31 Thread Andras Gyori (Jira)
Andras Gyori created YARN-10795:
---

 Summary: Improve Capacity Scheduler reinitialisation performance
 Key: YARN-10795
 URL: https://issues.apache.org/jira/browse/YARN-10795
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Reporter: Andras Gyori


Mostly due to CapacitySchedulerConfiguration#getPropsWithPrefix and similar 
methods, CapacityScheduler#reinit has parts whose complexity is quadratic in the 
number of queues. With 1000+ queues, reinitialisation takes minutes, which is far 
too slow to be viable when used in the mutation API.


