[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-11-03 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.006.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NameNode or Router node (which 
> exposes the same APIs as the HDFS NN). If one of those nodes is bad and the 
> renew call hangs, the thread remains stuck indefinitely. The thread should 
> ideally time out the renewToken call and retry from the client's perspective.
>  
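A minimal sketch of the timeout-and-retry idea (hypothetical names, not the
actual DelegationTokenRenewer code): run the renew call on a worker thread,
bound it with Future.get(timeout), and retry a few times instead of letting
the renewer thread block forever.

{code:java}
import java.util.concurrent.*;

// Sketch only: renewCall is assumed to wrap token.renew(conf) and return the
// new expiration time. A stuck RPC now costs at most timeoutSec per attempt.
public class BoundedRenewer {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  static long renewWithTimeout(Callable<Long> renewCall, long timeoutSec,
      int maxRetries) throws Exception {
    for (int attempt = 1; ; attempt++) {
      Future<Long> future = POOL.submit(renewCall);
      try {
        return future.get(timeoutSec, TimeUnit.SECONDS);
      } catch (TimeoutException e) {
        future.cancel(true);            // interrupt the stuck renew call
        if (attempt >= maxRetries) {
          throw e;                      // give up after maxRetries attempts
        }
      }
    }
  }
}
{code}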



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-11-03 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965689#comment-16965689
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri] Thanks for your review. Addressed all comments in .006.patch.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NameNode or Router node (which 
> exposes the same APIs as the HDFS NN). If one of those nodes is bad and the 
> renew call hangs, the thread remains stuck indefinitely. The thread should 
> ideally time out the renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-9950:
---

 Summary: Unset Ordering Policy of Leaf/Parent queue converted from 
Parent/Leaf queue respectively
 Key: YARN-9950
 URL: https://issues.apache.org/jira/browse/YARN-9950
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
root.A) whose ordering policy is fifo.

YARN supports fifo or fair for a leaf queue and utilization or 
priority-utilization for a parent queue. When an existing leaf queue (root.A) 
becomes a parent queue, its ordering policy (fifo or fair) has to be unset; 
otherwise the YARN RM fails, because fifo and fair are not valid queue 
ordering policies for a parent queue.

Similarly, while removing a queue, the ordering policy of a leaf queue that 
was converted from a parent queue must be unset.
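
A minimal sketch of the unset step (hypothetical helper, not the actual patch;
only the yarn.scheduler.capacity.<queue-path>.ordering-policy property name
and the policy values above come from the report):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: when a queue's role flips between leaf and parent, drop an
// ordering policy that is only valid for the old role, so the RM does not
// fail on the next reinitialization.
public class OrderingPolicyCleaner {
  private static final String PREFIX = "yarn.scheduler.capacity.";

  static void unsetInvalidOrderingPolicy(Configuration conf, String queuePath,
      boolean becomesParent) {
    String key = PREFIX + queuePath + ".ordering-policy";
    String policy = conf.get(key);
    if (policy == null) {
      return;                                  // nothing configured
    }
    boolean leafOnly = policy.equals("fifo") || policy.equals("fair");
    boolean parentOnly = policy.equals("utilization")
        || policy.equals("priority-utilization");
    if ((becomesParent && leafOnly) || (!becomesParent && parentOnly)) {
      conf.unset(key);                         // e.g. root.A turned parent
    }
  }
}
{code}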



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9950:

Attachment: YARN-9950-001.patch

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue 
> (say root.A) whose ordering policy is fifo.
> YARN supports fifo or fair for a leaf queue and utilization or 
> priority-utilization for a parent queue. When an existing leaf queue 
> (root.A) becomes a parent queue, its ordering policy (fifo or fair) has to 
> be unset; otherwise the YARN RM fails, because fifo and fair are not valid 
> queue ordering policies for a parent queue.
> Similarly, while removing a queue, the ordering policy of a leaf queue that 
> was converted from a parent queue must be unset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966268#comment-16966268
 ] 

Hadoop QA commented on YARN-9950:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
24s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9950 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984723/YARN-9950-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 442715815c14 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25083/testReport/ |
| Max. process+thread count | 815 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25083/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively

[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

 

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
>  
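
A small sketch of the failure mode (hypothetical numbers matching the report;
UI2 itself is JavaScript, this is just the arithmetic): elapsed time must be
taken from the server, not computed against the browser's clock.

{code:java}
// Sketch: why browserNow - serverStart is wrong when the browser's clock
// disagrees with the server's (UI1 shows the server-reported ElapsedTime).
public class ElapsedTimeDemo {
  public static void main(String[] args) {
    long serverStart = 1_572_800_000_000L;               // start, server clock
    long serverNow = serverStart + 29_000;               // 29 Secs on server
    long skew = 10L * 3600_000 + 49 * 60_000 + 26_000;   // browser runs ahead
    long browserNow = serverNow + skew;

    long wrong = browserNow - serverStart;  // what UI2 effectively showed
    long right = serverNow - serverStart;   // what the server reports

    System.out.println("UI2-style elapsed ms = " + wrong); // ~10h 49m 55s
    System.out.println("server elapsed ms    = " + right); // 29 s
  }
}
{code}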



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

  was:
In the YARN UI2 attempts tab, the running Application Attempt's ElapsedTime is 
incorrect: when the browser and the YARN server are in different time zones, 
the Application Attempt's ElapsedTime is greater than the Application's 
ElapsedTime.

I found this problem on hadoop 3.1.1 and wanted to upgrade to the latest 
version of hadoop, but the running Application Attempt's ElapsedTime is still 
incorrect there.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

 

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

 


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=544,height=213!

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

 


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
> 1. GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=544,height=213!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

 

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=544,height=213!


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
> 1. GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!
> !Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!
> 2. GridView
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

  !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!

3. GridView - container

 

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

 


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
> 1. GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!
> !Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!
> 2. GridView
>   !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!
> 3. GridView - container
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

  !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!

3. GridView - container

!Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

  !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!

3. GridView - container

 


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png, 
> image-2019-10-09-17-22-31-603.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
> 1. GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!
> !Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!
> 2. GridView
>   !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!
> 3. GridView - container
> !Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Attachment: (was: image-2019-10-09-17-22-31-603.png)

> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  Yarn ui2 : the ElapsedTime of running application's attempt
> 1. GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!
> !Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!
> 2. GridView
>   !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!
> 3. GridView - container
> !Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9880) In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is incorrect.

2019-11-03 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9880:

Description: 
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

1.  Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 2. Yarn ui2 : the ElapsedTime of running application's attempt

a) GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

b) GridView

  !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!

c) GridView - container

!Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!

  was:
In the YARN UI2 attempts tab, when the browser and the YARN server are in 
different time zones, the running Application Attempt's ElapsedTime is greater 
than the Application's ElapsedTime.

UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
browser's current time. Therefore, when the browser and the YARN server are in 
different time zones, the ElapsedTime is incorrect: while the Application's 
ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime is 
10 Hrs : 49 Mins : 55 Secs.

Yarn ui1 : the ElapsedTime of running application's attempt

!Yarn-AppAttempt-UI1.png|width=477,height=67!

 Yarn ui2 : the ElapsedTime of running application's attempt

1. GraphView

  !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!

!Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!

2. GridView

  !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!

3. GridView - container

!Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!


> In YARN ui2 attempts tab, The running Application Attempt's ElapsedTime is 
> incorrect.
> -
>
> Key: YARN-9880
> URL: https://issues.apache.org/jira/browse/YARN-9880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yarn-AppAttempt-UI1.png, 
> Yarn-AppAttempt-UI2_GraphView-1.png, Yarn-AppAttempt-UI2_GraphView-2.png, 
> Yarn-AppAttempt-UI2_GridView-container.png, Yarn-AppAttempt-UI2_GridView.png
>
>
> In the YARN UI2 attempts tab, when the browser and the YARN server are in 
> different time zones, the running Application Attempt's ElapsedTime is 
> greater than the Application's ElapsedTime.
> UI1 gets the ElapsedTime from the YARN server; UI2 computes it from the 
> browser's current time. Therefore, when the browser and the YARN server are 
> in different time zones, the ElapsedTime is incorrect: while the 
> Application's ElapsedTime is 29 Secs, the Application Attempt's ElapsedTime 
> is 10 Hrs : 49 Mins : 55 Secs.
> 1.  Yarn ui1 : the ElapsedTime of running application's attempt
> !Yarn-AppAttempt-UI1.png|width=477,height=67!
>  2. Yarn ui2 : the ElapsedTime of running application's attempt
> a) GraphView
>   !Yarn-AppAttempt-UI2_GraphView-1.png|width=444,height=174!
> !Yarn-AppAttempt-UI2_GraphView-2.png|width=440,height=168!
> b) GridView
>   !Yarn-AppAttempt-UI2_GridView.png|width=451,height=165!
> c) GridView - container
> !Yarn-AppAttempt-UI2_GridView-container.png|width=439,height=106!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9913) In YARN ui2 attempt container tab, The Container's ElapsedTime of running Application is incorrect when the browser and the yarn server are in different timezons.

2019-11-03 Thread jenny (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966340#comment-16966340
 ] 

jenny commented on YARN-9913:
-

Added a PR. It uses the ElapsedTime returned by the REST API as the 
container's ElapsedTime, instead of using the browser's Date.now().

 

>  In YARN ui2 attempt container tab, The Container's ElapsedTime of  running 
> Application is incorrect when the browser and the yarn server are in 
> different timezons.
> 
>
> Key: YARN-9913
> URL: https://issues.apache.org/jira/browse/YARN-9913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-10-18-14-48-50-856.png, 
> image-2019-10-18-14-50-14-576.png
>
>
> In the YARN UI2 attempt container tab, the Container's ElapsedTime of a 
> running Application is incorrect when the browser and the YARN server are 
> in different time zones.
>  Please see the screenshots below:
>  Yarn UI2:
>     !image-2019-10-18-14-48-50-856.png|width=489,height=169!
> Yarn UI1:
> !image-2019-10-18-14-50-14-576.png|width=488,height=191!
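
The PR's idea can be sketched in a few lines (hypothetical names; UI2 itself
is JavaScript): take the elapsed time the REST API already returns instead of
recomputing it from the client's clock.

{code:java}
// Sketch: prefer the server-reported value over client-side arithmetic.
public class ContainerElapsedTime {
  // Old UI2 behavior (breaks whenever the browser's clock disagrees with
  // the server's):
  static long elapsedFromClient(long startedTimeMs, long clientNowMs) {
    return clientNowMs - startedTimeMs;
  }

  // Behavior per this PR's idea: use the REST API's elapsedTime as-is.
  static long elapsedFromServer(long serverElapsedMs) {
    return serverElapsedMs;
  }
}
{code}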



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: (was: 0001.patch)

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
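
The root cause is a comparator whose inputs mutate while TimSort runs; a
minimal sketch of the failure mode and the snapshot-plus-lock mitigation
discussed on this issue (illustrative, not the actual patch):

{code:java}
import java.util.*;

// Sketch: if availableMem changes mid-sort, the comparator becomes
// inconsistent and TimSort throws "Comparison method violates its general
// contract!". Sorting a copy under the same lock that guards resource
// updates keeps the comparator consistent for the duration of the sort.
public class UnstableSortDemo {
  static final Map<String, Integer> availableMem = new HashMap<>();

  static final Comparator<String> BY_AVAILABLE_DESC =
      (a, b) -> Integer.compare(availableMem.get(b), availableMem.get(a));

  static List<String> sortNodesSafely(Collection<String> nodes, Object lock) {
    List<String> snapshot = new ArrayList<>(nodes); // copy the live node set
    synchronized (lock) { // resource updates are assumed to take this lock
      snapshot.sort(BY_AVAILABLE_DESC);
    }
    return snapshot;
  }
}
{code}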



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: YARN-9940.patch

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: (was: YARN-9940.patch)

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: (was: YARN-9940.patch)

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: YARN-9940.patch

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: YARN-9940.patch

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian reassigned YARN-9947:
---

Assignee: Hu Ziqian

> lazy init appLogAggregatorImpl when log aggregation
> ---
>
> Key: YARN-9947
> URL: https://issues.apache.org/jira/browse/YARN-9947
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
>
> In current version, app log aggregator will check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9947:

Description: In current version, app log aggregator will check

> lazy init appLogAggregatorImpl when log aggregation
> ---
>
> Key: YARN-9947
> URL: https://issues.apache.org/jira/browse/YARN-9947
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Priority: Major
>
> In current version, app log aggregator will check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962833#comment-16962833
 ] 

kailiu_dev edited comment on YARN-9940 at 11/4/19 2:57 AM:
---

YARN-8436 may not be suitable for this issue, because:

1: This is a bug in the FairScheduler's ContinuousSchedulingThread, which may 
not be the same case as the FSParentQueue one.

2: A node may be deleted, and using a TreeSet would raise this exception:

java.util.ConcurrentModificationException
  at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)

So the current code is needed:

List nodeIdList = new ArrayList(nodes.keySet());
synchronized (this) {
    Collections.sort(nodeIdList, nodeAvailableResourceComparator);
}

3: The other reason a TreeSet is not suitable: in compare(NodeId n1, NodeId 
n2), if n1 == n2 the TreeSet keeps only one of them, which does not work for 
continuous scheduling because some nodes would never be scheduled.

4: In YARN-9940, we hold the schedule lock to avoid the exception while 
sorting when some node's available resources change.

5: 'Comparison method violates its general contract' is caused by the JDK: 
since JDK 7, the sorting algorithm is TimSort. For the continuous scheduling 
of the Fair Scheduler, there are two solutions:

    one: see the YARN-9940 patch
    two: add the JVM option -Djava.util.Arrays.useLegacyMergeSort=true

 

 


was (Author: kailiu_dev):
YARN-8436 may not be suitable for this issue, because:

1: A node may be deleted, and using a TreeSet would raise this exception:

java.util.ConcurrentModificationException
  at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)

So the current code is needed:

List nodeIdList = new ArrayList(nodes.keySet());
synchronized (this) {
    Collections.sort(nodeIdList, nodeAvailableResourceComparator);
}

2: The other reason a TreeSet is not suitable: in compare(NodeId n1, NodeId 
n2), if n1 == n2 the TreeSet keeps only one of them, which does not work for 
continuous scheduling because some nodes would never be scheduled.

3: In YARN-9940, we hold the schedule lock to avoid the exception while 
sorting when some node's available resources change.

4: 'Comparison method violates its general contract' is caused by the JDK: 
since JDK 7, the sorting algorithm is TimSort. For the continuous scheduling 
of the Fair Scheduler, there are two solutions:

    one: see the YARN-9940 patch
    two: add the JVM option -Djava.util.Arrays.useLegacyMergeSort=true

 

 

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9947:

Description: 
This issue introduces a method to lazily initialize AppLogAggregatorImpl.

 

In the current version, the app log aggregator checks HDFS and tries to create
the app's remote log directory when initializing an app. This causes a problem
when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting
NM initializes every app on that NM, and the NM tries to connect to HDFS. If
HDFS is heavily loaded, many NMs restarting at the same time can make HDFS
unresponsive. Each NM then waits for HDFS's response, the RM cannot get the
NM's heartbeat, and it treats all containers as timed out.

In our production environment with 3500+ NMs, we found that restarting the NMs
put heavy pressure on HDFS and the init-app operation blocked on accessing hdfs
(stack attached below), which caused all the containers to fail (the container
count on a single NM fell to zero).

!https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!

!https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!

 

  was:In current version, app log aggregator will check


> lazy init appLogAggregatorImpl when log aggregation
> ---
>
> Key: YARN-9947
> URL: https://issues.apache.org/jira/browse/YARN-9947
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
>
> This issue introduces a method to lazily initialize AppLogAggregatorImpl.
>  
> In the current version, the app log aggregator checks HDFS and tries to create 
> the app's remote log directory when initializing an app. This causes a problem 
> when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting 
> NM initializes every app on that NM, and the NM tries to connect to HDFS. If 
> HDFS is heavily loaded, many NMs restarting at the same time can make HDFS 
> unresponsive. Each NM then waits for HDFS's response, the RM cannot get the 
> NM's heartbeat, and it treats all containers as timed out.
> In our production environment with 3500+ NMs, we found that restarting the NMs 
> put heavy pressure on HDFS and the init-app operation blocked on accessing 
> hdfs (stack attached below), which caused all the containers to fail (the 
> container count on a single NM fell to zero).
> !https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!
> !https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966386#comment-16966386
 ] 

kailiu_dev commented on YARN-9940:
--

[~zxu] [~snemeth] Could you please help review this code?

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Fix Version/s: 2.7.2

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: (was: YARN-9940.patch)

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940-branch-2.7.2.001.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9947:

Description: 
This issue introduces a method to lazily initialize AppLogAggregatorImpl,
letting it access HDFS as late as possible (usually when the app finishes), so
that NMs restarting across a cluster do not all hit HDFS at the same time,
reducing HDFS pressure. Let's go into the details below.

In the current version, the app log aggregator checks HDFS and tries to create
the app's remote log directory when initializing an app. This causes a problem
when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting
NM initializes every app on that NM, and the NM tries to connect to HDFS. If
HDFS is heavily loaded, many NMs restarting at the same time can make HDFS
unresponsive. Each NM then waits for HDFS's response, the RM cannot get the
NM's heartbeat, and it treats all containers as timed out.

In our production environment with 3500+ NMs, we found that restarting the NMs
put heavy pressure on HDFS and the init-app operation blocked on accessing hdfs
(stack attached below), which caused all the containers to fail (the container
count on a single NM fell to zero).

!https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!

!https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!

We solve this problem by introducing lazy initialization in
AppLogAggregatorImpl. When initializing an app, we just create the
AppLogAggregatorImpl object without calling verifyAndCreateRemoteLogDir(). We
call verifyAndCreateRemoteLogDir() only when the app starts aggregating logs.
Because apps do not all finish or aggregate logs at the same time, the
verifyAndCreateRemoteLogDir() calls are spread out, so the NMs will not all
access HDFS at once even when they restart at the same time.

 

YARN-8418 solved the leaked container-log directory problem by adding a way to
update the NM's credentials. If we lazily initialize AppLogAggregatorImpl, we
don't need YARN-8418's logic, because the lazy-init step happens after the
addCredentials step, which means the credentials are always refreshed before we
use them.

 

In summary, this issue does two things (see the sketch below):
 # Introduces lazy-init logic in AppLogAggregatorImpl to avoid all NMs in a
cluster hitting HDFS at once when they are restarted.
 # Reverts YARN-8418, because the lazy-init logic guarantees the credentials are
refreshed.
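
For illustration, here is a minimal sketch of the lazy-init pattern described
above; apart from verifyAndCreateRemoteLogDir(), the class and method names are
simplified stand-ins, not the actual NM code:

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;

public class LazyLogAggregatorSketch {
  private volatile boolean remoteLogDirReady = false;

  // Called when the app is initialized: cheap, no HDFS access here.
  public void initApp(String appId) {
    // Just build the in-memory aggregator state; do NOT touch HDFS.
  }

  // Called when the app actually starts aggregating logs, typically at app
  // finish; this is where the HDFS access happens, spread out over time.
  public void startLogAggregation(String appId) {
    ensureRemoteLogDir();
    // ... upload the container logs ...
  }

  private synchronized void ensureRemoteLogDir() {
    if (remoteLogDirReady) {
      return;  // already verified once for this aggregator
    }
    try {
      verifyAndCreateRemoteLogDir();
      remoteLogDirReady = true;
    } catch (IOException e) {
      throw new UncheckedIOException("remote log dir not available", e);
    }
  }

  private void verifyAndCreateRemoteLogDir() throws IOException {
    // In the real NM this checks and, if needed, creates the remote log
    // directory on HDFS; stubbed out in this sketch.
  }
}
{code}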

  was:
This issue introduces a method to lazily initialize AppLogAggregatorImpl.

 

In the current version, the app log aggregator checks HDFS and tries to create
the app's remote log directory when initializing an app. This causes a problem
when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting
NM initializes every app on that NM, and the NM tries to connect to HDFS. If
HDFS is heavily loaded, many NMs restarting at the same time can make HDFS
unresponsive. Each NM then waits for HDFS's response, the RM cannot get the
NM's heartbeat, and it treats all containers as timed out.

In our production environment with 3500+ NMs, we found that restarting the NMs
put heavy pressure on HDFS and the init-app operation blocked on accessing hdfs
(stack attached below), which caused all the containers to fail (the container
count on a single NM fell to zero).

!https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!

!https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETX

[jira] [Updated] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9947:

Description: 
This issue introduces a method to lazily initialize AppLogAggregatorImpl,
letting it access HDFS as late as possible (usually when the app finishes), so
that NMs restarting across a cluster do not all hit HDFS at the same time,
reducing HDFS pressure. Let's go into the details below.

In the current version, the app log aggregator checks HDFS and tries to create
the app's remote log directory when initializing an app. This causes a problem
when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting
NM initializes every app on that NM, and the NM tries to connect to HDFS. If
HDFS is heavily loaded, many NMs restarting at the same time can make HDFS
unresponsive. Each NM then waits for HDFS's response, the RM cannot get the
NM's heartbeat, and it treats all containers as timed out.

In our production environment with 3500+ NMs, we found that restarting the NMs
put heavy pressure on HDFS and the init-app operation blocked on accessing hdfs
(stack attached below), which caused all the containers to fail (the container
count on a single NM fell to zero).

!https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!

!https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!

We solve this problem by introducing lazy initialization in
AppLogAggregatorImpl. When initializing an app, we just create the
AppLogAggregatorImpl object without calling verifyAndCreateRemoteLogDir(). We
call verifyAndCreateRemoteLogDir() only when the app starts aggregating logs.
Because apps do not all finish or aggregate logs at the same time, the
verifyAndCreateRemoteLogDir() calls are spread out, so the NMs will not all
access HDFS at once even when they restart at the same time.

 

YARN-8418 solved the leaked container-log directory problem by adding a way to
update the NM's credentials. If we lazily initialize AppLogAggregatorImpl, we
don't need YARN-8418's logic, because the lazy-init step happens after the
addCredentials step, which means the credentials are always refreshed before we
use them.

 

In summary, this issue does two things:
 # Introduces lazy-init logic in AppLogAggregatorImpl to avoid all NMs in a
cluster hitting HDFS at once when they are restarted.
 # Reverts YARN-8418, because the lazy-init logic guarantees the credentials are
refreshed.


[jira] [Updated] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9947:

Attachment: YARN-9947.001.patch

> lazy init appLogAggregatorImpl when log aggregation
> ---
>
> Key: YARN-9947
> URL: https://issues.apache.org/jira/browse/YARN-9947
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Attachments: YARN-9947.001.patch
>
>
> This issue introduces a method to lazily initialize AppLogAggregatorImpl, 
> letting it access HDFS as late as possible (usually when the app finishes), so 
> that NMs restarting across a cluster do not all hit HDFS at the same time, 
> reducing HDFS pressure. Let's go into the details below. 
> In the current version, the app log aggregator checks HDFS and tries to create 
> the app's remote log directory when initializing an app. This causes a problem 
> when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting 
> NM initializes every app on that NM, and the NM tries to connect to HDFS. If 
> HDFS is heavily loaded, many NMs restarting at the same time can make HDFS 
> unresponsive. Each NM then waits for HDFS's response, the RM cannot get the 
> NM's heartbeat, and it treats all containers as timed out.
> In our production environment with 3500+ NMs, we found that restarting the NMs 
> put heavy pressure on HDFS and the init-app operation blocked on accessing 
> hdfs (stack attached below), which caused all the containers to fail (the 
> container count on a single NM fell to zero).
> !https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!
> !https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!
> We solve this problem by introducing lazy initialization in 
> AppLogAggregatorImpl. When initializing an app, we just create the 
> AppLogAggregatorImpl object without calling verifyAndCreateRemoteLogDir(). We 
> call verifyAndCreateRemoteLogDir() only when the app starts aggregating logs. 
> Because apps do not all finish or aggregate logs at the same time, the 
> verifyAndCreateRemoteLogDir() calls are spread out, so the NMs will not all 
> access HDFS at once even when they restart at the same time.
>  
> YARN-8418 solved the leaked container-log directory problem by adding a way to 
> update the NM's credentials. If we lazily initialize AppLogAggregatorImpl, we 
> don't need YARN-8418's logic, because the lazy-init step happens after the 
> addCredentials step, which means the credentials are always refreshed before 
> we use them.
>  
> In summary, this issue does two things:
>  # Introduces lazy-init logic in AppLogAggregatorImpl to avoid all NMs in a 
> cluster hitting HDFS at once when they are restarted.
>  # Reverts YARN-8418, because the lazy-init logic guarantees the credentials 
> are refreshed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966392#comment-16966392
 ] 

Hadoop QA commented on YARN-9940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} YARN-9940 does not apply to branch-2.7.2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9940 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984753/YARN-9940-branch-2.7.2.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25084/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940-branch-2.7.2.001.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9948) Remove attempts that are beyond max-attempt limit from RMAppImpl

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9948:

Description: 
RM will store app attempt in both state store and RMAppImpl. YARN-3480 removes 
attempts that are beyond max-attempt limit from state store.  In this issue we 
delete those attempts in RMAppImpl the reduce decrease memory usage of RM.

We introduce flag yarn.resourcemanager.am.delete-old-attempts.enabled to enable 
this logic, default value is false.
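
As an illustration of the trimming idea, here is a minimal sketch; the class,
field, and method names are simplified stand-ins for the RMAppImpl internals,
and only the flag name comes from this issue:

{code:java}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class AttemptTrimSketch {
  private final int maxAppAttempts;
  // Controlled by yarn.resourcemanager.am.delete-old-attempts.enabled.
  private final boolean deleteOldAttemptsEnabled;
  // Attempts in creation order; LinkedHashMap keeps the oldest entry first.
  private final Map<Integer, String> attempts = new LinkedHashMap<>();

  public AttemptTrimSketch(int maxAppAttempts, boolean deleteOldAttemptsEnabled) {
    this.maxAppAttempts = maxAppAttempts;
    this.deleteOldAttemptsEnabled = deleteOldAttemptsEnabled;
  }

  public void addAttempt(int attemptId, String attempt) {
    attempts.put(attemptId, attempt);
    if (!deleteOldAttemptsEnabled) {
      return;  // default behaviour: keep every attempt, as before
    }
    // Drop the oldest attempts once we exceed the max-attempt limit,
    // mirroring what YARN-3480 already does for the state store.
    Iterator<Map.Entry<Integer, String>> it = attempts.entrySet().iterator();
    while (attempts.size() > maxAppAttempts && it.hasNext()) {
      it.next();
      it.remove();
    }
  }
}
{code}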

> Remove attempts that are beyond max-attempt limit from RMAppImpl
> 
>
> Key: YARN-9948
> URL: https://issues.apache.org/jira/browse/YARN-9948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Priority: Major
>
> The RM stores app attempts in both the state store and RMAppImpl. YARN-3480 
> removes attempts that are beyond the max-attempt limit from the state store. 
> In this issue we also delete those attempts from RMAppImpl to reduce the RM's 
> memory usage.
> We introduce the flag yarn.resourcemanager.am.delete-old-attempts.enabled to 
> enable this logic; the default value is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9948) Remove attempts that are beyond max-attempt limit from RMAppImpl

2019-11-03 Thread Hu Ziqian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Ziqian updated YARN-9948:

Attachment: YARN-9948.001.patch

> Remove attempts that are beyond max-attempt limit from RMAppImpl
> 
>
> Key: YARN-9948
> URL: https://issues.apache.org/jira/browse/YARN-9948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Priority: Major
> Attachments: YARN-9948.001.patch
>
>
> The RM stores app attempts in both the state store and RMAppImpl. YARN-3480 
> removes attempts that are beyond the max-attempt limit from the state store. 
> In this issue we also delete those attempts from RMAppImpl to reduce the RM's 
> memory usage.
> We introduce the flag yarn.resourcemanager.am.delete-old-attempts.enabled to 
> enable this logic; the default value is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966397#comment-16966397
 ] 

Sunil G commented on YARN-9950:
---

Hi [~prabhujoseph]

1. I think the following fix may be incomplete.
{code:java}
// Unset Ordering Policy of Parent Queue converted from   
// Leaf Queue after addQueue 
String parentQueueOrderingPolicy = CapacitySchedulerConfiguration.PREFIX
  + parentQueue + CapacitySchedulerConfiguration.DOT + ORDERING_POLICY;
if (siblingQueues.size() == 1) {
   proposedConf.unset(parentQueueOrderingPolicy);
   confUpdate.put(parentQueueOrderingPolicy, null);
}{code}
When an existing parent queue has its ordering policy set to priority-based, and 
a new child queue is then added to that parent queue, the above-mentioned code 
can unset that ordering policy, reverting it to the resource-based default. This 
is incorrect.

2. For parent queue,
{code:java}
// Unset Ordering Policy of Leaf Queue converted from
// Parent Queue after removeQueue
String leafQueueOrderingPolicy = CapacitySchedulerConfiguration.PREFIX
  + parentQueuePath + CapacitySchedulerConfiguration.DOT
  + ORDERING_POLICY;
proposedConf.unset(leafQueueOrderingPolicy);
confUpdate.put(leafQueueOrderingPolicy, null); {code}
This doesn't seem accurately named. We are now removing a queue from a parent, 
hence we should rename leafQueueOrderingPolicy => parentQueueOrderingPolicy.

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for a leaf queue and utilization or 
> priority-utilization for a parent queue. When the existing leaf queue (root.A) 
> becomes a parent queue, the ordering policy (fifo or fair) has to be unset; 
> otherwise the YARN RM will fail, as fifo or fair is not a valid queue ordering 
> policy for a parent queue.
> Similarly, while removing a queue, unset the ordering policy of a leaf queue 
> that was converted from a parent queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966409#comment-16966409
 ] 

Hadoop QA commented on YARN-9947:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 10 new + 269 unchanged - 8 fixed = 279 total (was 277) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9947 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984754/YARN-9947.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7e5d28bf1865 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25085/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25085/testReport/ |
| Max. process+thread count | 464 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager

[jira] [Updated] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9950:

Attachment: YARN-9950-002.patch

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch, YARN-9950-002.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for a leaf queue and utilization or 
> priority-utilization for a parent queue. When the existing leaf queue (root.A) 
> becomes a parent queue, the ordering policy (fifo or fair) has to be unset; 
> otherwise the YARN RM will fail, as fifo or fair is not a valid queue ordering 
> policy for a parent queue.
> Similarly, while removing a queue, unset the ordering policy of a leaf queue 
> that was converted from a parent queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966415#comment-16966415
 ] 

Prabhu Joseph commented on YARN-9950:
-

Thanks [~sunilg] for reviewing.

1. The condition below prevents case (1): the {{siblingQueues}} size will be 1 
only when a queue is added under a leaf queue.

*if (siblingQueues.size() == 1) {*

2. Yes, the name was not accurate; I have changed it to {{queueOrderingPolicy}} 
to avoid confusion.
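
For reference, a minimal sketch of the configuration scenario from the issue
description; the queue names are the examples used there, and the property keys
follow the standard yarn.scheduler.capacity.<queue-path> form:

{code}
# Before: root.A is a leaf queue, so a leaf-only ordering policy is valid.
yarn.scheduler.capacity.root.A.ordering-policy=fifo

# After adding root.A.A1, root.A becomes a parent queue:
yarn.scheduler.capacity.root.A.queues=A1
# Now the leaf-only policy (fifo/fair) must be unset, otherwise the RM fails
# to start; parent queues only accept utilization or priority-utilization.
{code}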

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch, YARN-9950-002.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for a leaf queue and utilization or 
> priority-utilization for a parent queue. When the existing leaf queue (root.A) 
> becomes a parent queue, the ordering policy (fifo or fair) has to be unset; 
> otherwise the YARN RM will fail, as fifo or fair is not a valid queue ordering 
> policy for a parent queue.
> Similarly, while removing a queue, unset the ordering policy of a leaf queue 
> that was converted from a parent queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966418#comment-16966418
 ] 

Sunil G commented on YARN-9950:
---

Thanks [~prabhujoseph] 

This makes sense. +1 from me, pending jenkins.

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch, YARN-9950-002.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for a leaf queue and utilization or 
> priority-utilization for a parent queue. When the existing leaf queue (root.A) 
> becomes a parent queue, the ordering policy (fifo or fair) has to be unset; 
> otherwise the YARN RM will fail, as fifo or fair is not a valid queue ordering 
> policy for a parent queue.
> Similarly, while removing a queue, unset the ordering policy of a leaf queue 
> that was converted from a parent queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread kailiu_dev (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kailiu_dev updated YARN-9940:
-
Attachment: (was: YARN-9940-branch-2.7.2.001.patch)

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9948) Remove attempts that are beyond max-attempt limit from RMAppImpl

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966437#comment-16966437
 ] 

Hadoop QA commented on YARN-9948:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 14 new + 328 unchanged - 0 fixed = 342 total (was 328) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
0s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
56s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 
20s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9948 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984755/YARN-9948.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux bade0de99ce6 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 201

[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966439#comment-16966439
 ] 

Hadoop QA commented on YARN-9940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 11s{color} 
| {color:red} YARN-9940 does not apply to branch-2.7.2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9940 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984770/YARN-9940-branch-2.7.2.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25088/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940-branch-2.7.2.001.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9948) Remove attempts that are beyond max-attempt limit from RMAppImpl

2019-11-03 Thread Hu Ziqian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966447#comment-16966447
 ] 

Hu Ziqian commented on YARN-9948:
-

[~taoyang], could you help me review this issue?

> Remove attempts that are beyond max-attempt limit from RMAppImpl
> 
>
> Key: YARN-9948
> URL: https://issues.apache.org/jira/browse/YARN-9948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Priority: Major
> Attachments: YARN-9948.001.patch
>
>
> The RM stores app attempts in both the state store and RMAppImpl. YARN-3480 
> removes attempts that are beyond the max-attempt limit from the state store. 
> In this issue we also delete those attempts from RMAppImpl to reduce the RM's 
> memory usage.
> We introduce the flag yarn.resourcemanager.am.delete-old-attempts.enabled to 
> enable this logic; the default value is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9947) lazy init appLogAggregatorImpl when log aggregation

2019-11-03 Thread Hu Ziqian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966448#comment-16966448
 ] 

Hu Ziqian commented on YARN-9947:
-

[~taoyang], [~leftnoteasy], could you help me review this issue?

> lazy init appLogAggregatorImpl when log aggregation
> ---
>
> Key: YARN-9947
> URL: https://issues.apache.org/jira/browse/YARN-9947
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Attachments: YARN-9947.001.patch
>
>
> This issue introduces a method to lazily initialize AppLogAggregatorImpl, 
> letting it access HDFS as late as possible (usually when the app finishes), so 
> that NMs restarting across a cluster do not all hit HDFS at the same time, 
> reducing HDFS pressure. Let's go into the details below. 
> In the current version, the app log aggregator checks HDFS and tries to create 
> the app's remote log directory when initializing an app. This causes a problem 
> when restarting NMs in a large cluster with a heavily loaded HDFS. A restarting 
> NM initializes every app on that NM, and the NM tries to connect to HDFS. If 
> HDFS is heavily loaded, many NMs restarting at the same time can make HDFS 
> unresponsive. Each NM then waits for HDFS's response, the RM cannot get the 
> NM's heartbeat, and it treats all containers as timed out.
> In our production environment with 3500+ NMs, we found that restarting the NMs 
> put heavy pressure on HDFS and the init-app operation blocked on accessing 
> hdfs (stack attached below), which caused all the containers to fail (the 
> container count on a single NM fell to zero).
> !https://teambition-file.alibaba-inc.com/storage/011mcaf1aebf84f02a5d2c2c5fa85af80f5b?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW1jYWYxYWViZjg0ZjAyYTVkMmMyYzVmYTg1YWY4MGY1YiJ9.JJQoQvjWdAQItQkjtdxa1SnkqJWuij_w2xq2Unoaktg!
> !https://teambition-file.alibaba-inc.com/storage/011m873079212ee7fe507ddbe163a0c07fb1?download=upload_tfs_by_description.png&Signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBcHBJRCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9hcHBJZCI6IjVjZDkwOTdmYjNhNDMyMjk3OTBhN2EyZiIsIl9vcmdhbml6YXRpb25JZCI6IjVjNDA1N2YwYmU4MjViMzkwNjY3YWJlZSIsImV4cCI6MTU3MjgzNzQxMywiaWF0IjoxNTcyODM3MTEzLCJyZXNvdXJjZSI6Ii9zdG9yYWdlLzAxMW04NzMwNzkyMTJlZTdmZTUwN2RkYmUxNjNhMGMwN2ZiMSJ9.kH73n6bdx8ETXsrWcBGgXGay2WP3z9nzuDlE8-RvQzs!
> We solve this problem by introducing lazy initialization in 
> AppLogAggregatorImpl. When initializing an app, we just create the 
> AppLogAggregatorImpl object without calling verifyAndCreateRemoteLogDir(). We 
> call verifyAndCreateRemoteLogDir() only when the app starts aggregating logs. 
> Because apps do not all finish or aggregate logs at the same time, the 
> verifyAndCreateRemoteLogDir() calls are spread out, so the NMs will not all 
> access HDFS at once even when they restart at the same time.
>  
> YARN-8418 solved the leaked container-log directory problem by adding a way to 
> update the NM's credentials. If we lazily initialize AppLogAggregatorImpl, we 
> don't need YARN-8418's logic, because the lazy-init step happens after the 
> addCredentials step, which means the credentials are always refreshed before 
> we use them.
>  
> In summary, this issue does two things:
>  # Introduces lazy-init logic in AppLogAggregatorImpl to avoid all NMs in a 
> cluster hitting HDFS at once when they are restarted.
>  # Reverts YARN-8418, because the lazy-init logic guarantees the credentials 
> are refreshed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org