date:20190328

[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803659#comment-16803659
 ] 

Tao Yang commented on YARN-9413:


{quote}

 Can you submit the patch to trigger jenkins job?

{quote}

Sure, already triggered. Thanks [~cheersyang] for the review.

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch
>
>
> To reproduce this problem:
>  # Submit an app that is configured to keep containers across app attempts 
> and that should fail after its AM finishes for the first time (am-max-attempts=1).
>  # The app starts with 2 containers running on node NM1.
>  # Fail the AM of the application with the PREEMPTED exit status, which should 
> not count towards the max attempt retries; the app nevertheless fails immediately.
>  # The used resource of the queue leaks after the app fails.
> The root cause is that app attempt failure is handled inconsistently between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After the app fails, RMAppFailedAttemptEvent is sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the 
> failure does not count towards the max attempt retries, so the transition sends 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt. Because the containers are to be kept, 
> it skips killing them and skips the completion process for the containers 
> belonging to this app, so the queue resource leaks.
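
The inconsistency described above can be illustrated with a small toy model. This is only a sketch: ToyScheduler, QueueLeakSketch and all method names are hypothetical stand-ins, not real YARN classes. The "appWillRetry" rule is a simplification of the app-side check, and the "reconcile" flag shows one possible remedy that is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not real YARN types.
class ToyScheduler {
  int queueUsedContainers = 0;
  private final List<String> liveContainers = new ArrayList<>();

  void allocate(String containerId) {
    liveContainers.add(containerId);
    queueUsedContainers++;
  }

  // Mirrors the behaviour described for doneApplicationAttempt: when
  // keepContainers is true, containers are neither killed nor completed.
  void doneApplicationAttempt(boolean keepContainers) {
    if (keepContainers) {
      return;
    }
    queueUsedContainers -= liveContainers.size();
    liveContainers.clear();
  }
}

class QueueLeakSketch {
  // Attempt side: a PREEMPTED AM exit does not count towards max attempts,
  // so the attempt asks the scheduler to keep its containers.
  static boolean attemptWantsToKeep(boolean keepAcrossAttempts,
                                    boolean countsTowardsMaxAttempts) {
    return keepAcrossAttempts && !countsTowardsMaxAttempts;
  }

  // App side (simplified assumption): with max attempts of 1 the app fails
  // instead of starting another attempt.
  static boolean appWillRetry(int maxAppAttempts) {
    return maxAppAttempts > 1;
  }

  // Returns the queue's used container count after the AM is preempted.
  // reconcile=true applies one hypothetical remedy: keep containers only if
  // another attempt will actually run.
  static int usedAfterAmPreempted(int maxAppAttempts, boolean reconcile) {
    ToyScheduler scheduler = new ToyScheduler();
    scheduler.allocate("container_1");
    scheduler.allocate("container_2");
    boolean keep = attemptWantsToKeep(true, false);
    if (reconcile) {
      keep = keep && appWillRetry(maxAppAttempts);
    }
    scheduler.doneApplicationAttempt(keep);
    return scheduler.queueUsedContainers;
  }

  public static void main(String[] args) {
    System.out.println("unreconciled: " + usedAfterAmPreempted(1, false));
    System.out.println("reconciled:   " + usedAfterAmPreempted(1, true));
  }
}
```

With max attempts of 1 and no reconciliation, the toy queue still reports two used containers although no further attempt will ever adopt them, which is the leak in miniature.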



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803694#comment-16803694
 ] 

Hadoop QA commented on YARN-9413:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
39s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
39s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 39s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 34s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 163 unchanged - 1 fixed = 164 total (was 164) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
56s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9413 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12963827/YARN-9413.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8a5d1a658b61 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8a59efe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/23822/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/23822/artifact/out/patch-co

[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803726#comment-16803726
 ] 

Hadoop QA commented on YARN-8200:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 73 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
19s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
23s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
26s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 6s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 
10s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
11s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
51s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
37s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 22s{color} 
| {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 1441 
unchanged - 2 fixed = 1443 total (was 1443) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
26s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 21s{color} | {color:orange} root: The patch generated 194 new + 3977 
unchanged - 73 fixed = 4171 total (was 4050) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
12s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 524 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
17s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{c

[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Peter Bacsko (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9270:
---
Attachment: YARN-9270-004.patch

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like 
> {{Function<String, String> envProvider = System::getenv}} plus a setter method 
> which allows the test to modify {{envProvider}}. Much simpler and more 
> straightforward.
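
The envProvider idea above could look roughly like the following. This is a minimal sketch, not the code from the attached patches: EnvAwarePlugin, lookup and the variable name SKETCH_SDK_ROOT are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the FPGA discovery plugin; not the real Hadoop class.
class EnvAwarePlugin {
  // Defaults to the real process environment (System.getenv(String)).
  private Function<String, String> envProvider = System::getenv;

  // Setter that lets a test substitute its own environment lookup.
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  // Reads one variable through the provider instead of calling System.getenv directly.
  String lookup(String name) {
    return envProvider.apply(name);
  }
}

class EnvProviderDemo {
  public static void main(String[] args) {
    EnvAwarePlugin plugin = new EnvAwarePlugin();
    Map<String, String> fakeEnv = new HashMap<>();
    fakeEnv.put("SKETCH_SDK_ROOT", "/fake/sdk"); // hypothetical variable name
    plugin.setEnvProvider(fakeEnv::get);
    System.out.println(plugin.lookup("SKETCH_SDK_ROOT"));
  }
}
```

A test can then swap in a plain map lookup without touching the JVM's real environment, which is what makes the reflection-based hack unnecessary.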




[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803847#comment-16803847
 ] 

Hadoop QA commented on YARN-9270:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 9 new + 34 unchanged - 5 fixed = 43 total (was 39) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9270 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12964003/YARN-9270-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 95f3acbcf759 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8a59efe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23823/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23823/testReport/ |
| Max. process+thread count | 308 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nod

[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Peter Bacsko (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9270:
---
Attachment: YARN-9270-005.patch

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like 
> {{Function<String, String> envProvider = System::getenv}} plus a setter method 
> which allows the test to modify {{envProvider}}. Much simpler and more 
> straightforward.




[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Peter Bacsko (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803859#comment-16803859
 ] 

Peter Bacsko commented on YARN-9270:


Fixed the remaining checkstyle problems in patch v5.

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like 
> {{Function<String, String> envProvider = System::getenv}} plus a setter method 
> which allows the test to modify {{envProvider}}. Much simpler and more 
> straightforward.




[jira] [Updated] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown

2019-03-28 Thread JIRA



 [ 
https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-9235:
-
Attachment: YARN-9235.004.patch

> If linux container executor is not set for a GPU cluster 
> GpuResourceHandlerImpl is not initialized and NPE is thrown
> 
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, 
> YARN-9235.003.patch, YARN-9235.004.patch
>
>
> If GPU plugin is enabled for the NodeManager, it is possible to run jobs with 
> GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when 
> calling 
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, there are no warnings in the log if GPU is misconfigured like this. 
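
The failure mode above can be sketched as a guard around an uninitialised handler. This is an illustration under assumptions, not the attached patch: ToyGpuPlugin, initHandler and getResourceInfo are hypothetical names, and returning an empty list is only one possible policy (throwing a descriptive exception would be another).

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a GPU resource plugin; not the real Hadoop class.
class ToyGpuPlugin {
  private static final Logger LOG = Logger.getLogger(ToyGpuPlugin.class.getName());

  // Stays null unless the handler was initialised, which in this sketch
  // stands in for "LinuxContainerExecutor is configured".
  private List<String> assignedGpus;

  void initHandler(List<String> gpus) {
    this.assignedGpus = gpus;
  }

  // Guards against the uninitialised handler instead of dereferencing null,
  // and leaves a warning in the log so the misconfiguration is visible.
  List<String> getResourceInfo() {
    if (assignedGpus == null) {
      LOG.warning("GPU plugin is enabled but its resource handler is not "
          + "initialised; check the container executor configuration");
      return Collections.emptyList();
    }
    return assignedGpus;
  }
}
```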




[jira] [Commented] (YARN-9414) Application Catalog for YARN applications

2019-03-28 Thread Adam Antal (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803872#comment-16803872
 ] 

Adam Antal commented on YARN-9414:
--

Thanks [~eyang]!

> Application Catalog for YARN applications
> -
>
> Key: YARN-9414
> URL: https://issues.apache.org/jira/browse/YARN-9414
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf
>
>
> YARN native services provide a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN for 
> managing the life cycle of applications.  




[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Prabhu Joseph (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9418:

Description: 
ATSV2 entities rest api does not show the metrics
{code:java}
[hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
{
  "metrics": [],
  "events": [],
  "createdtime": 1553695002014,
  "idprefix": 0,
  "type": "YARN_CONTAINER",
  "id": "container_e18_1553685341603_0006_01_01",
  "info": {
    "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
    "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{code}
NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these 
are not shown in the above output. NM container entities are set with 
entityIdPrefix as the inverted container start time, whereas RM container 
entities are set with the default 0. TimelineReader fetches only RM container 
entries.

This was confirmed by setting the entityIdPrefix of NM container entities to 0, 
the same as RM (for testing purposes); the metrics are then shown.
{code:java}
"metrics": [
  {
    "type": "SINGLE_VALUE",
    "id": "MEMORY",
    "aggregationOp": "NOP",
    "values": {
      "1553774981355": 490430464
    }
  },
  {
    "type": "SINGLE_VALUE",
    "id": "CPU",
    "aggregationOp": "NOP",
    "values": {
      "1553774981355": 5
    }
  }
]{code}
 

  was:
ATSV2 entities rest api does not show the metrics
{code:java}
[hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
{
  "metrics": [],
  "events": [],
  "createdtime": 1553695002014,
  "idprefix": 0,
  "type": "YARN_CONTAINER",
  "id": "container_e18_1553685341603_0006_01_01",
  "info": {
    "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
    "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{code}
NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these 
are not shown in the above output. NM container entries are updated with the 
right flowRunId (start time of the job), whereas RM container entries are 
updated with the default 0. TimelineReader fetches only rows that are updated 
by RM (i.e., rowkeys with flowRunId 0).


> ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
> ---
>
> Key: YARN-9418
> URL: https://issues.apache.org/jira/browse/YARN-9418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Critical
>
> ATSV2 entities rest api does not show the metrics
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1553695002014,
>   "idprefix": 0,
>   "type": "YARN_CONTAINER",
>   "id": "container_e18_1553685341603_0006_01_01",
>   "info": {
>     "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
>     "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{code}
> NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these 
> are not shown in the above output. NM container entities are set with 
> entityIdPrefix as the inverted container start time, whereas RM container 
> entities are set with the default 0. TimelineReader fetches only RM container 
> entries.
> This was confirmed by setting the entityIdPrefix of NM container entities to 0, 
> the same as RM (for testing purposes); the metrics are then shown.
> {code:java}
> "metrics": [
>   {
>     "type": "SINGLE_VALUE",
>     "id": "MEMORY",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 490430464
>     }
>   },
>   {
>     "type": "SINGLE_VALUE",
>     "id": "CPU",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 5
>     }
>   }
> ]{code}
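
The mismatch described above can be reproduced with a toy key-value model. This is a sketch under assumptions: a HashMap stands in for the timeline store, the key layout only imitates the UID strings in the output above, and "inverted" is assumed to mean Long.MAX_VALUE minus the start time. IdPrefixSketch and its methods are hypothetical names, not ATSv2 APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the real ATSv2 schema or reader.
class IdPrefixSketch {
  // Assumption: inversion subtracts from Long.MAX_VALUE so newer entries sort first.
  static long invert(long value) {
    return Long.MAX_VALUE - value;
  }

  // Key layout imitating the UID strings: app!type!idPrefix!id.
  static String key(String appId, String type, long idPrefix, String id) {
    return appId + "!" + type + "!" + idPrefix + "!" + id;
  }

  static Map<String, List<String>> buildStore(String appId, String containerId,
                                              long startTime) {
    Map<String, List<String>> store = new HashMap<>();
    // RM writes the container entity with the default prefix 0 and no metrics.
    store.put(key(appId, "YARN_CONTAINER", 0L, containerId), new ArrayList<>());
    // NM writes CPU and MEMORY metrics under the inverted start time.
    store.put(key(appId, "YARN_CONTAINER", invert(startTime), containerId),
        Arrays.asList("MEMORY", "CPU"));
    return store;
  }

  // A reader that looks up a single idPrefix only sees the entity written with it.
  static List<String> readMetrics(Map<String, List<String>> store, String appId,
                                  String containerId, long idPrefix) {
    return store.getOrDefault(key(appId, "YARN_CONTAINER", idPrefix, containerId),
        new ArrayList<>());
  }
}
```

Reading with prefix 0 returns the metric-less RM entity, while reading with the inverted start time returns the NM entity with both metrics, matching the two outputs shown in the description.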
>  




[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803881#comment-16803881
 ] 

Szilard Nemeth commented on YARN-9235:
--

Hi [~bsteinbach]!

Do you need help with the review?

 

> If linux container executor is not set for a GPU cluster 
> GpuResourceHandlerImpl is not initialized and NPE is thrown
> 
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, 
> YARN-9235.003.patch, YARN-9235.004.patch
>
>
> If GPU plugin is enabled for the NodeManager, it is possible to run jobs with 
> GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when 
> calling 
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, there are no warnings in the log if GPU is misconfigured like this. 




[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown

2019-03-28 Thread JIRA



[ 
https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803882#comment-16803882
 ] 

Antal Bálint Steinbach commented on YARN-9235:
--

Hi,

LOGGER is renamed to LOG.

I agree that tests for general exceptions need to be checked, but testing the 
error message is not the way to do that, for several reasons (parameters in the 
text, i18n, etc.).

I do not want to change the exception itself, as this was a small refactor and 
NPE fix. So I think this small tradeoff is acceptable in the test.

[~sunilg], [~tangzhankun] can you please push?

 

> If linux container executor is not set for a GPU cluster 
> GpuResourceHandlerImpl is not initialized and NPE is thrown
> 
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, 
> YARN-9235.003.patch, YARN-9235.004.patch
>
>
> If GPU plugin is enabled for the NodeManager, it is possible to run jobs with 
> GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when 
> calling 
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, no warning is logged if GPU is misconfigured like this. 




[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown

2019-03-28 Thread JIRA



[ 
https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803896#comment-16803896
 ] 

Antal Bálint Steinbach commented on YARN-9235:
--

[~snemeth] , it already has a +1, thanks





[jira] [Created] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time

2019-03-28 Thread Tao Yang (JIRA)

Tao Yang created YARN-9423:
--

 Summary: Optimize AM launcher to avoid bottleneck when a large 
number of AM failover happen at the same time
 Key: YARN-9423
 URL: https://issues.apache.org/jira/browse/YARN-9423
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.2.0
Reporter: Tao Yang
Assignee: Tao Yang


We encountered slow application recovery when many NMs were lost at the same 
time:
 # Many NMs shut down abnormally at the same time.
 # The NMs expired, and a large number of AMs started to fail over.
 # AM containers were allocated but not launched for about half an hour.

During this slow recovery, all ApplicationMasterLauncher threads were calling 
cleanup for containers on the lost nodes and kept retrying to communicate with 
the NMs for 3 minutes (the retry policy is configured in 
NMProxy#createNMProxy), even though the RM already knew that these NMs were 
lost and probably could not be reached for a long time. Meanwhile, many AM 
cleanup and launch operations were still waiting in the queue 
(ApplicationMasterLauncher#masterEvents). AM launch operations were thus 
blocked behind cleanup operations, each of which wasted 3 minutes. As a result, 
AM failover can be very slow.

I think we can optimize the AM launcher in two ways:
 # Change the type of ApplicationMasterLauncher#masterEvents from 
LinkedBlockingQueue to PriorityBlockingQueue, so that launch operations are 
executed ahead of cleanup operations.
 # In the cleanup process (AMLauncher#cleanup), check the node state first and 
skip cleaning up AM containers on non-existent or unusable NMs (which probably 
cannot be reached for a long time) before communicating with the NM.
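
The two proposals above can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the actual patch: AmEvent, EventType, 
NodeState, shouldContactNode and the sequence-number tie-breaker are 
hypothetical names introduced here; only PriorityBlockingQueue is the real JDK 
class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class LaunchFirstQueueSketch {
  /** Hypothetical event kinds; declaration order doubles as priority. */
  enum EventType { LAUNCH, CLEANUP }

  /** Hypothetical, simplified node states (not the real YARN enum). */
  enum NodeState { RUNNING, LOST, DECOMMISSIONED }

  /** Hypothetical stand-in for a queued AM launcher task. */
  static final class AmEvent {
    private static final AtomicLong SEQ = new AtomicLong();
    final EventType type;
    final String appId;
    // Creation order, used to keep FIFO order among events of the same type.
    final long seq = SEQ.getAndIncrement();

    AmEvent(EventType type, String appId) {
      this.type = type;
      this.appId = appId;
    }

    @Override
    public String toString() {
      return type + ":" + appId;
    }
  }

  /** Launch events sort ahead of cleanup events; ties fall back to creation order. */
  static final Comparator<AmEvent> LAUNCH_FIRST =
      Comparator.comparing((AmEvent e) -> e.type).thenComparingLong(e -> e.seq);

  /** Queues the events and returns them in the order a worker would take them. */
  static List<String> drainOrder(List<AmEvent> submitted) {
    PriorityBlockingQueue<AmEvent> queue = new PriorityBlockingQueue<>(11, LAUNCH_FIRST);
    queue.addAll(submitted);
    List<String> order = new ArrayList<>();
    AmEvent next;
    while ((next = queue.poll()) != null) {
      order.add(next.toString());
    }
    return order;
  }

  /** Proposal 2: only contact nodes that are known and running (null = unknown node). */
  static boolean shouldContactNode(NodeState state) {
    return state == NodeState.RUNNING;
  }

  public static void main(String[] args) {
    List<AmEvent> events = Arrays.asList(
        new AmEvent(EventType.CLEANUP, "app1"),
        new AmEvent(EventType.LAUNCH, "app2"),
        new AmEvent(EventType.CLEANUP, "app3"),
        new AmEvent(EventType.LAUNCH, "app4"));
    System.out.println(drainOrder(events));
    System.out.println(shouldContactNode(NodeState.LOST));
  }
}
```

PriorityBlockingQueue makes no ordering guarantee among elements of equal 
priority, which is why the sketch adds a sequence number; whether the real 
change needs one is an open design question.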




[jira] [Updated] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time

2019-03-28 Thread Tao Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9423:
---
Description: 
We encountered slow application recovery after many NMs were lost:
 # Many NMs shut down abnormally at the same time.
 # The NMs expired, and a large number of AMs started to fail over.
 # AM containers were allocated but not launched for about half an hour.

During this slow recovery, all ApplicationMasterLauncher threads were calling 
cleanup for containers on the lost nodes and kept retrying to communicate with 
the NMs for 3 minutes (the retry policy is configured in 
NMProxy#createNMProxy), even though the RM already knew that these NMs were 
lost and probably could not be reached for a long time. Meanwhile, many AM 
cleanup and launch operations were still waiting in the queue 
(ApplicationMasterLauncher#masterEvents). AM launch operations were thus 
blocked behind cleanup operations, each of which wasted 3 minutes. As a result, 
AM failover can be very slow.

I think we can optimize the AM launcher in two ways:
 # Change the type of ApplicationMasterLauncher#masterEvents from 
LinkedBlockingQueue to PriorityBlockingQueue, so that launch operations are 
executed ahead of cleanup operations.
 # In the cleanup process (AMLauncher#cleanup), check the node state first and 
skip cleaning up AM containers on non-existent or unusable NMs (which probably 
cannot be reached for a long time) before communicating with the NM.



> Optimize AM launcher to avoid bottleneck when a large number of AM failover 
> happen at the same time
> ---
>
> Key: YARN-9423
> URL: https://issues.apache.org/jira/browse/YARN-9423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>




[jira] [Updated] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time

2019-03-28 Thread Tao Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9423:
---
Description: 
We encountered slow application recovery after many NMs were lost:
 # Many NMs shut down abnormally at the same time.
 # The NMs expired, and a large number of AMs started to fail over.
 # AM containers were allocated but remained unlaunched for about half an 
hour.

During this slow recovery, all ApplicationMasterLauncher threads were calling 
cleanup for containers on the lost nodes and kept retrying to communicate with 
the NMs for 3 minutes (the retry policy is configured in 
NMProxy#createNMProxy), even though the RM already knew that these NMs were 
lost and probably could not be reached for a long time. Meanwhile, many AM 
cleanup and launch operations were still waiting in the queue 
(ApplicationMasterLauncher#masterEvents). AM launch operations were thus 
blocked behind cleanup operations, each of which wasted 3 minutes. As a result, 
AM failover can be very slow.

I think we can optimize the AM launcher in two ways:
 # Change the type of ApplicationMasterLauncher#masterEvents from 
LinkedBlockingQueue to PriorityBlockingQueue, so that launch operations are 
executed ahead of cleanup operations.
 # In the cleanup process (AMLauncher#cleanup), check the node state first and 
skip cleaning up AM containers on non-existent or unusable NMs (which probably 
cannot be reached for a long time) before communicating with the NM.



> Optimize AM launcher to avoid bottleneck when a large number of AM failover 
> happen at the same time
> ---
>
> Key: YARN-9423
> URL: https://issues.apache.org/jira/browse/YARN-9423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>




[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9413:
---
Attachment: YARN-9413.002.patch

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch
>
>
> To reproduce this problem:
>  # Submit an app that is configured to keep containers across app attempts 
> and that should fail once its AM finishes for the first time 
> (am-max-attempts=1).
>  # The app starts with 2 containers running on node NM1.
>  # Fail the AM of the application with the PREEMPTED exit status, which 
> should not count towards the max attempt retry; the app nevertheless fails 
> immediately.
>  # The used resource of this queue leaks after the app fails.
> The root cause is that app attempt failure is handled inconsistently between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After the app fails, RMAppFailedAttemptEvent is sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the 
> failure does not count towards the max attempt retry, so the transition sends 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt, which skips killing the containers 
> that belong to this app and skips their completion process, so the queue 
> resource leaks.
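
The mismatch described above can be modelled in a few lines. This is a 
simplified sketch built only from the issue text, not from the ResourceManager 
source: the class, the ExitStatus enum and all method names are hypothetical, 
and it assumes that containers are kept only when the app asked for that, as in 
the reproduction steps.

```java
class AttemptFailureMismatchSketch {
  /** Hypothetical, reduced set of AM container exit statuses. */
  enum ExitStatus { PREEMPTED, ABORTED, DISKS_FAILED, KILLED_BY_RESOURCEMANAGER, OTHER }

  /** Attempt side: the four statuses named in the issue do not count towards max attempts. */
  static boolean countsTowardsMaxAttempts(ExitStatus status) {
    return status == ExitStatus.OTHER;
  }

  /** Attempt side: tell the scheduler to keep containers for a follow-up attempt. */
  static boolean attemptKeepsContainers(boolean keepAcrossAttempts, ExitStatus status) {
    return keepAcrossAttempts && !countsTowardsMaxAttempts(status);
  }

  /** App side, as described in the issue: an app limited to one attempt fails right away. */
  static boolean appFailsAfterAttempt(int maxAppAttempts) {
    return maxAppAttempts == 1;
  }

  /** Leak condition: containers are kept for a next attempt that will never start. */
  static boolean queueResourceLeaks(boolean keepAcrossAttempts, ExitStatus status,
      int maxAppAttempts) {
    return attemptKeepsContainers(keepAcrossAttempts, status)
        && appFailsAfterAttempt(maxAppAttempts);
  }

  public static void main(String[] args) {
    // The reproduction case: keep-containers app, PREEMPTED AM, am-max-attempts=1.
    System.out.println(queueResourceLeaks(true, ExitStatus.PREEMPTED, 1));
  }
}
```

The two sides decide independently, so neither alone looks wrong; the leak only 
appears when both conditions hold at once.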




[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803905#comment-16803905
 ] 

Tao Yang commented on YARN-9413:


Attached the v2 patch, which fixes the compile error and renames the new test 
case from testQueueResourceLeakForCapacityScheduler to 
testQueueResourceDoesNotLeak.





[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803906#comment-16803906
 ] 

Hadoop QA commented on YARN-9270:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 47s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | shadedclient | 11m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 33 unchanged - 5 fixed = 33 total (was 38) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 2s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 46s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 68m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9270 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964022/YARN-9270-005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 25ae00ca6dc6 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15d38b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23824/testReport/ |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23824/console |
| Powered by | A

[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803913#comment-16803913
 ] 

Szilard Nemeth commented on YARN-9214:
--

Hi [~jiwq]! 

I can help review this. 

Could you please rebase the patch onto trunk? There's a merge conflict.

Thanks!

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* contain the same code segment, so 
> I think we need a method named *AbstractYarnScheduler#getValidQueues* to 
> handle it. Apart from this, we need to add a doc comment explaining why it 
> exists.




[jira] [Deleted] (YARN-9422) Simon poortman

2019-03-28 Thread Steve Loughran (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran deleted YARN-9422:
-


> Simon poortman
> --
>
> Key: YARN-9422
> URL: https://issues.apache.org/jira/browse/YARN-9422
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Simon Poortman
>Priority: Major
>  Labels: Beste
>





[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803934#comment-16803934
 ] 

Hadoop QA commented on YARN-9235:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 59s | trunk passed |
| +1 | compile | 0m 57s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 10m 32s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 57s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| -0 | checkstyle | 0m 20s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 13s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 1s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 47s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 67m 24s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9235 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964024/YARN-9235.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 92a3073ea089 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15d38b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23825/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23825/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U:

[jira] [Updated] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-03-28 Thread Wanqiang Ji (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wanqiang Ji updated YARN-9214:
--
Attachment: YARN-9214.002.patch

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* contain the same code segment, so 
> I think we need a method named *AbstractYarnScheduler#getValidQueues* to 
> handle it. Apart from this, we need to add a doc comment explaining why it 
> exists.




[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Prabhu Joseph (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9418:

Attachment: YARN-9418-001.patch

> ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
> ---
>
> Key: YARN-9418
> URL: https://issues.apache.org/jira/browse/YARN-9418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-9418-001.patch
>
>
> The ATSv2 entities REST API does not show the metrics:
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"
>  | jq .
> {
> "metrics": [],
> "events": [],
> "createdtime": 1553695002014,
> "idprefix": 0,
> "type": "YARN_CONTAINER",
> "id": "container_e18_1553685341603_0006_01_01",
> "info": {
> "UID": 
> "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
> "FROM_ID": 
> "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
> },
> "configs": {},
> "isrelatedto": {},
> "relatesto": {}
> }{code}
> NodeManager publishes YARN_CONTAINER entities with CPU and MEMORY metrics, but 
> they are not shown in the above output. NM container entities are written with 
> entityIdPrefix set to the inverted container start time, whereas RM container 
> entities use the default 0, so TimelineReader fetches only the RM container 
> entities. Setting entityIdPrefix of the NM container entities to 0, same as 
> RM (for testing purposes), confirmed this: the metrics are then shown.
> {code:java}
> "metrics": [
> {
> "type": "SINGLE_VALUE",
> "id": "MEMORY",
> "aggregationOp": "NOP",
> "values": {
> "1553774981355": 490430464
> }
> },
> {
> "type": "SINGLE_VALUE",
> "id": "CPU",
> "aggregationOp": "NOP",
> "values": {
> "1553774981355": 5
> }
> }
> ]{code}
>  




[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Prabhu Joseph (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9418:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-7055


[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804047#comment-16804047
 ] 

Hadoop QA commented on YARN-9413:
-

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 58s | trunk passed |
| +1 | compile | 0m 41s | trunk passed |
| +1 | checkstyle | 0m 33s | trunk passed |
| +1 | mvnsite | 0m 44s | trunk passed |
| +1 | shadedclient | 10m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 30s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 163 unchanged - 1 fixed = 164 total (was 164) |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 76m 0s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 121m 36s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9413 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964030/YARN-9413.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c0634f52bf01 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15d38b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23826/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23826/testReport/ |
| Max. process+thread count | 926 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-se

[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Prabhu Joseph (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804054#comment-16804054
 ] 

Prabhu Joseph commented on YARN-9418:
-

Have used inverted ContainerId#getContainerId as entityIdPrefix for 
YARN_CONTAINER entities for both NM and RM. TimelineReader fetches 
ContainerEntity from both NM and RM.
{code:java}
[hbase@yarn-ats-3 BACKUP]$ curl -s 
"http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553788280931_0001/entities/YARN_CONTAINER/container_e30_1553788280931_0001_01_01?user.name=hbase&fields=ALL"
 | jq .
{
"metrics": [
{
"type": "SINGLE_VALUE",
"id": "CPU",
"aggregationOp": "NOP",
"values": {
"1553788338798": 2
}
},
{
"type": "SINGLE_VALUE",
"id": "MEMORY",
"aggregationOp": "NOP",
"values": {
"1553788338798": 488013824
}
}
],
"events": [
{
"id": "YARN_RM_CONTAINER_FINISHED",
"timestamp": 1553788341158,
"info": {}
},
{
"id": "YARN_CONTAINER_FINISHED",
"timestamp": 1553788341151,
"info": {}
},
{
"id": "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
"timestamp": 1553788316027,
"info": {}
},
{
"id": "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
"timestamp": 1553788315294,
"info": {}
},
{
"id": "YARN_CONTAINER_CREATED",
"timestamp": 1553788315284,
"info": {}
},
{
"id": "YARN_RM_CONTAINER_CREATED",
"timestamp": 1553788314339,
"info": {}
}
],
"createdtime": 1553788314809,
"idprefix": 9223339051505943000,
"info": {
"YARN_CONTAINER_STATE": "COMPLETE",
"YARN_CONTAINER_ALLOCATED_HOST": "yarn-ats-1",
"YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS": "yarn-ats-1:8042",
"YARN_CONTAINER_ALLOCATED_VCORE": 1,
"FROM_ID": "ats!hbase!word 
count!1553788313450!application_1553788280931_0001!YARN_CONTAINER!9223339051505942526!container_e30_1553788280931_0001_01_01",
"YARN_CONTAINER_ALLOCATED_PORT": 45454,
"UID": 
"ats!application_1553788280931_0001!YARN_CONTAINER!9223339051505942526!container_e30_1553788280931_0001_01_01",
"YARN_CONTAINER_ALLOCATED_MEMORY": 2048,
"SYSTEM_INFO_PARENT_ENTITY": {
"type": "YARN_APPLICATION_ATTEMPT",
"id": "appattempt_1553788280931_0001_01"
},
"YARN_CONTAINER_EXIT_STATUS": 0,
"YARN_CONTAINER_ALLOCATED_PRIORITY": "0",
"YARN_CONTAINER_DIAGNOSTICS_INFO": "",
"YARN_CONTAINER_FINISHED_TIME": 1553788341151
},
"configs": {},
"isrelatedto": {},
"relatesto": {},
"id": "container_e30_1553788280931_0001_01_01",
"type": "YARN_CONTAINER"
}{code}
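
The prefix value in the UID above can be reproduced with a short sketch. It assumes that the long container ID holds the epoch in the bits above bit 40 and the sequence number in the low 40 bits, and that "inverted" means subtracting from Long.MAX_VALUE; the class and method names are hypothetical and are not taken from the patch. Under those assumptions, container epoch 30 with sequence number 1 yields the prefix 9223339051505942526 seen in UID and FROM_ID. The "idprefix" field printed by jq differs in its last digits, which would be consistent with jq handling large integers as double-precision numbers.

```java
/**
 * Illustrative sketch only; names are hypothetical and not taken from
 * the YARN-9418 patch.
 */
final class EntityIdPrefixSketch {

  /** Inverts a non-negative long so that larger inputs sort first. */
  static long invert(long value) {
    return Long.MAX_VALUE - value;
  }

  /**
   * Assumed layout of the long container ID: epoch in the bits above
   * bit 40, sequence number in the low 40 bits.
   */
  static long containerIdAsLong(long epoch, long sequence) {
    return (epoch << 40) | sequence;
  }

  public static void main(String[] args) {
    // Container from the output above: epoch 30 (the "e30" part), sequence 1.
    long id = containerIdAsLong(30, 1);
    long prefix = invert(id);
    System.out.println(id);      // 32985348833281
    System.out.println(prefix);  // 9223339051505942526
  }
}
```

Because the same container ID is available on both NM and RM, a prefix derived this way would be identical on both sides, unlike a prefix derived from a start time that only one side records.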


[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Weiwei Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804056#comment-16804056
 ] 

Weiwei Yang commented on YARN-9413:
---

LGTM, +1 once the remaining checkstyle issue is fixed. [~Tao Yang], could you 
pls fix that?

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch
>
>
> To reproduce this problem:
>  # Submit an app that is configured to keep containers across app attempts 
> and that should fail after its AM finishes for the first time 
> (am-max-attempts=1).
>  # The app is started with 2 containers running on node NM1.
>  # Fail the AM of the application with the PREEMPTED exit status, which should 
> not count towards max attempt retry; the app fails immediately anyway.
>  # The used resource of this queue leaks after the app fails.
> The root cause is the inconsistent handling of app attempt failure between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After the app attempt fails, RMAppFailedAttemptEvent is sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the 
> failure does not count towards max attempt retry, so the transition sends 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt, which skips killing the containers 
> that belong to this app and skips their completion process, so the queue 
> resource leaks.
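
The mismatch described in the issue can be modelled in a few lines. This is a simplified sketch of the two decisions as the description states them, not the ResourceManager code and not the fix in the attached patch; every class, enum and method name below is hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified model of the two decisions; all names are hypothetical. */
final class AttemptFailureSketch {

  enum ExitStatus { FAILED, PREEMPTED, ABORTED, DISKS_FAILED, KILLED_BY_RESOURCEMANAGER }

  /** Exit statuses that, per the description, do not count towards max attempt retry. */
  static final Set<ExitStatus> NOT_COUNTED = EnumSet.of(
      ExitStatus.PREEMPTED, ExitStatus.ABORTED,
      ExitStatus.DISKS_FAILED, ExitStatus.KILLED_BY_RESOURCEMANAGER);

  /** Attempt-side decision: keep containers when the failure is not counted. */
  static boolean attemptKeepsContainers(boolean keepAcrossAttempts, ExitStatus status) {
    return keepAcrossAttempts && NOT_COUNTED.contains(status);
  }

  /** App-side decision (simplified): with a single allowed attempt the app fails. */
  static boolean appFails(int maxAppAttempts) {
    return maxAppAttempts == 1;
  }

  /** The leak case: containers are kept for a next attempt that never starts. */
  static boolean leaksQueueResource(boolean keepAcrossAttempts, ExitStatus status,
      int maxAppAttempts) {
    return attemptKeepsContainers(keepAcrossAttempts, status) && appFails(maxAppAttempts);
  }

  public static void main(String[] args) {
    System.out.println(leaksQueueResource(true, ExitStatus.PREEMPTED, 1)); // true
    System.out.println(leaksQueueResource(true, ExitStatus.PREEMPTED, 2)); // false
    System.out.println(leaksQueueResource(true, ExitStatus.FAILED, 1));    // false
  }
}
```

In this model the leak appears only when both sides disagree: the attempt side keeps the containers while the app side ends the application.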




[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Eric Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9348:

Attachment: YARN-9348.009.patch

> Build issues on hadoop-yarn-application-catalog-webapp
> --
>
> Key: YARN-9348
> URL: https://issues.apache.org/jira/browse/YARN-9348
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9348.001.patch, YARN-9348.002.patch, 
> YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, 
> YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch, 
> YARN-9348.009.patch
>
>
> A couple of reports say that Jenkins precommit builds are failing due to 
> integration problems between the Node.js libraries and Yetus.  The problems are:
> # Node.js third-party libraries are checked by the whitespace check, which 
> generates many errors.  One possible solution is to move the Node.js libraries 
> from the project top-level directory to the target directory so that the 
> whitespace check does not stumble on them.
> # maven clean fails because the clean plugin tries to remove the target 
> directory and the files inside the target/generated-sources directories, 
> which causes race conditions.
> # Building on Mac triggers access to the OS X keychain in an attempt to log 
> in to Docker Hub.




[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Eric Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9348:

Attachment: (was: YARN-9348.009.patch)


[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Eric Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9348:

Attachment: YARN-9348.009.patch


[jira] [Commented] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804075#comment-16804075
 ] 

Szilard Nemeth commented on YARN-9318:
--

Hi [~sunilg]!

Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload 
them.

Thanks!

> Resources#multiplyAndRoundUp does not consider Resource Types
> -
>
> Key: YARN-9318
> URL: https://issues.apache.org/jira/browse/YARN-9318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9318.001.patch, YARN-9318.002.patch, 
> YARN-9318.003.patch, YARN-9318.004.patch, YARN-9318.005.patch
>
>
> org.apache.hadoop.yarn.util.resource.Resources#multiplyAndRoundUp only deals 
> with memory and vcores while computing the rounded value. It should 
> consider custom Resource Types as well.
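
As a rough illustration of the requested behaviour, the sketch below multiplies and rounds up every entry of a resource vector instead of only memory and vcores. It uses a plain map rather than the YARN Resource class; the class name, the map representation and the "gpu" entry are assumptions made for the example and do not reflect the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; not the org.apache.hadoop.yarn.util.resource.Resources API. */
final class MultiplyAndRoundUpSketch {

  /** Multiplies every resource value by the factor and rounds each result up. */
  static Map<String, Long> multiplyAndRoundUp(Map<String, Long> resource, double by) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : resource.entrySet()) {
      out.put(e.getKey(), (long) Math.ceil(e.getValue() * by));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Long> r = new LinkedHashMap<>();
    r.put("memory-mb", 1024L);
    r.put("vcores", 3L);
    r.put("gpu", 1L); // hypothetical custom resource type
    System.out.println(multiplyAndRoundUp(r, 1.5));
    // {memory-mb=1536, vcores=5, gpu=2}
  }
}
```

Iterating over all entries means that a newly registered resource type is handled without further changes to the method.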




[jira] [Commented] (YARN-9322) Store metrics for custom resource types into FSQueueMetrics and query them in FairSchedulerQueueInfo

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804077#comment-16804077
 ] 

Szilard Nemeth commented on YARN-9322:
--

Hi [~sunilg]!

Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload 
them.

Thanks!

> Store metrics for custom resource types into FSQueueMetrics and query them in 
> FairSchedulerQueueInfo
> 
>
> Key: YARN-9322
> URL: https://issues.apache.org/jira/browse/YARN-9322
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: Screen Shot 2019-02-21 at 12.06.46.png, 
> YARN-9322.001.patch, YARN-9322.002.patch, YARN-9322.003.patch, 
> YARN-9322.004.patch, YARN-9322.005.patch, YARN-9322.006.patch
>
>
> YARN-8842 implemented storing and exposing of metrics of custom resources.
> FSQueueMetrics should have a similar implementation.
> All metrics stored in this class should have their custom resource 
> counterpart.
> Because metrics were not stored for custom resource types, 
> FairSchedulerQueueInfo did not contain those values, so UI v1 could not 
> show them. 
> Note that gpu is missing from the value of "AM Max Resources" on the attached 
> screenshot.
> Additionally, the callees of the following methods (in class 
> FairSchedulerQueueInfo) should also query values for custom resource 
> types: 
> getMaxAMShareMB
> getMaxAMShareVCores
> getAMResourceUsageMB
> getAMResourceUsageVCores




[jira] [Commented] (YARN-9323) FSLeafQueue#computeMaxAMResource does not override zero values for custom resources

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804081#comment-16804081
 ] 

Szilard Nemeth commented on YARN-9323:
--

Hi [~sunilg]!

Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload 
them.

Thanks!

> FSLeafQueue#computeMaxAMResource does not override zero values for custom 
> resources
> ---
>
> Key: YARN-9323
> URL: https://issues.apache.org/jira/browse/YARN-9323
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9323.001.patch, YARN-9323.002.patch, 
> YARN-9323.003.patch, YARN-9323.004.patch, YARN-9323.005.patch
>
>





[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Eric Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9348:

Attachment: YARN-9348.010.patch


[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Eric Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804095#comment-16804095
 ] 

Eric Yang commented on YARN-9348:
-

Patch 9 was broken during patch generation.
Patch 10 combines YARN-7129 patch 035 and YARN-9348 patch 08 to work around a 
Yetus bug: it does not compute the findbugs report correctly for a newly added 
submodule (YETUS-825).
This ensures that the precommit build tests the changes in YARN-9348 patch 08.  
When committing, YARN-7129 and YARN-9348 will be committed separately to track 
the requested changes.


[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804215#comment-16804215
 ] 

Hadoop QA commented on YARN-9214:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 45s | trunk passed |
| +1 | compile | 0m 39s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 0m 45s | trunk passed |
| +1 | shadedclient | 11m 13s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 20s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 76m 55s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 124m 14s | |

|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9214 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12964045/YARN-9214.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5a7c17f28237 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / df578c0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23827/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23827/testReport/ |
| Max. process+thread count | 899 (vs. ulimit of 1) |
| mod

[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804244#comment-16804244
 ] 

Szilard Nemeth commented on YARN-9214:
--

Hi [~jiwq]!

Overall, this is a nice refactor!

Here are my comments: 
 # The import changes are confusing; I guess you organized imports with your 
IDE. Could you please revert those changes and only add the new imports at the 
bottom?
 # The name of the extracted method "getValidQueues" is misleading: the method 
is not about getting valid queues, it returns the apps that belong to the 
specified queue. I would rename it to getAppsFromQueue() or something like that. 
 # I know you just extracted the method, but I think it's okay to modify the 
error message a bit to: "The specified queue: " + queueName + " does not 
exist!" (queue with lowercase Q in the beginning, does not instead of doesn't)
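A minimal sketch of what the renamed method and the reworded message from comments 2 and 3 could look like. The class QueueAppsSketch, its map field, and the use of IllegalArgumentException are assumptions made to keep the example self-contained; they are not taken from the patch or from AbstractYarnScheduler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the scheduler; not the real AbstractYarnScheduler. */
final class QueueAppsSketch {
  // Queue name -> application ids submitted to that queue.
  private final Map<String, List<String>> appsByQueue = new HashMap<>();

  void addApp(String queueName, String appId) {
    appsByQueue.computeIfAbsent(queueName, q -> new ArrayList<>()).add(appId);
  }

  /** Returns the apps in the given queue; fails if the queue is unknown. */
  List<String> getAppsFromQueue(String queueName) {
    List<String> apps = appsByQueue.get(queueName);
    if (apps == null) {
      // Assumption: IllegalArgumentException is used only to avoid a YARN
      // dependency; the real code may throw a different exception type.
      throw new IllegalArgumentException(
          "The specified queue: " + queueName + " does not exist!");
    }
    return apps;
  }
}
```

Both callers named in the issue (moveAllApps and killAllAppsInQueue) would then share this single lookup and its error message.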

Apart from these, I'm okay with the patch!

Thanks!

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* contain the same code segment, so I 
> think we should extract it into a method named 
> *AbstractYarnScheduler#getValidQueues*. In addition, we need to add a doc 
> comment explaining why the method exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804280#comment-16804280
 ] 

Hadoop QA commented on YARN-9418:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 52s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}154m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9418 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12964046/YARN-9418-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ad21946ee925 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| g

[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804284#comment-16804284
 ] 

Szilard Nemeth commented on YARN-9227:
--

Hi [~Prabhu Joseph]!

Some comments: 

In the method "testDistributedShellCleanup": 
 # Result boolean variable is not used, can be removed.
 # There's a while loop in the end: How can you be sure that it won't run 
infinitely?
 # You don't need the continue statement at the end of the while-loop
 # In the assertion, the condition is negated ( {{!fs1.exists(path)}}). Use 
assertFalse instead, without negating the fs1.exists(path) call. 
 # In the assertion, I would replace "Fails" with "failed". 
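One way to address comment 2 is to give the polling loop a deadline, so the test fails instead of hanging. The sketch below is a standalone illustration; BoundedWait and its parameters are hypothetical names, not code from the patch.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of the patch: polls a condition with a deadline. */
final class BoundedWait {
  private BoundedWait() { }

  /**
   * Polls the condition every intervalMs milliseconds until it holds or
   * timeoutMs milliseconds have elapsed.
   *
   * @return true if the condition held before the deadline, false otherwise
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs,
      long intervalMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of looping forever
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }
}
```

In the test, the loop could then become a single call such as BoundedWait.waitFor(() -> appFinished, 60_000, 500), where appFinished stands for whatever condition the loop currently checks, followed by the assertFalse on fs1.exists(path) suggested in comment 4.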

Thanks!

> DistributedShell RelativePath is not removed at end
> ---
>
> Key: YARN-9227
> URL: https://issues.apache.org/jira/browse/YARN-9227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, 
> 0003-YARN-9227.patch
>
>
> DistributedShell Job does not remove the relative path which contains jars 
> and localized files.
> {code}
> [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017
> Found 2 items
> -rw-r--r--   3 ambari-qa hdfs  46636 2019-01-23 13:37 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar
> -rwx--x---   3 ambari-qa hdfs  4 2019-01-23 13:37 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands
> {code}




[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804298#comment-16804298
 ] 

Szilard Nemeth commented on YARN-8943:
--

Hi [~ajisakaa]!

I reviewed your changes, +1 (non-binding).

Next time, please avoid changing things not strictly related to the patch, 
e.g.: 
 * Changing method visibilities
 * Reformatting the code
 * Any other unnecessary change

These can be handled in a follow-up jira; in this patch they just make the 
review process harder.

This is especially true if you introduce junit5 into some bigger Maven modules 
than the hadoop-yarn-api project (I saw you have this in YARN-6946).

I do note that the order of the parameters needs to be changed for the assertXX 
methods, as the message string became the last parameter in junit5, whereas 
it is the first parameter in junit4.
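To make the parameter-order point concrete, here is a small sketch with stand-in methods that only mirror the two argument orders. They are not the JUnit APIs themselves (the class and method names are hypothetical), which keeps the example runnable without a JUnit dependency.

```java
import java.util.Objects;

/** Stand-ins mirroring the argument order of assertEquals in JUnit 4 and 5. */
final class AssertOrderSketch {
  private AssertOrderSketch() { }

  // JUnit 4 style: the message comes first.
  static void assertEqualsJUnit4Order(String message, Object expected,
      Object actual) {
    check(message, expected, actual);
  }

  // JUnit 5 style: the message comes last.
  static void assertEqualsJUnit5Order(Object expected, Object actual,
      String message) {
    check(message, expected, actual);
  }

  private static void check(String message, Object expected, Object actual) {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError(
          message + " expected:<" + expected + "> but was:<" + actual + ">");
    }
  }
}
```

When all three arguments are strings, a call that was not reordered still compiles after migration but compares the wrong values, which is why the reordering needs care whether it is done by hand or by script.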

Btw, after this is merged, are you planning to proceed with YARN-6946?

As I said, if you include only the changes strictly required to introduce 
junit5 into that project, the patch will be easier to review.

A second thought, just out of curiosity: Did you use some junit4 to junit5 
migration script or did you need to adjust the order of the parameters by hand?

Thanks! 

> Upgrade JUnit from 4 to 5 in hadoop-yarn-api
> 
>
> Key: YARN-8943
> URL: https://issues.apache.org/jira/browse/YARN-8943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8943.01.patch, YARN-8943.02.patch
>
>





[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics

2019-03-28 Thread Giovanni Matteo Fumarola (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804301#comment-16804301
 ] 

Giovanni Matteo Fumarola commented on YARN-9418:


Thanks [~Prabhu Joseph] for the patch.
I think the failed test is related. 

> ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
> ---
>
> Key: YARN-9418
> URL: https://issues.apache.org/jira/browse/YARN-9418
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-9418-001.patch
>
>
> ATSV2 entities rest api does not show the metrics
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS";
>  | jq .
> {
> "metrics": [],
> "events": [],
> "createdtime": 1553695002014,
> "idprefix": 0,
> "type": "YARN_CONTAINER",
> "id": "container_e18_1553685341603_0006_01_01",
> "info": {
> "UID": 
> "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
> "FROM_ID": 
> "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
> },
> "configs": {},
> "isrelatedto": {},
> "relatesto": {}
> }{code}
> NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but 
> these are not shown in the above output. We found that NM container entities 
> are written with entityIdPrefix set to the inverted container start time, 
> whereas RM container entities use the default 0. TimelineReader fetches only 
> the RM container entries.
> We confirmed this by setting the entityIdPrefix of NM container entities to 0, 
> same as RM (for testing purposes), after which the metrics are shown.
> {code:java}
> "metrics": [
> {
> "type": "SINGLE_VALUE",
> "id": "MEMORY",
> "aggregationOp": "NOP",
> "values": {
> "1553774981355": 490430464
> }
> },
> {
> "type": "SINGLE_VALUE",
> "id": "CPU",
> "aggregationOp": "NOP",
> "values": {
> "1553774981355": 5
> }
> }
> ]{code}
>  
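The mismatch described above can be illustrated with a toy key-value store. This is only a sketch: the row-key layout, the class name, and the reading of "inverted" as Long.MAX_VALUE minus the start time are assumptions, not the actual timeline storage schema.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of entities keyed by (idPrefix, entityId); layout is hypothetical. */
final class IdPrefixSketch {
  private final Map<String, String> rows = new HashMap<>();

  // Assumption: "inverted" means subtracting from Long.MAX_VALUE.
  static long invert(long value) {
    return Long.MAX_VALUE - value;
  }

  static String rowKey(long idPrefix, String entityId) {
    return idPrefix + "!" + entityId;
  }

  void write(long idPrefix, String entityId, String payload) {
    rows.put(rowKey(idPrefix, entityId), payload);
  }

  String read(long idPrefix, String entityId) {
    return rows.get(rowKey(idPrefix, entityId));
  }
}
```

If the NM writes under invert(startTime) and the RM writes under 0, a reader that looks up prefix 0 only ever sees the RM row, which matches the empty "metrics" array in the first output.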




[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804308#comment-16804308
 ] 

Szilard Nemeth commented on YARN-9413:
--

Hi [~Tao Yang]!

Apart from the checkstyle issue, I found some others: 
1. Could you please use javadoc instead of simple comments to document 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart#testQueueResourceDoesNotLeak
 ? 
2. Instead of calling getConf(), can you use the field "conf" directly? I can 
see that other tests call getConf() as well, but it feels a bit odd to me.
3. There's this call: 
{code:java}
Assert.assertTrue(!attempt1.shouldCountTowardsMaxAttemptRetry()); {code}

You should change it to Assert.assertFalse, so that you don't need to use 
negation!

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch
>
>
> To reproduce this problem:
>  # Submit an app which is configured to keep containers across app attempts 
> and should fail after AM finished at first time (am-max-attempts=1).
>  # App is started with 2 containers running on NM1 node.
>  # Fail the AM of the application with PREEMPTED exit status which should not 
> count towards max attempt retry but app will fail immediately.
>  # Used resource of this queue leaks after app fail.
> The root cause is the inconsistency of handling app attempt failure between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After app fail, RMAppFailedAttemptEvent will be sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it 
> will not count towards max attempt retry, so that it will send 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt; it will skip killing and calling 
> the completion process for containers belonging to this app, so the queue 
> resource leak happens.




[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api

2019-03-28 Thread Akira Ajisaka (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804312#comment-16804312
 ] 

Akira Ajisaka commented on YARN-8943:
-

Thanks [~snemeth] for the review!

bq. Next time, please avoid to change things not strictly related to the patch
I got it. I'll solve these things in the follow-up jiras.

bq. Btw, after this is merged, are you planning to proceed with YARN-6946?
Yes, I'd like to migrate per module.

bq. Did you use some junit4 to junit5 migration script or did you need to 
adjust the order of the parameters by hand?
I did the change by hand. In the next jira, I'd like to write a script for the 
migration.

> Upgrade JUnit from 4 to 5 in hadoop-yarn-api
> 
>
> Key: YARN-8943
> URL: https://issues.apache.org/jira/browse/YARN-8943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8943.01.patch, YARN-8943.02.patch
>
>





[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804317#comment-16804317
 ] 

Szilard Nemeth commented on YARN-8943:
--

Or maybe we could use this one before doing the migration from junit4 to junit5 
on other projects?

[http://joel-costigliola.github.io/assertj/assertj-core-converting-junit-assertions-to-assertj.html]

AssertJ has much cleaner / better syntax and produces more readable error 
messages, anyways.

> Upgrade JUnit from 4 to 5 in hadoop-yarn-api
> 
>
> Key: YARN-8943
> URL: https://issues.apache.org/jira/browse/YARN-8943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8943.01.patch, YARN-8943.02.patch
>
>





[jira] [Comment Edited] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804317#comment-16804317
 ] 

Szilard Nemeth edited comment on YARN-8943 at 3/28/19 9:26 PM:
---

Or maybe we could use this one before doing the migration from junit4 to junit5 
on other projects?

[http://joel-costigliola.github.io/assertj/assertj-core-converting-junit-assertions-to-assertj.html]

AssertJ has much cleaner / better syntax and produces more readable error 
messages, anyways.

Regarding your current patch: If you need help finding committers, I think 
[~sunilg] or [~tangzhankun] can help as we have a +1 already from me. 

Thanks!


was (Author: snemeth):
Or maybe we could use this one before doing the migration from junit4 to junit5 
on other projects?

[http://joel-costigliola.github.io/assertj/assertj-core-converting-junit-assertions-to-assertj.html]

AssertJ has much cleaner / better syntax and produces more readable error 
messages, anyways.

> Upgrade JUnit from 4 to 5 in hadoop-yarn-api
> 
>
> Key: YARN-8943
> URL: https://issues.apache.org/jira/browse/YARN-8943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8943.01.patch, YARN-8943.02.patch
>
>





[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804319#comment-16804319
 ] 

Szilard Nemeth commented on YARN-9270:
--

Hi [~pbacsko]!

Do you need review on the latest patch?

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.
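The environment-provider idea from the description can be sketched as follows. EnvAwarePlugin and lookup are hypothetical names used for illustration; only the envProvider field and its setter come from the proposal, and the real plugin class is not shown in this thread.

```java
import java.util.function.Function;

/** Hypothetical stand-in for the plugin class; names are illustrative. */
final class EnvAwarePlugin {
  // Defaults to the real process environment (System.getenv(String)).
  private Function<String, String> envProvider = System::getenv;

  /** Lets a test substitute its own environment lookup. */
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  /** Reads a variable through the provider, falling back to a default. */
  String lookup(String name, String defaultValue) {
    String value = envProvider.apply(name);
    return value != null ? value : defaultValue;
  }
}
```

A test can then pass a map-backed lookup, for example plugin.setEnvProvider(fakeEnv::get), instead of mutating the JVM's environment through reflection.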




[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Peter Bacsko (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804320#comment-16804320
 ] 

Peter Bacsko commented on YARN-9270:


[~snemeth] I'd appreciate comments if you have any.

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.




[jira] [Assigned] (YARN-7505) RM REST endpoints generate malformed JSON

2019-03-28 Thread Szilard Nemeth (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-7505:


Assignee: Szilard Nemeth  (was: Daniel Templeton)

> RM REST endpoints generate malformed JSON
> -
>
> Key: YARN-7505
> URL: https://issues.apache.org/jira/browse/YARN-7505
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: restapi
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Critical
> Attachments: YARN-7505.001.patch, YARN-7505.002.patch
>
>
> For all endpoints that return DAOs that contain maps, the generated JSON is 
> malformed.  For example:
> % curl 'http://localhost:8088/ws/v1/cluster/apps'
> {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executi
onTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}}




[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804340#comment-16804340
 ] 

Szilard Nemeth commented on YARN-9270:
--

Hi [~pbacsko]!

Went through the changes, +1 (non-binding).

This cleanup in the prod code and especially in the test code is pretty neat!

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.




[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804345#comment-16804345
 ] 

Szilard Nemeth commented on YARN-8701:
--

Hi [~Sen Zhao]!

Patch no longer applies!

Could you please upload a new patch?

Thanks!

> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-8701.001.patch, YARN-8701.002.patch
>
>
> If I configure *MaxResources* in fair-scheduler.xml, like this:
> {code}resource1=50{code}
> In the queue, the *MaxResources* value will change to 
> {code}Max Resources: {code}
> I think the value of VCores should be *CLUSTER_VCORES*.
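The -1 in the issue title is consistent with a Java narrowing conversion: casting Long.MAX_VALUE to int keeps only the low 32 bits, which are all ones. Whether this is the exact mechanism inside Resources#createResourceWithSameValue is an assumption; the sketch below only demonstrates the arithmetic and one possible clamping policy, with hypothetical method names.

```java
/** Demonstrates long-to-int narrowing; not code from the Resources class. */
final class NarrowingSketch {
  private NarrowingSketch() { }

  /** Plain cast: keeps the low 32 bits, so Long.MAX_VALUE becomes -1. */
  static int unsafeCast(long value) {
    return (int) value;
  }

  /** Clamps to the int range instead of wrapping (one possible policy). */
  static int clampToInt(long value) {
    if (value > Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
    if (value < Integer.MIN_VALUE) {
      return Integer.MIN_VALUE;
    }
    return (int) value;
  }
}
```

Clamping is just one option; the description suggests the intended value for VCores is CLUSTER_VCORES, which would need the cluster capacity rather than a fixed bound.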




[jira] [Commented] (YARN-8503) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804347#comment-16804347
 ] 

Szilard Nemeth commented on YARN-8503:
--

Hi [~AmiyaChakraborty]!

The file you uploaded is not a real patch file; it is in HTML format. 
Can you please upload a valid one?

Thanks!

> Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
> ---
>
> Key: YARN-8503
> URL: https://issues.apache.org/jira/browse/YARN-8503
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.2
>Reporter: Amiya Chakraborty
>Assignee: Amiya Chakraborty
>Priority: Major
>  Labels: patch-available, yarn
> Attachments: YARN-8503.001.patch, YARN-8503.001.patch
>
>
> Currently, there is no unit test for testing the functionality - 
> FINISHED_CONTAINERS_PULLED_BY_AM event while Decommissioning of node. This 
> patch provides the same to check the AM has pulled the containers from the 
> RM; then the RM will inform the NM about it and the NM can remove the 
> completed container from its list during DECOMMISSIONING.




[jira] [Updated] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified

2019-03-28 Thread Szilard Nemeth (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9196:
-
Description: 
On the RM application page, the attempt start time is formatted on the client 
side (browser), but the application start time is formatted by the server.

If the client and server time zones differ, the UI shows the application start 
time and the attempt start time in different formats.

  was:
In RM application page, attempt start time is formated client side(browser), 
but aplication start time is formated by the server.

If client time zone and server time zone is different then in UI application 
start time and attempt start time will be different format.


> Attempt started time zone and Application started time zone is different when 
> OS time zone is modified
> --
>
> Key: YARN-9196
> URL: https://issues.apache.org/jira/browse/YARN-9196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9196-001.patch
>
>
> On the RM application page, the attempt start time is formatted on the client 
> side (browser), but the application start time is formatted by the server.
> If the client and server time zones differ, the UI shows the application start 
> time and the attempt start time in different formats.
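
As a rough illustration of the effect described above, the same epoch timestamp renders differently depending on which time zone does the formatting. This sketch is not the RM web UI code: the formatter pattern, the two zones and the sample timestamp are assumptions chosen for the demo.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class StartTimeZoneDemo {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss xxx");

  // Renders one epoch timestamp as seen from the given time zone.
  static String render(long epochMillis, ZoneId zone) {
    return FORMAT.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) {
    long startTime = 0L; // sample timestamp: the epoch itself

    // "Server" formatting in UTC versus "browser" formatting at UTC+8.
    System.out.println(render(startTime, ZoneOffset.UTC));        // 1970-01-01 00:00:00 +00:00
    System.out.println(render(startTime, ZoneOffset.ofHours(8))); // 1970-01-01 08:00:00 +08:00
  }
}
```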




[jira] [Commented] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified

2019-03-28 Thread Szilard Nemeth (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804350#comment-16804350
 ] 

Szilard Nemeth commented on YARN-9196:
--

Hi [~BilwaST]!

This seems to be a nice fix!

Can you upload a new patch? The current patch does not apply to trunk!

> Attempt started time zone and Application started time zone is different when 
> OS time zone is modified
> --
>
> Key: YARN-9196
> URL: https://issues.apache.org/jira/browse/YARN-9196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9196-001.patch
>
>
> On the RM application page, the attempt start time is formatted on the client 
> side (browser), but the application start time is formatted by the server.
> If the client and server time zones differ, the UI shows the application start 
> time and the attempt start time in different formats.




[jira] [Commented] (YARN-8503) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804357#comment-16804357
 ] 

Hadoop QA commented on YARN-8503:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8503 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8503 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12942893/YARN-8503.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23833/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
> ---
>
> Key: YARN-8503
> URL: https://issues.apache.org/jira/browse/YARN-8503
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.2
>Reporter: Amiya Chakraborty
>Assignee: Amiya Chakraborty
>Priority: Major
>  Labels: patch-available, yarn
> Attachments: YARN-8503.001.patch, YARN-8503.001.patch
>
>
> Currently, there is no unit test for testing the functionality - 
> FINISHED_CONTAINERS_PULLED_BY_AM event while Decommissioning of node. This 
> patch provides the same to check the AM has pulled the containers from the 
> RM; then the RM will inform the NM about it and the NM can remove the 
> completed container from its list during DECOMMISSIONING.




[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804355#comment-16804355
 ] 

Hadoop QA commented on YARN-8701:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-8701 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8701 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947055/YARN-8701.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23832/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-8701.001.patch, YARN-8701.002.patch
>
>
> If I configure *MaxResources* in fair-scheduler.xml, like this:
> {code}resource1=50{code}
> In the queue, the *MaxResources* value will change to 
> {code}Max Resources: {code}
> I think the value of VCores should be *CLUSTER_VCORES*.




[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api

2019-03-28 Thread Akira Ajisaka (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804463#comment-16804463
 ] 

Akira Ajisaka commented on YARN-8943:
-

Thanks [~snemeth] for the information.

bq. Or maybe we could use this one before doing the migration from junit4 to 
junit5 on other projects?
Personally, I'm interested in using AssertJ, but it should be discussed in the 
dev mailing lists because this will change all the test code.

> Upgrade JUnit from 4 to 5 in hadoop-yarn-api
> 
>
> Key: YARN-8943
> URL: https://issues.apache.org/jira/browse/YARN-8943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8943.01.patch, YARN-8943.02.patch
>
>





[jira] [Commented] (YARN-7505) RM REST endpoints generate malformed JSON

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804491#comment-16804491
 ] 

Hadoop QA commented on YARN-7505:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 32 unchanged - 2 fixed = 33 total (was 34) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
51s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 26s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7505 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12897909/YARN-7505.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname

[jira] [Created] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)

Shen Yinjie created YARN-9424:
-

 Summary: Change getDeclaredMethods to getMethods in 
FederationClientInterceptor#invokeConcurrent()
 Key: YARN-9424
 URL: https://issues.apache.org/jira/browse/YARN-9424
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shen Yinjie


In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which cannot recognize some methods in 
ApplicationBaseProtocol (ApplicationClientProtocol extends 
ApplicationBaseProtocol), for example getApplications. When I run "yarn 
application -list" by connecting to the YARN router, it throws an exception.
So change getDeclaredMethods to getMethods.
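
The difference between the two reflection calls can be seen in a small standalone sketch. The interfaces {{BaseProtocol}} and {{ClientProtocol}} are hypothetical stand-ins for ApplicationBaseProtocol and ApplicationClientProtocol, and this is not the code from FederationClientInterceptor.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class DeclaredVsInheritedDemo {

  // Hypothetical stand-ins for the two YARN protocol interfaces.
  interface BaseProtocol {
    void getApplications();
  }

  interface ClientProtocol extends BaseProtocol {
    void submitApplication();
  }

  // Collects method names in sorted order so the output is stable.
  static Set<String> names(Method[] methods) {
    Set<String> names = new TreeSet<>();
    for (Method m : methods) {
      names.add(m.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    // Only methods declared directly on ClientProtocol.
    System.out.println(names(ClientProtocol.class.getDeclaredMethods()));
    // Public methods, including those inherited from BaseProtocol.
    System.out.println(names(ClientProtocol.class.getMethods()));
  }
}
```

A lookup by name over getDeclaredMethods() would therefore miss getApplications on the sub-interface, while the same lookup over getMethods() finds it.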




[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9424:
--
Description: 
In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which cannot recognize some methods in 
ApplicationBaseProtocol (ApplicationClientProtocol extends 
ApplicationBaseProtocol), for example getApplications(). When I run "yarn 
application -list" by connecting to the YARN router, the router throws an exception.
So change getDeclaredMethods to getMethods.

  was:
In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which cannot recongnize some methods in 
ApplicationBaseProtocol (ApplicationClientProtocol extend 
ApplicationBaseProtocol) ,for example getApplications, when I run "yarn 
application -list" by connecting to yarn router, it will throw exception.
So change getDeclaredMethods to getMethods.


> Change getDeclaredMethods to getMethods in 
> FederationClientInterceptor#invokeConcurrent()
> -
>
> Key: YARN-9424
> URL: https://issues.apache.org/jira/browse/YARN-9424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shen Yinjie
>Priority: Major
>
> In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
> getDeclaredMethods(), which cannot recognize some methods in 
> ApplicationBaseProtocol (ApplicationClientProtocol extends 
> ApplicationBaseProtocol), for example getApplications(). When I run "yarn 
> application -list" by connecting to the YARN router, the router throws an exception.
> So change getDeclaredMethods to getMethods.




[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9424:
--
Attachment: YARN-9124_1.patch

> Change getDeclaredMethods to getMethods in 
> FederationClientInterceptor#invokeConcurrent()
> -
>
> Key: YARN-9424
> URL: https://issues.apache.org/jira/browse/YARN-9424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shen Yinjie
>Priority: Major
> Attachments: YARN-9124_1.patch
>
>
> In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
> getDeclaredMethods(), which cannot recognize some methods in 
> ApplicationBaseProtocol (ApplicationClientProtocol extends 
> ApplicationBaseProtocol), for example getApplications(). When I run "yarn 
> application -list" by connecting to the YARN router, the router throws an exception.
> So change getDeclaredMethods to getMethods.




[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9413:
---
Attachment: image-2019-03-29-10-47-47-953.png

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch, 
> image-2019-03-29-10-47-47-953.png
>
>
> To reproduce this problem:
>  # Submit an app which is configured to keep containers across app attempts 
> and should fail after AM finished at first time (am-max-attempts=1).
>  # App is started with 2 containers running on NM1 node.
>  # Fail the AM of the application with PREEMPTED exit status which should not 
> count towards max attempt retry but app will fail immediately.
>  # Used resource of this queue leaks after app fail.
> The root cause is the inconsistency of handling app attempt failure between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After app fail, RMAppFailedAttemptEvent will be sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it 
> will not count towards max attempt retry, so that it will send 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt; it skips killing the containers 
> that belong to this app and skips their completion process, so the queue 
> resource leak happens.




[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804536#comment-16804536
 ] 

Tao Yang commented on YARN-9413:


Thanks [~cheersyang], [~snemeth] for your suggestions.

!image-2019-03-29-10-47-47-953.png!

The checkstyle issue seems unreasonable to me: I think the indentation level 
should be 12 for line 1505 in RMAppAttemptImpl. Can you please help me see 
what the problem is? Thanks!

For issue 3 that [~snemeth] mentioned, the test case can't use the field "conf" 
directly since it's a private field defined in the parent class 
(ParameterizedSchedulerTestBase).

The other issues above were carried over from other cases in TestAMRestart when 
reusing code in the new test case. I think perhaps I should fix all of them in 
TestAMRestart, right? 

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch, 
> image-2019-03-29-10-47-47-953.png
>
>
> To reproduce this problem:
>  # Submit an app which is configured to keep containers across app attempts 
> and should fail after AM finished at first time (am-max-attempts=1).
>  # App is started with 2 containers running on NM1 node.
>  # Fail the AM of the application with PREEMPTED exit status which should not 
> count towards max attempt retry but app will fail immediately.
>  # Used resource of this queue leaks after app fail.
> The root cause is the inconsistency of handling app attempt failure between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After app fail, RMAppFailedAttemptEvent will be sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it 
> will not count towards max attempt retry, so that it will send 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt; it skips killing the containers 
> that belong to this app and skips their completion process, so the queue 
> resource leak happens.




[jira] [Assigned] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie reassigned YARN-9424:
-

Assignee: Shen Yinjie

> Change getDeclaredMethods to getMethods in 
> FederationClientInterceptor#invokeConcurrent()
> -
>
> Key: YARN-9424
> URL: https://issues.apache.org/jira/browse/YARN-9424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-9124_1.patch
>
>
> In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
> getDeclaredMethods(), which cannot recognize some methods in 
> ApplicationBaseProtocol (ApplicationClientProtocol extends 
> ApplicationBaseProtocol), for example getApplications(). When I run "yarn 
> application -list" by connecting to the YARN router, the router throws an exception.
> So change getDeclaredMethods to getMethods.




[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9424:
--
Description: 
In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which cannot recognize some methods in 
ApplicationBaseProtocol (ApplicationClientProtocol extends 
ApplicationBaseProtocol).
We have implemented some methods in FederationClientInterceptor, such as 
getApplications(), GetQueueUserAclsInfo, etc. When I run "yarn application 
-list" by connecting to the YARN router, the router throws an exception.
So change getDeclaredMethods() to getMethods().

  was:
In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which cannot recongnize some methods in 
ApplicationBaseProtocol (ApplicationClientProtocol extend 
ApplicationBaseProtocol) ,for example getApplications(), when I run "yarn 
application -list" by connecting to yarn router, router will throw exception.
So change getDeclaredMethods to getMethods.


> Change getDeclaredMethods to getMethods in 
> FederationClientInterceptor#invokeConcurrent()
> -
>
> Key: YARN-9424
> URL: https://issues.apache.org/jira/browse/YARN-9424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-9124_1.patch
>
>
> In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
> getDeclaredMethods(), which cannot recognize some methods in 
> ApplicationBaseProtocol (ApplicationClientProtocol extends 
> ApplicationBaseProtocol).
> We have implemented some methods in FederationClientInterceptor, such as 
> getApplications(), GetQueueUserAclsInfo, etc. When I run "yarn application 
> -list" by connecting to the YARN router, the router throws an exception.
> So change getDeclaredMethods() to getMethods().




[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Weiwei Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804542#comment-16804542
 ] 

Weiwei Yang commented on YARN-9413:
---

Yeah, it looks a bit weird, might be a false alarm.

Could you please fix everything else according to [~snemeth]'s comment?

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch, 
> image-2019-03-29-10-47-47-953.png
>
>
> To reproduce this problem:
>  # Submit an app which is configured to keep containers across app attempts 
> and should fail after AM finished at first time (am-max-attempts=1).
>  # App is started with 2 containers running on NM1 node.
>  # Fail the AM of the application with PREEMPTED exit status which should not 
> count towards max attempt retry but app will fail immediately.
>  # Used resource of this queue leaks after app fail.
> The root cause is the inconsistency of handling app attempt failure between 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After app fail, RMAppFailedAttemptEvent will be sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it 
> will not count towards max attempt retry, so that it will send 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt; it skips killing the containers 
> that belong to this app and skips their completion process, so the queue 
> resource leak happens.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804558#comment-16804558
 ] 

Hadoop QA commented on YARN-8200:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 73 new or modified test files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 7s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 23s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 30s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 11m 40s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 21s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 53s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 49s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 49s{color} | {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 1441 unchanged - 2 fixed = 1443 total (was 1443) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 49s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 35s{color} | {color:orange} root: The patch generated 195 new + 3979 unchanged - 73 fixed = 4174 total (was 4052) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 11m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch 524 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 17s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{c

[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804571#comment-16804571
 ] 

Hadoop QA commented on YARN-9348:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 16 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 4m 22s{color} | {color:orange} root: The patch generated 3 new + 4 unchanged - 0 fixed = 7 total (was 4) {color} |
| {color:green}+1{color} | {color:green} hadolint {color} | {color:green} 0m 0s{color} | {color:green} There were no new hadolint issues. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange} 0m 13s{color} | {color:orange} The patch generated 132 new + 104 unchanged - 0 fixed = 236 total (was 104) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 13s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 58s{color} | {color:green} the patch passed {color} |
|| |

[jira] [Updated] (YARN-9227) DistributedShell RelativePath is not removed at end

2019-03-28 Thread Prabhu Joseph (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9227:

Attachment: YARN-9227-004.patch

> DistributedShell RelativePath is not removed at end
> ---
>
> Key: YARN-9227
> URL: https://issues.apache.org/jira/browse/YARN-9227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, 
> 0003-YARN-9227.patch, YARN-9227-004.patch
>
>
> DistributedShell Job does not remove the relative path which contains jars 
> and localized files.
> {code}
> [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017
> Found 2 items
> -rw-r--r--   3 ambari-qa hdfs  46636 2019-01-23 13:37 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar
> -rwx--x---   3 ambari-qa hdfs  4 2019-01-23 13:37 
> /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands
> {code}




[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end

2019-03-28 Thread Prabhu Joseph (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804574#comment-16804574
 ] 

Prabhu Joseph commented on YARN-9227:
-

Thanks [~snemeth] for reviewing. Attached the 004 patch, which addresses the 
above comments.
{quote} # There's a while loop at the end: how can you be sure that it won't 
run infinitely?{quote}
I was relying on the testcase timeout of 9ms. I have changed it to 
{{GenericTestUtils.waitFor}}.
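
The move from an open-ended while loop to a bounded wait can be sketched as follows. This is a stand-alone illustration, not Hadoop's `GenericTestUtils.waitFor` and not the code in the 004 patch: `BoundedWait`, its `waitFor`, and `becomesTrueWithin` are hypothetical names, and the intervals are arbitrary.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a polling helper with a hard deadline. */
final class BoundedWait {

  /**
   * Polls the condition every checkEveryMillis and gives up with a
   * TimeoutException once waitForMillis has elapsed, so the loop
   * cannot run forever.
   */
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
                      long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  /** Convenience wrapper: true if the condition held before the deadline. */
  static boolean becomesTrueWithin(Supplier<Boolean> check,
                                   long checkEveryMillis, long waitForMillis) {
    try {
      waitFor(check, checkEveryMillis, waitForMillis);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      // Preserve the interrupt for callers further up the stack.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Unlike a bare `while` loop that depends on the test-level timeout, the failure here is raised by the wait itself and carries a message that names the unmet condition's deadline.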

 





[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804580#comment-16804580
 ] 

Hadoop QA commented on YARN-9413:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-9413 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9413 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23834/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |


This message was automatically generated.



> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch, 
> image-2019-03-29-10-47-47-953.png
>
>
> To reproduce this problem:
>  # Submit an app that is configured to keep containers across app attempts 
> and that should fail after its AM finishes for the first time 
> (am-max-attempts=1).
>  # The app is started with 2 containers running on node NM1.
>  # Fail the AM of the application with the PREEMPTED exit status, which should 
> not count towards the max attempt retries; the app nevertheless fails 
> immediately.
>  # The used resource of this queue leaks after the app fails.
> The root cause is an inconsistency in how app attempt failure is handled by 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After the app fails, RMAppFailedAttemptEvent is sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the 
> failure does not count towards the max attempt retries, so it sends 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt, which skips killing and running the 
> completion process for containers belonging to this app, so the queue 
> resource leak happens.




[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9413:
---
Attachment: YARN-9413.003.patch

> Queue resource leak after app fail for CapacityScheduler
> 
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9413.001.patch, YARN-9413.002.patch, 
> YARN-9413.003.patch, image-2019-03-29-10-47-47-953.png
>
>
> To reproduce this problem:
>  # Submit an app that is configured to keep containers across app attempts 
> and that should fail after its AM finishes for the first time 
> (am-max-attempts=1).
>  # The app is started with 2 containers running on node NM1.
>  # Fail the AM of the application with the PREEMPTED exit status, which should 
> not count towards the max attempt retries; the app nevertheless fails 
> immediately.
>  # The used resource of this queue leaks after the app fails.
> The root cause is an inconsistency in how app attempt failure is handled by 
> RMAppAttemptImpl$BaseFinalTransition#transition and 
> RMAppImpl$AttemptFailedTransition#transition:
>  # After the app fails, RMAppFailedAttemptEvent is sent in 
> RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM 
> container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the 
> failure does not count towards the max attempt retries, so it sends 
> AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and 
> RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
>  # RMAppImpl$AttemptFailedTransition#transition handles 
> RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
>  # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in 
> CapacityScheduler#doneApplicationAttempt, which skips killing and running the 
> completion process for containers belonging to this app, so the queue 
> resource leak happens.




[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler

2019-03-28 Thread Tao Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804589#comment-16804589
 ] 

Tao Yang commented on YARN-9413:


Thanks [~cheersyang] for the confirmation about the checkstyle error.

Attached v3 patch to fix issues in TestAMRestart according to [~snemeth]'s 
comment.

 





[jira] [Commented] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804601#comment-16804601
 ] 

Hadoop QA commented on YARN-9424:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9424 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964120/YARN-9124_1.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 643327fafdad 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d7a2f94 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23835/testReport/ |
| Max. process+thread count | 786 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23835/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.a
