[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486819#comment-16486819
 ] 

genericqa commented on YARN-7340:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 59s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
15s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7340 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924687/YARN-7340.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5efeefc932e0 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5a91406 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20837/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20837/testReport/ |
| Max. process+thread count | 88

[jira] [Resolved] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-23 Thread Oleksandr Shevchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko resolved YARN-7998.

Resolution: Fixed

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with an NPE during failover because the ACL configuration was 
> changed; as a result, we no longer have the rights to submit the application 
> to its queue.
> Scenario:
>  # Submit an application
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the rights to submit it.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385
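The stack trace above is consistent with the recovered application having been rejected earlier (its queue ACL no longer admits the submitter), so the scheduler's application map has no entry for it when the attempt is added. A minimal defensive sketch of that reading, not the fix committed under YARN-7913; the early-return handling is an assumption:

{code:java}
// Sketch: guard the lookup in FairScheduler#addApplicationAttempt during
// recovery. If the application was rejected earlier (e.g. by a changed queue
// ACL), the map lookup returns null and dereferencing it causes the NPE above.
SchedulerApplication<FSAppAttempt> application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId()
      + " is unknown to the scheduler during recovery; ignoring attempt "
      + applicationAttemptId);
  return;
}
// ... original logic continues, e.g. application.getQueue() ...
{code}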



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-23 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486841#comment-16486841
 ] 

Oleksandr Shevchenko commented on YARN-7998:


Thank you [~wilfreds]. Closed as a duplicate of YARN-7913.

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with an NPE during failover because the ACL configuration was 
> changed; as a result, we no longer have the rights to submit the application 
> to its queue.
> Scenario:
>  # Submit an application
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the rights to submit it.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7786) NullPointerException while launching ApplicationMaster

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7786:

Priority: Major  (was: Minor)

> NullPointerException while launching ApplicationMaster
> --
>
> Key: YARN-7786
> URL: https://issues.apache.org/jira/browse/YARN-7786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: YARN-7786.patch, YARN-7786_1.patch, YARN-7786_2.patch, 
> YARN-7786_3.patch, YARN-7786_4.patch, YARN-7786_5.patch, YARN-7786_6.patch, 
> resourcemanager.log
>
>
> Before the ApplicationMaster is launched, send a kill command to the job; a 
> NullPointerException then appears:
> {code}
> 2017-11-25 21:27:25,333 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1511616410268_0001_01. Got exception: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {code}
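A hedged sketch of the kind of guard that avoids this NPE; it is not necessarily the committed patch, and the specific check shown (the application already removed after the kill) is an assumption drawn from the scenario above:

{code:java}
// Sketch: in AMLauncher, bail out of the launch if the application was killed
// and its state was already cleaned up before setupTokens() could read it.
RMApp rmApp = rmContext.getRMApps().get(applicationId);
if (rmApp == null || rmApp.getApplicationSubmissionContext() == null) {
  LOG.info("Skipping AM launch for " + applicationId
      + ": application was removed (likely killed) before launch");
  return;
}
{code}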



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Priority: Major  (was: Minor)

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I send a kill command to a running job, I check the logs and find the 
> following exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}
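One common way such invalid-event errors are handled in the RM state machines is to register the late event as a tolerated (no-op) transition; whether ignoring ATTEMPT_ADDED at FINAL_SAVING is the right behaviour here is exactly what the patch has to decide, so treat this purely as an illustrative sketch:

{code:java}
// Sketch: keep the attempt in FINAL_SAVING if ATTEMPT_ADDED arrives late,
// instead of throwing InvalidStateTransitionException.
.addTransition(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINAL_SAVING,
    RMAppAttemptEventType.ATTEMPT_ADDED)
{code}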



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-23 Thread Oleksandr Shevchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko reopened YARN-7998:


> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with an NPE during failover because the ACL configuration was 
> changed; as a result, we no longer have the rights to submit the application 
> to its queue.
> Scenario:
>  # Submit an application
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the rights to submit it.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-05-23 Thread Oleksandr Shevchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko resolved YARN-7998.

Resolution: Duplicate

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, 
> YARN-7998.002.patch, YARN-7998.003.patch
>
>
> RM crashes with an NPE during failover because the ACL configuration was 
> changed; as a result, we no longer have the rights to submit the application 
> to its queue.
> Scenario:
>  # Submit an application
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the rights to submit it.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Priority: Major  (was: Minor)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Major
>  Labels: patch
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
>
> After sending a kill command to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 
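The deterministic reproduction with a sleep points to an ordering race between the kill and the deferred START event. A sketch of the tolerate-the-late-event approach, analogous to how other late events are ignored in RMAppImpl; the enums are real YARN types but the transition itself is illustrative, not necessarily the committed fix:

{code:java}
// Sketch: drop a START event that arrives after the application has already
// reached KILLED, rather than logging InvalidStateTransitionException.
.addTransition(RMAppState.KILLED, RMAppState.KILLED, RMAppEventType.START)
{code}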



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486937#comment-16486937
 ] 

Gergo Repas commented on YARN-8273:
---

[~rkanter] - Thanks for the review and committing the change!
[~snemeth] - Thanks for the review!

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, 
> YARN-8273.005.patch, YARN-8273.006.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory; however, 
> NodeManager logs do not reflect the failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log
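A minimal sketch of the improvement's intent, assuming the aggregation writer surfaces HDFS quota failures as DSQuotaExceededException; the variable names are illustrative rather than the actual AppLogAggregatorImpl fields:

{code:java}
// Sketch: log a WARN instead of silently leaving a zero-byte aggregated file
// when the target directory's HDFS space quota is already exceeded.
try {
  logWriter.append(logKey, logValue);  // write one container's logs
  out.hflush();
} catch (DSQuotaExceededException e) {
  LOG.warn("Log aggregation for " + containerId + " could not be written: "
      + "HDFS space quota exceeded on the remote log directory", e);
}
{code}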



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486991#comment-16486991
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 29m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
18s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 
fixed = 235 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}166m 35s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}373m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseT

[jira] [Created] (YARN-8345) NodeHealthCheckerService to differentiate between reason for UnusableNodes for client to act suitably on it

2018-05-23 Thread Kartik Bhatia (JIRA)
Kartik Bhatia created YARN-8345:
---

 Summary: NodeHealthCheckerService to differentiate between reason 
for UnusableNodes for client to act suitably on it
 Key: YARN-8345
 URL: https://issues.apache.org/jira/browse/YARN-8345
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Kartik Bhatia


+*Current scenario:*+ 

NodeHealthCheckerService marks a node unhealthy on the basis of two things: 
 # External script
 # Directory status

If a directory is marked as full (as per the disk-check configs in yarn-site), the 
NodeManager marks the node as unhealthy. 

Once a node is marked unhealthy, MapReduce relaunches all the map tasks that ran 
on this now-unusable node, so even successful tasks are relaunched.

+*Problem:*+

We do not have a distinction between the disk limit that stops container launches 
on a node and the limit beyond which reducers can no longer read data from that 
node.

For example:

Consider a 3 TB disk. If we set the max disk utilisation percentage to 95% (since 
launching a container requires approximately 0.15 TB for jobs in our cluster) and 
there are a few nodes where disk utilisation is, say, 96%, the threshold is 
breached and those nodes are marked unhealthy by the NodeManager. This results in 
all successful mappers being relaunched on other nodes, even though the remaining 
4% of disk space is still enough for reducers to read that data. This causes 
unnecessary delay in our jobs. (Mappers launching again can preempt reducers if 
there is a crunch for space, and there are issues with calculating headroom in the 
Capacity Scheduler as well.)

 

+*Correction:*+

We need a state (say UNUSABLE_WRITE) that lets MapReduce know that the node is 
still good for reading data, so that successful mappers are not relaunched. This 
would prevent the delay.
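A sketch of the proposed distinction; the enum and its names are hypothetical, not existing YARN API:

{code:java}
// Hypothetical node usability states for the proposal above.
public enum NodeUsability {
  USABLE,          // healthy: launch containers and serve data
  UNUSABLE_WRITE,  // disk nearly full: stop new container launches,
                   // but completed map output can still be read
  UNHEALTHY        // fully unusable: work that depends on this node
                   // must be relaunched elsewhere
}
{code}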

  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-23 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487066#comment-16487066
 ] 

Jiandan Yang  commented on YARN-8320:
-

[~cheersyang] and I discussed the design offline. I have added more details in the 
v2 design doc.

> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; it 
> has no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpusets to bind containers to 
> different processors; this is inspired by the isolation technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
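A rough sketch of the cpuset idea, assuming raw cgroup file writes purely for illustration (a real implementation would go through the NodeManager's cgroups handling); the cgroup path, the core range and {{containerPid}} are placeholders:

{code:java}
// Sketch: pin a latency-sensitive container to dedicated cores so it does not
// share CPUs with batch/opportunistic containers.
Path cg = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn/container_e01_0001_01_000001");
Files.createDirectories(cg);
Files.write(cg.resolve("cpuset.cpus"), "4-7".getBytes(StandardCharsets.UTF_8)); // dedicated cores
Files.write(cg.resolve("cpuset.mems"), "0".getBytes(StandardCharsets.UTF_8));   // NUMA memory node
Files.write(cg.resolve("tasks"),
    String.valueOf(containerPid).getBytes(StandardCharsets.UTF_8));             // attach container process
{code}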



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-23 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: CPU-isolation-for-latency-sensitive-services-v2.pdf

> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; it 
> has no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpusets to bind containers to 
> different processors; this is inspired by the isolation technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8337) Deadlock In Federation Router

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487083#comment-16487083
 ] 

genericqa commented on YARN-8337:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 55s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}206m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain 
|
|   | 
hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
 |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema 
|
|   | hadoop.yarn.server.timelineservice.stor

[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-05-23 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487084#comment-16487084
 ] 

Sunil Govindan commented on YARN-8319:
--

Updated the patch with a test case. [~rohithsharma], please help to check.

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch
>
>
> When this config is on
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for resourcemanager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}
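A minimal sketch of the intended behaviour; the proposed {{yarn.webapp.filter-app-list-by-user}} key comes from the description above, while the {{isAdmin}} helper and the surrounding variables are placeholders, not the actual webapp code paths:

{code:java}
// Sketch: when the filter flag is on, a non-admin caller only sees its own apps.
boolean filterByUser =
    conf.getBoolean("yarn.webapp.filter-app-list-by-user", false);
if (filterByUser && !isAdmin(callerUgi)) {
  apps.removeIf(app -> !app.getUser().equals(callerUgi.getShortUserName()));
}
{code}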



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-05-23 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8319:
-
Attachment: YARN-8319.003.patch

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch
>
>
> When this config is on
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for resourcemanager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8337) Deadlock In Federation Router

2018-05-23 Thread Jianchao Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487101#comment-16487101
 ] 

Jianchao Jia commented on YARN-8337:


[~giovanni.fumarola] Thanks for your reply. I have updated the test; please review 
again.

BTW, I don't think the failed JUnit tests are related to this patch.

> Deadlock In Federation Router
> -
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch, YARN-8337.002.patch
>
>
> We use MySQL InnoDB as the state store engine. In the Router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens 
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore
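Both transactions run the same INSERT ... SELECT guard and then wait on each other's gap locks on the primary key. Shown only as a JDBC sketch, under the assumption that applicationId is the table's primary key, one way to avoid that pattern (this is not the YARN-8337 patch):

{code:java}
// Sketch: let the primary key reject duplicates atomically instead of using
// the gap-lock-prone INSERT ... SELECT ... HAVING COUNT(*) = 0 guard.
String sql = "INSERT INTO applicationsHomeSubCluster (applicationId, homeSubCluster)"
    + " VALUES (?, ?) ON DUPLICATE KEY UPDATE applicationId = applicationId"; // no-op on duplicate
try (PreparedStatement ps = conn.prepareStatement(sql)) {
  ps.setString(1, applicationId);
  ps.setString(2, homeSubCluster);
  ps.executeUpdate();
}
{code}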

[jira] [Commented] (YARN-6919) Add default volume mount list

2018-05-23 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487156#comment-16487156
 ] 

Shane Kumpf commented on YARN-6919:
---

[~ebadger] - the patch doesn't apply cleanly on 3.1. Do you think we need this 
in 3.1? If so, could you provide a patch? Thanks!

> Add default volume mount list
> -
>
> Key: YARN-6919
> URL: https://issues.apache.org/jira/browse/YARN-6919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6919.001.patch, YARN-6919.002.patch
>
>
> Piggybacking on YARN-5534, we should create a default list that bind mounts 
> selected volumes into all docker containers. This list will be empty by 
> default 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8346:
---

 Summary: Upgrading to 3.1 kills running containers with error 
"Opportunistic container queue is full"
 Key: YARN-8346
 URL: https://issues.apache.org/jira/browse/YARN-8346
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith Sharma K S


It is observed that during a rolling upgrade from the 2.8.4 to the 3.1 release, all 
the running containers are killed and a second attempt is launched for the 
application. The diagnostics message is "Opportunistic container queue is full", 
which is given as the reason the containers were killed. 

In the NM log, I see the message below after a container is recovered.
{noformat}
2018-05-23 17:18:50,655 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
 Opportunistic container [container_e06_1527075664705_0001_01_01] will not 
be queued at the NMsince max queue length [0] has been reached
{noformat}

The following steps were executed for the rolling upgrade:
# Install 2.8.4 cluster and launch a MR job with distributed cache enabled.
# Stop 2.8.4 RM. Start 3.1.0 RM with same configuration.
# Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487175#comment-16487175
 ] 

Rohith Sharma K S commented on YARN-8346:
-

In ContainerScheduler#enqueueContainer, the execution type is not set for a 
container recovered from 2.8.4, which takes us into the else branch with a zero 
queue length. This sends a kill event for the container, so the running containers 
are killed.

{code}
private boolean enqueueContainer(Container container) {
  boolean isGuaranteedContainer = container.getContainerTokenIdentifier().
      getExecutionType() == ExecutionType.GUARANTEED;

  boolean isQueued;
  if (isGuaranteedContainer) {
    queuedGuaranteedContainers.put(container.getContainerId(), container);
    isQueued = true;
  } else {
    if (queuedOpportunisticContainers.size() < maxOppQueueLength) {
      LOG.info("Opportunistic container {} will be queued at the NM.",
          container.getContainerId());
      queuedOpportunisticContainers.put(
          container.getContainerId(), container);
      isQueued = true;
    } else {
      LOG.info("Opportunistic container [{}] will not be queued at the NM" +
          "since max queue length [{}] has been reached",
          container.getContainerId(), maxOppQueueLength);
      container.sendKillEvent(
          ContainerExitStatus.KILLED_BY_CONTAINER_SCHEDULER,
          "Opportunistic container queue is full.");
      isQueued = false;
    }
  }
{code}
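If the root cause is indeed an unset execution type on the recovered token, one possible direction (a sketch only, not a committed fix) would be to treat the missing type as GUARANTEED during recovery:

{code:java}
// Sketch: a container recovered from a pre-opportunistic release (2.8.x) has no
// execution type in its token; defaulting it to GUARANTEED keeps it out of the
// opportunistic queue-length check above.
ExecutionType execType =
    container.getContainerTokenIdentifier().getExecutionType();
boolean isGuaranteedContainer =
    execType == null || execType == ExecutionType.GUARANTEED;
{code}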


Since the opportunistic container feature also exists in 2.9, I think this would be 
an issue when upgrading to 2.9 as well.
cc: [~jlowe] [~arun.sur...@gmail.com] 

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> It is observed that during a rolling upgrade from the 2.8.4 to the 3.1 release, 
> all the running containers are killed and a second attempt is launched for the 
> application. The diagnostics message is "Opportunistic container queue is 
> full", which is given as the reason the containers were killed. 
> In the NM log, I see the message below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install 2.8.4 cluster and launch a MR job with distributed cache enabled.
> # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration.
> # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime

2018-05-23 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487184#comment-16487184
 ] 

Shane Kumpf commented on YARN-8285:
---

Thanks to [~ebadger] for the contribution! I committed this to trunk.

> Remove unused environment variables from the Docker runtime
> ---
>
> Key: YARN-8285
> URL: https://issues.apache.org/jira/browse/YARN-8285
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Eric Badger
>Priority: Trivial
>  Labels: Docker
> Attachments: YARN-8285.001.patch
>
>
> YARN-7430 enabled user remapping for Docker containers by default. As a 
> result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer 
> used and can be removed.
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original 
> implementation, but was never used and can be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8347) [Umbrella] Upgrade efforts to Hadoop 3.x

2018-05-23 Thread Sunil Govindan (JIRA)
Sunil Govindan created YARN-8347:


 Summary: [Umbrella] Upgrade efforts to Hadoop 3.x
 Key: YARN-8347
 URL: https://issues.apache.org/jira/browse/YARN-8347
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sunil Govindan


This is an umbrella ticket to manage all efforts to close the gaps in upgrading to 
Hadoop 3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8346:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-8347

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Priority: Major
>
> It is observed that during a rolling upgrade from the 2.8.4 to the 3.1 release, 
> all the running containers are killed and a second attempt is launched for the 
> application. The diagnostics message is "Opportunistic container queue is 
> full", which is given as the reason the containers were killed. 
> In the NM log, I see the message below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install 2.8.4 cluster and launch a MR job with distributed cache enabled.
> # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration.
> # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8347) [Umbrella] Upgrade efforts to Hadoop 3.x

2018-05-23 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487202#comment-16487202
 ] 

Sunil Govindan commented on YARN-8347:
--

cc /[~leftnoteasy]

> [Umbrella] Upgrade efforts to Hadoop 3.x
> 
>
> Key: YARN-8347
> URL: https://issues.apache.org/jira/browse/YARN-8347
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sunil Govindan
>Priority: Major
>
> This is an umbrella ticket to manage all efforts to close the gaps in upgrading 
> to Hadoop 3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime

2018-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487214#comment-16487214
 ] 

Hudson commented on YARN-8285:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14264 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14264/])
YARN-8285. Remove unused environment variables from the Docker runtime. 
(skumpf: rev 9837ca9cc746573571029f9fb996a1be10b588ab)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java


> Remove unused environment variables from the Docker runtime
> ---
>
> Key: YARN-8285
> URL: https://issues.apache.org/jira/browse/YARN-8285
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Eric Badger
>Priority: Trivial
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-8285.001.patch
>
>
> YARN-7430 enabled user remapping for Docker containers by default. As a 
> result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer 
> used and can be removed.
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original 
> implementation, but was never used and can be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-23 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated YARN-8191:
--
Attachment: YARN-8191.015.patch

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, 
> YARN-8191.015.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory, which leads to the following two issues:
>  # Steady fair shares aren’t calculated correctly due to the remaining queues.
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487248#comment-16487248
 ] 

Hudson commented on YARN-8297:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14265/])
YARN-8297. Incorrect ATS Url used for Wire encrypted cluster.(addendum). 
(rohithsharmaks: rev f61e3e752eb1cf4a08030da04bc3d6c5a2b3926d)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/initializers/loader.js


> Incorrect ATS Url used for Wire encrypted cluster
> -
>
> Key: YARN-8297
> URL: https://issues.apache.org/jira/browse/YARN-8297
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Sunil Govindan
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch
>
>
> "Service" page uses incorrect web url for ATS in wire encrypted env. For ATS 
> urls, it uses https protocol with http port.
> This issue causes all ATS call to fail and UI does not display component 
> details.
> url used: 
> https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
> expected url : 
> https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8346:
-
Priority: Blocker  (was: Major)

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487249#comment-16487249
 ] 

Sunil Govindan commented on YARN-8346:
--

Bumping this up to Blocker.

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8346:
-
Affects Version/s: 3.1.0
   3.0.2

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Sunil Govindan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8346:
-
Target Version/s: 3.1.1, 3.0.3

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime

2018-05-23 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487323#comment-16487323
 ] 

Eric Badger commented on YARN-8285:
---

Thanks, [~shaneku...@gmail.com]!

> Remove unused environment variables from the Docker runtime
> ---
>
> Key: YARN-8285
> URL: https://issues.apache.org/jira/browse/YARN-8285
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Eric Badger
>Priority: Trivial
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-8285.001.patch
>
>
> YARN-7430 enabled user remapping for Docker containers by default. As a 
> result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer 
> used and can be removed.
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original 
> implementation, but was never used and can be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487353#comment-16487353
 ] 

genericqa commented on YARN-8319:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 302 unchanged - 0 fixed = 303 total (was 302) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
15s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 20s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
11s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m  7s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | 
hadoop.yarn.server.reso

[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-05-23 Thread Sunil Govindan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487446#comment-16487446
 ] 

Sunil Govindan commented on YARN-8319:
--

Test failures are not related. [~rohithsharma], could you please check?

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch
>
>
> When this config is on:
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for the ResourceManager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}.
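(Illustrative sketch only, not the patch: one common way to wire such a deprecation is Hadoop's {{Configuration.addDeprecation}} API; the property names are the ones proposed above, everything else here is assumed.)
{code:java}
import org.apache.hadoop.conf.Configuration;

public class FilterAppListDeprecationSketch {
  static {
    // Map the old RM-only property to the proposed webapp-wide property so
    // existing configs keep working while emitting a deprecation warning.
    Configuration.addDeprecation(
        "yarn.resourcemanager.display.per-user-apps",
        "yarn.webapp.filter-app-list-by-user");
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.display.per-user-apps", "true");
    // The value is readable under the new key thanks to the deprecation mapping.
    System.out.println(conf.getBoolean("yarn.webapp.filter-app-list-by-user", false));
  }
}
{code}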



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487490#comment-16487490
 ] 

genericqa commented on YARN-8191:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
21s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 48s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}127m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.removeEmptyIncompatibleQueues(String,
 FSQueueType) has Boolean return type and returns explicit null  At 
QueueManager.java:type and returns explicit null  At QueueManager.java:[line 
401] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8191 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924749/YARN-8191.015.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9292b981d335 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f61e3e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487535#comment-16487535
 ] 

Eric Yang commented on YARN-8346:
-

The queue length is based on the 
yarn.nodemanager.opportunistic-containers-max-queue-length setting.  If 
yarn-site.xml does not specify it, the queue has a size of 0.  This seems like a 
bad default value that will cause rolling upgrades to fail.  I think a default 
value of 4 to 8 is probably sensible.  The number of CPU cores and the queue 
length are somewhat related; it would be good for an external upgrading system 
like Ambari to set the queue length accordingly.
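(For illustration only: a minimal sketch of overriding that default via the standard Hadoop Configuration API; the property name is the one cited above, and 8 is just the ballpark value suggested here, not a committed default.)
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class OpportunisticQueueLengthSketch {
  public static void main(String[] args) {
    // Loads yarn-default.xml / yarn-site.xml from the classpath.
    Configuration conf = new YarnConfiguration();
    // Allow a small queue of opportunistic containers instead of the 0 default
    // discussed above; the value 8 is only illustrative.
    conf.setInt("yarn.nodemanager.opportunistic-containers-max-queue-length", 8);
    System.out.println(conf.getInt(
        "yarn.nodemanager.opportunistic-containers-max-queue-length", 0));
  }
}
{code}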

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487557#comment-16487557
 ] 

Jason Lowe commented on YARN-8346:
--

IIUC the issue isn't the queue length setting but rather that the containers 
are recovered with the wrong execution type (opportunistic instead of 
guaranteed).

I believe the bug is here in ContainerTokenIdentifier#getExecutionType:
{code}
  public ExecutionType getExecutionType(){
if (!proto.hasExecutionType()) {
  return null;
}
return convertFromProtoFormat(proto.getExecutionType());
  }
{code}

Instead of returning null for the execution type, it should return GUARANTEED.  
All containers created before an execution type was added were effectively 
guaranteed, since that was the only execution type supported.
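(A minimal sketch of the change described above, in the same ContainerTokenIdentifier class as the quoted snippet; this is an illustration of the idea, not the committed patch.)
{code:java}
  public ExecutionType getExecutionType() {
    if (!proto.hasExecutionType()) {
      // Tokens minted before execution types existed (e.g. by a 2.8.x RM)
      // only ever carried guaranteed containers, so default to GUARANTEED
      // instead of returning null.
      return ExecutionType.GUARANTEED;
    }
    return convertFromProtoFormat(proto.getExecutionType());
  }
{code}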


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487573#comment-16487573
 ] 

Gergo Repas commented on YARN-8191:
---

Thanks [~haibochen] for the review!

{quote}
In this patch,  getRemovedStaticQueues() and QueueManager.setQueuesToDynamic() 
together identify the queues that were added in the previous allocation file, 
but are now removed in the new allocation file, and then mark them as dynamic. 
What I mean is: is it possible that some queues were created dynamically but are 
now included in the new allocation file? If so, we need to mark them as static.
{quote}

This is done in the {{QueueManager.ensureQueueExistsAndIsCompatibleAndIsStatic()}} 
method; it is taken care of by the {{queue.setDynamic(false);}} line.

{quote}
The behavior of  QueueManager.removeLeafQueue() is still changed with the 
refactoring.  Previously it would return true if there is no incompatible queue 
found, but it now returns false.  We should also return true if 
removeEmptyIncompatibleQueues(name, FSQueueType.PARENT) returns null.  
Similarly, in IncompatibleQueueRemovalTask.execute(), the task shall be removed 
if `removed == null`.
{quote}

Thanks, I've corrected these conditions.

{quote}
Let's add some javadoc to newly added QueueManager public methods.
{quote}

Sure, I added them.

{quote}
`reloadListener.onCheck();` makes me worried about what happens if the listener is 
not set. Looking closely at the code, setReloadListener() is always called right 
after the AllocationFileLoaderService constructor, so I think we can make 
reloadListener a constructor argument, so that we never have to worry about the 
listener being null.{quote}

The Listener is now a constructor argument in the latest patch.
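(Illustrative sketch with simplified, hypothetical class names; it just shows the constructor-injection pattern discussed above, where requiring the listener up front removes the null case.)
{code:java}
import java.util.Objects;

interface ReloadListener {
  void onCheck();
}

class AllocationFileLoaderSketch {
  private final ReloadListener reloadListener;

  // Requiring the listener at construction time guarantees it is never null
  // by the time onCheck() is invoked.
  AllocationFileLoaderSketch(ReloadListener reloadListener) {
    this.reloadListener = Objects.requireNonNull(reloadListener, "reloadListener");
  }

  void checkAllocationFile() {
    reloadListener.onCheck();
  }
}
{code}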

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, 
> YARN-8191.015.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory, which leads to the following two issues:
>  # Steady fair shares aren’t calculated correctly due to the remaining queues.
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487593#comment-16487593
 ] 

Eric Yang commented on YARN-8346:
-

[~jlowe] Yes, you are right.  Existing workloads are supposed to have the 
execution type GUARANTEED.

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service

2018-05-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487596#comment-16487596
 ] 

Wangda Tan commented on YARN-8320:
--

Thanks [~cheersyang] / [~yangjiandan] for the detailed design; it is very helpful 
for understanding the context.

I took a quick look at the proposal; a couple of questions / comments:

1) It seems that #vcores must be divisible by #physical-cores, otherwise rounding 
issues will cause containers to get fewer/more resources than requested. If an 
admin enables the feature, YARN should take care of checking this value before 
starting the NM.

2) I'm still trying to understand the benefit of the RESERVED / SHARED modes. If a 
RESERVED core can be used by an ANY container, then in my mind the RESERVED 
container can be affected by ad hoc ANY containers. Similarly, if we allow SHARED 
containers to bind to the same set of cores, then, considering SHARED containers 
run LS services and are CPU-intensive, they could compete heavily on those shared 
cores, which could lead to even worse latency and more contention.

3) Relationship to other features:
- Related to NUMA allocation on YARN (YARN-5764): to me the two features are 
related to each other. Allocating reserved cores for the same process on the same 
or closest NUMA zone(s) gives the best performance, but satisfying one condition 
can break the other. We should be very careful to make sure the two features can 
work together.

- Related to GPU allocation on YARN: on one machine, GPU performance is 
sensitive to the topology of the GPUs. Communication latency and bandwidth differ 
a lot depending on whether GPUs are connected by NVLink, PCI-E, etc. It might be 
valuable to think about whether a single framework on the NM could do 
resource-specific scheduling and placement.

- Related to the ResourcePlugin framework: we added the ResourcePlugin framework 
in YARN-7224, and GPU/FPGA now use it to implement their features. I'm not sure 
whether this feature can benefit from the ResourcePlugin framework, or whether 
some refactoring of the framework is required. It would be better if we could 
extract the common parts and workflow.

4) To me, only privileged users and applications should be able to request a 
non-ANY CPU mode; how can we enforce this (maybe not in phase #1, but we need a 
plan here)?

> Support CPU isolation for latency-sensitive (LS) service
> 
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently the NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate the CPU resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler 
> with no support for differentiated latency.
>  * The request latency of services running in containers may fluctuate 
> frequently when all containers share CPUs, which latency-sensitive services 
> cannot afford in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
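(For illustration only: a minimal sketch of the cpuset mechanism the proposal builds on, assuming cgroup v1 mounted at /sys/fs/cgroup/cpuset and root privileges; the group path, core list and PID are hypothetical, and this is not the patch's implementation.)
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CpusetBindSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical cgroup for one container; real paths would be managed by the NM.
    Path group = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn/container_example");
    Files.createDirectories(group);
    // Pin the group to cores 2-3 on memory node 0, then move a process into it.
    Files.write(group.resolve("cpuset.cpus"), "2-3".getBytes());
    Files.write(group.resolve("cpuset.mems"), "0".getBytes());
    Files.write(group.resolve("tasks"), "12345".getBytes()); // PID is illustrative
  }
}
{code}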



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487606#comment-16487606
 ] 

Íñigo Goiri commented on YARN-8334:
---

The approach in  [^YARN-8334-YARN-7402.v2.patch] makes sense to me.
I'm not sure if there is a way to trigger a warning when this is not closed.
[~giovanni.fumarola], is there any related unit test that goes through this 
code?


> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.
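(For context, a minimal sketch of the close/destroy pattern the patch is about, assuming the Jersey 1.x client API used by these utilities; the URL is a placeholder and this is not the actual GPGUtils code.)
{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public class ClosingClientSketch {
  public static void main(String[] args) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      // Placeholder endpoint; the real code calls RM/GPG web services.
      response = client.resource("http://localhost:8088/ws/v1/cluster/info")
          .get(ClientResponse.class);
      System.out.println(response.getStatus());
    } finally {
      // Closing the response and destroying the client releases the underlying
      // connection, avoiding the leak described above.
      if (response != null) {
        response.close();
      }
      client.destroy();
    }
  }
}
{code}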



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487610#comment-16487610
 ] 

Íñigo Goiri commented on YARN-8336:
---

Same fix as YARN-8334; LGTM.
Is there any unit test covering this code path?

The only concern is the inconsistency in how the OK status is checked between 
these two pieces of code.
We may want to open a JIRA to make the WS management consistent in the YARN 
Router.


> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487621#comment-16487621
 ] 

Haibo Chen commented on YARN-8191:
--

+1 pending the findbugs fix. TestAMRestart.testPreemptedAMRestartOnRMRestart 
failed in the last two runs; can you take a look, [~grepas]?

I think we can fix the findbugs issue by returning an Optional instead.
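(A minimal sketch of the Optional-based return being suggested, with simplified, hypothetical helpers; the real method is QueueManager.removeEmptyIncompatibleQueues.)
{code:java}
import java.util.Optional;

class QueueRemovalSketch {
  enum FSQueueType { LEAF, PARENT }

  // Returning Optional.empty() instead of an explicit null Boolean avoids the
  // "has Boolean return type and returns explicit null" FindBugs warning.
  Optional<Boolean> removeEmptyIncompatibleQueues(String name, FSQueueType type) {
    if (!hasIncompatibleQueue(name, type)) {
      return Optional.empty(); // nothing to remove
    }
    return Optional.of(tryRemove(name));
  }

  // Hypothetical helpers standing in for the real scheduler-state checks.
  private boolean hasIncompatibleQueue(String name, FSQueueType type) { return false; }
  private boolean tryRemove(String name) { return true; }
}
{code}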

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, 
> YARN-8191.015.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory, which leads to the following two issues:
>  # Steady fair shares aren’t calculated correctly due to the remaining queues.
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8341) Yarn Service: Integration tests

2018-05-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487636#comment-16487636
 ] 

Eric Yang commented on YARN-8341:
-

[~csingh] Thanks for the patch.  I think it would be good to create a separate 
profile that can be activated by properties.  This makes it easier to show the 
properties required to run the integration tests.  The separate profile would 
compile the integration tests and run the test cases against the target cluster.  
For example:

{code}
<profiles>
  <profile>
    <activation>
      <property>
        <name>rm.host</name>
      </property>
    </activation>
    <build>
      <plugins>
        <!-- ... plugin executions bound to the integration-test phase ... -->
      </plugins>
    </build>
  </profile>
</profiles>
{code}

This enables a Jenkins job to run the integration tests with minimal effort on CLI 
settings:

{code}
mvn integration-test -Drm.host=localhost
{code}

The user.name property is already predefined by the JVM.  It would be good to use 
the current user identity for the launch, to support both Kerberos and 
non-Kerberos tests.

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the REST API end-to-end, we can add integration tests for the 
> YARN Service API.
> The integration tests:
> * belong to the JUnit category {{IntegrationTest}};
> * are only run when triggered by executing {{mvn 
> failsafe:integration-test}};
> * are excluded from the surefire plugin runs for regular tests;
> * accept the RM host, user name, and any additional properties needed to 
> execute the tests against a cluster as system properties.
> For example: {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests to check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster.
> Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8337) [FederationStateStore - MySql] Deadlock In addApplicationHome

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8337:
---
Summary: [FederationStateStore - MySql] Deadlock In addApplicationHome  
(was: Deadlock In Federation Router)

> [FederationStateStore - MySql] Deadlock In addApplicationHome
> -
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch, YARN-8337.002.patch
>
>
> We use MySQL InnoDB as the state store engine. In the Router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens 
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applications

[jira] [Updated] (YARN-8337) [FederationStateStore - MySql] Deadlock In addApplicationHome

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8337:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-7402

> [FederationStateStore - MySql] Deadlock In addApplicationHome
> -
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Jianchao Jia
>Priority: Major
> Attachments: YARN-8337.001.patch, YARN-8337.002.patch
>
>
> We use MySQL InnoDB as the state store engine. In the Router log we found a 
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : 
> Unable to insert the newly generated application 
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock 
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at 
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens 
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info 
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; 
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 0ba5f32d; asc -;;
> 2: len 7; hex dd00280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table 
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 
> lock mode 

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487699#comment-16487699
 ] 

Yongjun Zhang commented on YARN-8346:
-

Hi Guys,

Thanks for reporting and working on the issue. I'm preparing the 3.0.3 release and 
wonder if we can prioritize this one if we deem it a blocker.

Thanks,


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all 
> running containers are killed and a second attempt is launched for the 
> application. The diagnostics message, "Opportunistic container queue is 
> full", is given as the reason the containers are killed.
> In the NM log, I see the entry below after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487710#comment-16487710
 ] 

Haibo Chen commented on YARN-8338:
--

[~jlowe] [~vinodkv], I did not understand who else was depending on objenesis 
2.1, so I wanted to avoid an accidental upgrade of objenesis. The upgrade of 
de.ruedigermoeller:fst would have upgraded objenesis from 2.1 to 2.5.1 at the 
time, without the exclusion of objenesis from fst. Based on [this fst 
comment|https://github.com/RuedigerMoeller/fast-serialization/blob/aceaad0075b2e1ef796597a1098aeb39fbea7fc9/pom.xml#L141],
 I assumed it was safe to do so, and no issue was found in my test (as 
Jason noted, a dependency of mockito-all previously brought in objenesis 
transitively).

[This 
comment|https://github.com/RuedigerMoeller/fast-serialization/blob/aceaad0075b2e1ef796597a1098aeb39fbea7fc9/pom.xml#L141]
 says that if android is not used, we don't need objenesis. It'd be nice if we 
could remove this runtime dependency from the application history service and let 
hadoop-aws choose whichever version of objenesis it needs.

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487716#comment-16487716
 ] 

Giovanni Matteo Fumarola commented on YARN-8334:


Thanks [~elgoiri] and [~botong] for the review. I am not sure if there is a way 
to trigger a warning when the connection is not closed correctly. 

*TestPolicyGenerator* exercises this code.
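
For context, a minimal sketch of the close/destroy pattern being discussed, written 
against the Jersey 1.x client API used in this part of YARN; the class and method 
names below are illustrative only, not the actual GPGUtils code:
{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public final class RestGetSketch {
  // Illustrative only: fetch an entity over HTTP and always release resources.
  public static <T> T get(String url, Class<T> clazz) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(url).get(ClientResponse.class);
      return response.getEntity(clazz);
    } finally {
      if (response != null) {
        response.close();   // releases the underlying connection
      }
      client.destroy();     // shuts down the client and any resources it holds
    }
  }
}
{code}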

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487723#comment-16487723
 ] 

Giovanni Matteo Fumarola commented on YARN-8336:


Thanks [~elgoiri] for the review. *TestSchedConfCLI* and *TestLogsCLI* exercise 
the code.

I will open a Jira to make WS management consistent in the entire codebase. 
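
As a rough idea of what consistent WS management could look like, here is a 
hypothetical helper (WebServiceCallSketch is not an existing YARN class) that owns 
the Jersey client lifecycle so individual call sites cannot forget to close the 
response or destroy the client:
{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import java.util.function.Function;

public final class WebServiceCallSketch {
  // Illustrative only: funnel every REST call through one helper so that
  // ClientResponse.close() and Client.destroy() always happen.
  public static <T> T call(String url, Function<ClientResponse, T> handler) {
    Client client = Client.create();
    try {
      ClientResponse response = client.resource(url).get(ClientResponse.class);
      try {
        return handler.apply(response);
      } finally {
        response.close();
      }
    } finally {
      client.destroy();
    }
  }
}
{code}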

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8344:
---
Summary: Missing nm.close() in TestNodeManagerResync to fix 
testKillContainersOnResync on Windows  (was: Missing nm.close() in 
TestNodeManagerResync to fix unit tests on Windows)

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487738#comment-16487738
 ] 

Jason Lowe commented on YARN-8338:
--

bq.  This comment says that if android is not used, we don't need objenesis.

I think this JIRA proves that the comment is wrong.  There's no direct usage of 
objenesis in the Hadoop code base, but when we try to load fst classes in a 
static code block it fails trying to look up objenesis classes.  It looks like 
the objenesis classes are required for FST to work.  Maybe the methods of those 
classes aren't called when not running on android, but the classes still need to 
be there.
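
A small self-contained illustration of this class-loading behavior, independent of 
fst and objenesis: any class referenced during another class's static initialization 
must be resolvable when that class is loaded, even if none of its methods are ever 
invoked.
{code:java}
// Compile both classes, then delete Helper.class and run Loader: the JVM throws
// NoClassDefFoundError: Helper while initializing Loader, even though no Helper
// method is ever invoked at runtime.
public class Loader {
  static final Object PROBE;

  static {
    PROBE = new Helper();   // Helper is resolved during Loader's class initialization
  }

  public static void main(String[] args) {
    System.out.println("Loader initialized: " + (PROBE != null));
  }
}

class Helper {
}
{code}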


> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487741#comment-16487741
 ] 

Jason Lowe commented on YARN-8346:
--

I should have a patch up later today.  I already verified manually that the change 
I proposed above fixes the case that Rohith reported; I just need to add a unit 
test.


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed that, while rolling upgrade from the 2.8.4 release to the 3.1 
> release, all the running containers are killed and a second attempt is launched 
> for that application. The diagnostics message, "Opportunistic container queue 
> is full", is given as the reason the containers were killed. 
> In the NM log, I see the entries below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487742#comment-16487742
 ] 

Miklos Szegedi commented on YARN-4599:
--

The unit test issues are not related to the patch.

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, 
> YARN-4599.016.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
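
For background, the cgroup v1 knob involved here is memory.oom_control: writing 1 
disables the kernel OOM killer for that cgroup, so tasks that exceed their limit are 
paused (under_oom) instead of killed and a supervisor can decide what to do. A 
minimal sketch, assuming cgroup v1 is mounted at /sys/fs/cgroup/memory and using a 
made-up container cgroup path; this is not the YARN-4599 implementation:
{code:java}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OomControlSketch {
  // Disable the cgroup v1 OOM killer for one container cgroup so the manager
  // process (rather than the kernel) can pick a victim later.
  public static void disableOomKiller(String containerCgroup) throws Exception {
    Path oomControl =
        Paths.get("/sys/fs/cgroup/memory", containerCgroup, "memory.oom_control");
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical cgroup layout; adjust to the cluster's hierarchy.
    disableOomKiller("hadoop-yarn/container_0001_01_000002");
  }
}
{code}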



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487749#comment-16487749
 ] 

Haibo Chen commented on YARN-4599:
--

+1 on the latest patch. Will check it in later today if no objections

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, 
> YARN-4599.016.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-8346:


Assignee: Jason Lowe

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
>Priority: Blocker
>
> It is observed that, while rolling upgrade from the 2.8.4 release to the 3.1 
> release, all the running containers are killed and a second attempt is launched 
> for that application. The diagnostics message, "Opportunistic container queue 
> is full", is given as the reason the containers were killed. 
> In the NM log, I see the entries below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows

2018-05-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8344:
--
Description: Missing nm.close() in TestNodeManagerResync to fix 
testKillContainersOnResync on Windows.

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487762#comment-16487762
 ] 

Íñigo Goiri commented on YARN-8344:
---

Why does this fail on Windows specifically?

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8344:
---
Attachment: YARN-8344.v2.patch

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487766#comment-16487766
 ] 

Íñigo Goiri commented on YARN-8334:
---

The TestPolicyGenerator unit test runs 
[here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/].
+1
Committing.

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487766#comment-16487766
 ] 

Íñigo Goiri edited comment on YARN-8334 at 5/23/18 6:06 PM:


The TestPolicyGenerator unit test runs 
[here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/].
+1
Feel free to commit to the branch.


was (Author: elgoiri):
The TestPolicyGenerator unit test runs 
[here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/].
+1
Committing.

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487775#comment-16487775
 ] 

Íñigo Goiri commented on YARN-8336:
---

Both 
[TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/]
 and 
[TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/]
 pass.
+1
Feel free to commit.

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487778#comment-16487778
 ] 

Giovanni Matteo Fumarola commented on YARN-8344:


Attached v2 with the fix for the checkstyle warning.

If any test in this class fails, all the other tests will fail (same behavior on 
Windows or Linux).

testContainerResourceIncreaseIsSynchronizedWithRMResync fails on Windows; I am 
still figuring out the root cause. This patch will fix 
testKillContainersOnResync.
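
For illustration, the kind of cleanup being added boils down to a guaranteed 
teardown of the NodeManager started by a test, so one failing test cannot poison 
the ones that follow. A JUnit-style sketch under the assumption that the test's 
NodeManager object is Closeable; the real field names and lifecycle in 
TestNodeManagerResync may differ:
{code:java}
import java.io.Closeable;
import java.io.IOException;
import org.junit.After;

public class NodeManagerTeardownSketch {
  // Stands in for the NodeManager instance the real test creates; assigned by
  // the individual test methods.
  private Closeable nm;

  @After
  public void tearDown() throws IOException {
    if (nm != null) {
      nm.close();   // always stop the NM so a failed test cannot affect later ones
      nm = null;
    }
  }
}
{code}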

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows
> 
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8346:
-
Attachment: YARN-8346.001.patch

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-8346.001.patch
>
>
> It is observed that, while rolling upgrade from the 2.8.4 release to the 3.1 
> release, all the running containers are killed and a second attempt is launched 
> for that application. The diagnostics message, "Opportunistic container queue 
> is full", is given as the reason the containers were killed. 
> In the NM log, I see the entries below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8344:
---
Summary: Missing nm.close() in TestNodeManagerResync to fix 
testKillContainersOnResync  (was: Missing nm.close() in TestNodeManagerResync 
to fix testKillContainersOnResync on Windows)

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
> -
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487778#comment-16487778
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8344 at 5/23/18 6:22 PM:
-

Attached v2 with the fix for the checkstyle warning.

If any test in this class fails, all the other tests will fail (same behavior on 
Windows or Linux).

testContainerResourceIncreaseIsSynchronizedWithRMResync fails on Windows due to 
the length of the log directory. This patch will fix testKillContainersOnResync.

java.io.IOException: Cannot launch container using script at path 
F:/short/hadoop-trunk-win/s/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync/nm0/usercache/nobody/appcache/application_0_/container_0__01_00/default_container_executor.cmd,
 because it exceeds the maximum supported path length of 260 characters. 
Consider configuring shorter directories in yarn.nodemanager.local-dirs.

I saw a bunch of tests failing on Windows for this reason. I will open a Jira 
to track that fix.
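
As an aside, a common workaround for this class of failure is to point the NM local 
and log dirs at a short path when running tests on Windows. A hypothetical sketch 
(the property names are the standard YARN keys; the short paths are made up):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ShortDirsSketch {
  // Keep generated container script paths well under the 260-character Windows limit.
  public static Configuration withShortDirs(Configuration conf) {
    conf.set("yarn.nodemanager.local-dirs", "C:/y/local");  // hypothetical short path
    conf.set("yarn.nodemanager.log-dirs", "C:/y/logs");     // hypothetical short path
    return conf;
  }
}
{code}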


was (Author: giovanni.fumarola):
Attached v2 with the fix for Check style warning.

If any test in this class fails all the other tests will fail (same behavior in 
Windows or Linux).

testContainerResourceIncreaseIsSynchronizedWithRMResync fails in Windows - 
still figuring out the root cause. This patch will fix 
testKillContainersOnResync

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
> -
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487803#comment-16487803
 ] 

Konstantinos Karanasos commented on YARN-8346:
--

Thanks for the patch, [~jlowe].

Indeed you are right –  the problem is the lack of execution type. The queue 
size should remain 0 given that opportunistic containers are disabled in this 
case.

+1 for the patch.
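
To make the failure mode concrete: a container recovered from a 2.8.x NM state store 
has no execution type recorded, and if it is then treated as OPPORTUNISTIC it is 
rejected by the zero-length opportunistic queue. A hypothetical sketch of the 
defaulting idea only; it is not the actual YARN-8346 patch, and the class and method 
names are made up:
{code:java}
import org.apache.hadoop.yarn.api.records.ExecutionType;

public class RecoveredExecutionTypeSketch {
  // Containers recovered from a pre-3.x NM never recorded an execution type,
  // so a missing value should mean GUARANTEED rather than OPPORTUNISTIC.
  public static ExecutionType executionTypeForRecovery(ExecutionType recorded) {
    return recorded == null ? ExecutionType.GUARANTEED : recorded;
  }
}
{code}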

> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-8346.001.patch
>
>
> It is observed that, while rolling upgrade from the 2.8.4 release to the 3.1 
> release, all the running containers are killed and a second attempt is launched 
> for that application. The diagnostics message, "Opportunistic container queue 
> is full", is given as the reason the containers were killed. 
> In the NM log, I see the entries below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487806#comment-16487806
 ] 

Yongjun Zhang commented on YARN-8108:
-

Hi [~eyang],

It seems the issue also exists in the 3.0.2 release. The above discussion indicates 
that it might take some time for the solution to converge. Should we drop 3.0.3 
from the target releases and list this jira as a known issue for 3.0.3, or 
should we fix this issue in 3.0.3?

Thanks.


> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> The test is trying to pull up metrics data from SHS after kinit'ing as 'test_user'.
> It is throwing a GSSException as follows:
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on the RM can't be supported for a Kerberos-enabled 
> cluster because AuthenticationFilter is applied twice in the Hadoop code (once in 
> httpServer2 for the RM, and another instance from AmFilterInitializer for the proxy 
> server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487810#comment-16487810
 ] 

Yongjun Zhang commented on YARN-8346:
-

Thanks a lot for the quick turnaround [~jlowe] and [~kkaranasos].


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-8346.001.patch
>
>
> It is observed that, while rolling upgrade from the 2.8.4 release to the 3.1 
> release, all the running containers are killed and a second attempt is launched 
> for that application. The diagnostics message, "Opportunistic container queue 
> is full", is given as the reason the containers were killed. 
> In the NM log, I see the entries below after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487825#comment-16487825
 ] 

Eric Yang commented on YARN-8108:
-

[~yzhangal] My preference is to fix this in the 3.0.3 release.  If consensus is not 
reached, the release manager can push this out of the 3.0.3 release and note it in 
the release notes as a known issue.  I am fine with the plan.

> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> The test is trying to pull up metrics data from SHS after kinit'ing as 'test_user'.
> It is throwing a GSSException as follows:
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on the RM can't be supported for a Kerberos-enabled 
> cluster because AuthenticationFilter is applied twice in the Hadoop code (once in 
> httpServer2 for the RM, and another instance from AmFilterInitializer for the proxy 
> server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8348:
--

 Summary: Incorrect and missing AfterClass in HBase-tests
 Key: YARN-8348
 URL: https://issues.apache.org/jira/browse/YARN-8348
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487775#comment-16487775
 ] 

Íñigo Goiri edited comment on YARN-8336 at 5/23/18 6:54 PM:


Both 
[TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/]
 and 
[TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/]
 pass.
+1
Committing to trunk.


was (Author: elgoiri):
Both 
[TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/]
 and 
[TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/]
 pass.
+1
Feel free to commit.

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8348:
---
Attachment: YARN-8348.v1.patch

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-8348:
--

Assignee: Giovanni Matteo Fumarola

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487848#comment-16487848
 ] 

Giovanni Matteo Fumarola commented on YARN-8348:


Before my patch:
[ERROR] Errors: 
[ERROR] 
TestTimelineReaderWebServicesHBaseStorage.setupBeforeClass:79->AbstractTimelineReaderHBaseTestBase.setup:60
 » NoClassDefFound
[ERROR] 
org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps.org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps
[ERROR] Run 1: TestHBaseTimelineStorageApps.setupBeforeClass:97 » 
NoClassDefFound org/apache/...
[ERROR] Run 2: TestHBaseTimelineStorageApps.tearDownAfterClass:1939 NullPointer
[INFO] 
[ERROR] TestHBaseTimelineStorageDomain.setupBeforeClass:51 » NoClassDefFound 
org/apach...
[ERROR] 
org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities.org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities
[ERROR] Run 1: TestHBaseTimelineStorageEntities.setupBeforeClass:110 » 
NoClassDefFound org/ap...
[ERROR] Run 2: TestHBaseTimelineStorageEntities.tearDownAfterClass:1882 
NullPointer
[INFO] 
[ERROR] TestHBaseTimelineStorageSchema.setupBeforeClass:49 » NoClassDefFound 
org/apach...
[ERROR] 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity
[ERROR] Run 1: TestHBaseStorageFlowActivity.setupBeforeClass:71 » 
NoClassDefFound org/apache/...
[ERROR] Run 2: TestHBaseStorageFlowActivity.tearDownAfterClass:495 NullPointer
[INFO] 
[ERROR] 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun
[ERROR] Run 1: TestHBaseStorageFlowRun.setupBeforeClass:83 » NoClassDefFound 
org/apache/hadoo...
[ERROR] Run 2: TestHBaseStorageFlowRun.tearDownAfterClass:1078 NullPointer
[INFO] 
[ERROR] 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
[ERROR] Run 1: TestHBaseStorageFlowRunCompaction.setupBeforeClass:82 » 
NoClassDefFound org/ap...
[ERROR] Run 2: TestHBaseStorageFlowRunCompaction.tearDownAfterClass:853 
NullPointer

After my patch:
[ERROR] Errors: 
[ERROR] 
TestTimelineReaderWebServicesHBaseStorage.setupBeforeClass:79->AbstractTimelineReaderHBaseTestBase.setup:60
 » NoClassDefFound
[ERROR] TestHBaseTimelineStorageApps.setupBeforeClass:97 » NoClassDefFound 
org/apache/...
[ERROR] TestHBaseTimelineStorageDomain.setupBeforeClass:52 » NoClassDefFound 
org/apach...
[ERROR] TestHBaseTimelineStorageEntities.setupBeforeClass:110 » NoClassDefFound 
org/ap...
[ERROR] TestHBaseTimelineStorageSchema.setupBeforeClass:50 » NoClassDefFound 
org/apach...
[ERROR] TestHBaseStorageFlowActivity.setupBeforeClass:71 » NoClassDefFound 
org/apache/...
[ERROR] TestHBaseStorageFlowRun.setupBeforeClass:83 » NoClassDefFound 
org/apache/hadoo...
[ERROR] TestHBaseStorageFlowRunCompaction.setupBeforeClass:82 » NoClassDefFound 
org/ap...

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8348:
---
Description: 
HBase tests are failing in 
[linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
 for 2 reasons: 
* incorrect 

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
> * incorrect 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8348:
---
Description: 
HBase tests are failing in 
[linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
 for 2 reasons: 
 * incorrect afterClass;
 * KeyProviderTokenIssuer not defined.

On Windows they are failing for the previous 2 reasons plus a missing 
afterClass.

This Jira tracks the effort to fix part of the HBase tests and reduce the 
number of failed tests on Linux.

  was:
HBase tests are failing in 
[linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
 for 2 reasons: 
* incorrect 


> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
>  * incorrect afterClass;
>  * KeyProviderTokenIssuer not defined.
> On Windows they are failing for the previous 2 reasons plus a missing 
> afterClass.
> This Jira tracks the effort to fix part of the HBase tests and reduce the 
> number of failed tests on Linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487861#comment-16487861
 ] 

Giovanni Matteo Fumarola commented on YARN-8348:


[^YARN-8348.v1.patch] will bring the number of failed tests from 21 down to 16.

[Link to failed 
tests|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
>  * incorrect afterClass;
>  * KeyProviderTokenIssuer not defined.
> On Windows they are failing for the previous 2 reasons plus a missing 
> afterClass.
> This Jira tracks the effort to fix part of the HBase tests and reduce the 
> number of failed tests on Linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487887#comment-16487887
 ] 

genericqa commented on YARN-8346:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8346 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924799/YARN-8346.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2a58b0d4306d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 51ce02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20842/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20842/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> --

[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487886#comment-16487886
 ] 

Íñigo Goiri commented on YARN-8348:
---

Technically the null check in AfterClass shouldn't be needed, as a failure in 
BeforeClass should trigger the error everywhere else.
In any case, it is good not to have an NPE if the BeforeClass fails.
So in the output we went from a double NoClassDefFound+NPE to just a 
NoClassDefFound.
I think this is an improvement, but we need to figure out the reason for the 
NoClassDefFound (probably in a separate JIRA).

The real fix here would be the one in TestHBaseTimelineStorageDomain, which 
leaves the mini cluster open.
 [^YARN-8348.v1.patch] LGTM.
Let's wait for Yetus.
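
For reference, the pattern under discussion looks roughly like this: a class-level 
setup that may fail before the shared resource is created, and a teardown that only 
touches the resource if it actually exists. The names below are generic placeholders, 
not the actual HBase test classes:
{code:java}
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class GuardedTeardownSketch {
  private static AutoCloseable miniCluster;

  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    // If this throws (e.g. NoClassDefFoundError), miniCluster stays null
    // and every test in the class is reported as failed.
    miniCluster = startMiniClusterSomehow();
  }

  @AfterClass
  public static void tearDownAfterClass() throws Exception {
    if (miniCluster != null) {   // avoid a secondary NPE when setup failed
      miniCluster.close();
    }
  }

  private static AutoCloseable startMiniClusterSomehow() {
    // Placeholder for the real mini-cluster startup.
    return () -> { };
  }
}
{code}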

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
>  * incorrect afterClass;
>  * KeyProviderTokenIssuer not defined.
> On Windows they are failing for the previous 2 reasons plus a missing 
> afterClass.
> This Jira tracks the effort to fix part of the HBase tests and reduce the 
> number of failed tests on Linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487895#comment-16487895
 ] 

Hudson commented on YARN-8336:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14272 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14272/])
YARN-8336. Fix potential connection leak in SchedConfCLI and (inigoiri: rev 
e30938af1270e079587e7bc06b755f9e93e660a5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/YarnWebServiceUtils.java


> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.
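
For illustration, the close/destroy pattern such a fix typically adds looks 
roughly like this (a hedged sketch against the Jersey 1.x client API used by 
Hadoop, not the exact patched code; the class name and URL parameter are 
placeholders):

{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public final class WebServiceCallSketch {

  public static String fetch(String url) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(url).get(ClientResponse.class);
      return response.getEntity(String.class);
    } finally {
      // Release the pooled connection and the client's resources even
      // on the exception path; skipping these calls is what leaks.
      if (response != null) {
        response.close();
      }
      client.destroy();
    }
  }
}
{code}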



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487894#comment-16487894
 ] 

genericqa commented on YARN-8344:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 29 unchanged - 2 fixed = 29 total (was 31) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8344 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924796/YARN-8344.v2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux db23130e69a9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 51ce02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20841/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20841/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487902#comment-16487902
 ] 

Eric Payne commented on YARN-4781:
--

bq.  FairOrdering policy could be used with weights?
[~sunilg], the fair ordering preemption will generally select the 
smaller-weighted users first, even when those containers are older. It's a 
hierarchy of priority ordering, though, and it does still try to be "fair," so 
you could have a situation where the youngest containers are selected even 
though they are owned by a more heavily weighted user.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources and its containers have a long 
> lifespan, small applications could still wait a long time for resources and 
> SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8334) [] Fix potential connection leak in GPGUtils

2018-05-23 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8334:
---
Summary: [] Fix potential connection leak in GPGUtils  (was: Fix potential 
connection leak in GPGUtils)

> [] Fix potential connection leak in GPGUtils
> 
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils

2018-05-23 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8334:
---
Summary: [GPG] Fix potential connection leak in GPGUtils  (was: [] Fix 
potential connection leak in GPGUtils)

> [GPG] Fix potential connection leak in GPGUtils
> ---
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-23 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reopened YARN-7530:
---

This change breaks branch-3.1 compilation if the .m2 directory is cleaned.

{noformat}
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[WARNING] 'parent.relativePath' of POM 
org.apache.hadoop:hadoop-yarn-services-api:[unknown-version] 
(/Users/ebadger/apachehadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml)
 points at org.apache.hadoop:hadoop-yarn-services instead of 
org.apache.hadoop:hadoop-yarn-applications, please verify your project 
structure @ line 19, column 11
[FATAL] Non-resolvable parent POM for 
org.apache.hadoop:hadoop-yarn-services-api:[unknown-version]: Could not find 
artifact org.apache.hadoop:hadoop-yarn-applications:pom:3.1.1-SNAPSHOT and 
'parent.relativePath' points at wrong local POM @ line 19, column 11
[WARNING] 'build.plugins.plugin.version' for 
org.apache.maven.plugins:maven-gpg-plugin is missing. @ line 133, column 15
 @
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project 
org.apache.hadoop:hadoop-yarn-services-api:[unknown-version] 
(/Users/ebadger/apachehadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml)
 has 1 error
[ERROR] Non-resolvable parent POM for 
org.apache.hadoop:hadoop-yarn-services-api:[unknown-version]: Could not find 
artifact org.apache.hadoop:hadoop-yarn-applications:pom:3.1.1-SNAPSHOT and 
'parent.relativePath' points at wrong local POM @ line 19, column 11 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
{noformat}


Here's the difference between branch-3.1 and trunk. The artifactId was updated 
correctly in trunk, but not in branch-3.1:
{noformat}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml
index 45168a9fbc4..d45da093102 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml
@@ -18,8 +18,8 @@
   <modelVersion>4.0.0</modelVersion>
   <parent>
     <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-yarn-services</artifactId>
-    <version>3.2.0-SNAPSHOT</version>
+    <artifactId>hadoop-yarn-applications</artifactId>
+    <version>3.1.1-SNAPSHOT</version>
   </parent>
   <artifactId>hadoop-yarn-services-api</artifactId>
   <name>Apache Hadoop YARN Services API</name>
{noformat}

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> is part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils

2018-05-23 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487945#comment-16487945
 ] 

Botong Huang commented on YARN-8334:


Committed to YARN-7402 as db183f2ea. Thanks [~giovanni.fumarola] for the patch 
and [~elgoiri] for the review! 

> [GPG] Fix potential connection leak in GPGUtils
> ---
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils

2018-05-23 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang resolved YARN-8334.

Resolution: Fixed

> [GPG] Fix potential connection leak in GPGUtils
> ---
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch, 
> YARN-8334-YARN-7402.v2.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-05-23 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8349:
-

 Summary: Remove YARN registry entries when a service is killed by 
the RM
 Key: YARN-8349
 URL: https://issues.apache.org/jira/browse/YARN-8349
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn-native-services
Affects Versions: 3.2.0, 3.1.1
Reporter: Shane Kumpf


As the title states, when a service is killed by the RM (for exceeding its 
lifetime for example), the YARN registry entries should be cleaned up.

Without cleanup, DNS can contain multiple hostnames for a single IP address in 
the case where IPs are reused. This impacts reverse lookups, which breaks 
services, such as kerberos, that depend on those lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-05-23 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8349:
-

Assignee: Billie Rinaldi

> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
>
> As the title states, when a service is killed by the RM (for exceeding its 
> lifetime for example), the YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as kerberos, that depend on those lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487951#comment-16487951
 ] 

genericqa commented on YARN-8348:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 45s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 27s{color} 
| {color:red} hadoop-yarn-server-timelineservice-hbase-tests in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema 
|
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
|   | 
hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
 |
|   | 
hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain 
|
|   | 
hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
 |
|   | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487952#comment-16487952
 ] 

Eric Yang commented on YARN-8342:
-

[~vinodkv] The launch command was dropped in YARN-7516 due to concerns that shell 
expansion could cause the commands to run as the root user via popen.  With the 
YARN-7654 change to use execvp, this concern no longer applies.  It is safe to 
preserve the launch command even for untrusted images.

[~shaneku...@gmail.com] [~ebadger] [~jlowe] Do you agree with this change?

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>  Labels: Docker
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any log, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-23 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8333:
---

Assignee: Eric Yang

> Load balance YARN services using RegistryDNS multiple A records
> ---
>
> Key: YARN-8333
> URL: https://issues.apache.org/jira/browse/YARN-8333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>
> For scaling stateless containers, it would be great to support DNS round 
> robin for fault tolerance and load balancing.  The current DNS record format 
> for RegistryDNS is 
> [container-instance].[application-name].[username].[domain].  For example:
> {code}
> appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120
> appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121
> appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122
> appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123
> {code}
> It would be nice to add multi-A record that contains all IP addresses of the 
> same component in addition to the instance based records.  For example:
> {code}
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.120
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.121
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.122
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.123
> {code}
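
As a side note on how clients would consume such multi-A records, a single 
lookup returns every published address and the caller can rotate over them; a 
minimal sketch (the hostname is taken from the example above and is purely 
illustrative):

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class MultiARecordLookup {
  public static void main(String[] args) throws UnknownHostException {
    // Returns all A records registered for the component-level name,
    // so the caller can pick one (e.g. round-robin) for load balancing.
    InetAddress[] addrs =
        InetAddress.getAllByName("appcatalog.appname.hbase.ycluster");
    for (InetAddress addr : addrs) {
      System.out.println(addr.getHostAddress());
    }
  }
}
{code}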



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487984#comment-16487984
 ] 

Íñigo Goiri commented on YARN-8348:
---

Good news is that the NPE is gone.
However, the original NoClassDefFoundError surfaces clearly now.
I'm fine committing this as is, but I'd like to have a follow-up JIRA on why 
KeyProviderTokenIssuer is not found.

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
>  * incorrect afterClass;
>  * not defined KeyProviderTokenIssuer.
> On Windows they are failing for the previous 2 reasons plus a missing 
> afterClass.
> This Jira tracks the effort to fix part of the HBase-tests and reduce the 
> number of failed tests on Linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync

2018-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487987#comment-16487987
 ] 

Íñigo Goiri commented on YARN-8344:
---

+1 on [^YARN-8344.v2.patch].
We still need to figure out the proper fix for the path length issue on Windows.
[~giovanni.fumarola], please link this JIRA once you open the Windows fix.
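
For context, the shape of the cleanup being approved is roughly the following (a 
hedged sketch, not the actual YARN-8344 patch; the method and class names are 
illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.nodemanager.NodeManager;

public class NodeManagerCloseSketch {

  // Start a NodeManager for a test and guarantee it is closed afterwards,
  // so its directories, ports and threads are released before the next test.
  public static void runAndClose() throws Exception {
    NodeManager nm = new NodeManager();
    try {
      Configuration conf = new YarnConfiguration();
      nm.init(conf);
      nm.start();
      // ... the test would exercise resync behaviour here ...
    } finally {
      nm.close();  // equivalent to stop(); avoids leaking state between tests
    }
  }
}
{code}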

> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
> -
>
> Key: YARN-8344
> URL: https://issues.apache.org/jira/browse/YARN-8344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch
>
>
> Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync 
> on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-23 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487992#comment-16487992
 ] 

Shane Kumpf commented on YARN-8342:
---

This sounds like a reasonable proposal. In cases where the current behavior is 
desired, the user can set "launch_command" to an empty string I guess?

To be clear, there is no replacement with an "empty bash". The current 
"untrusted" mode leaves it up to the Docker image to specify the 
ENTRYPOINT/CMD. Nothing is overwritten by YARN in this "untrusted" mode. It is 
very common for images to use "bash" as the CMD. When an image does this and 
YARN runs in this "untrusted" mode, a non-interactive "bash" shell starts in 
the container and immediately exits with success. YARN reports that the 
container ran successfully, but this is confusing to the user because the code 
they expected to run did not run. The launch script depends on mounts and 
"untrusted" mode strips all mounts, meaning we flat out can't use a 
launch_script in this mode as we would in "trusted" mode. Allowing the 
"launch_command" supplied by the user, without embedding that "launch_command" 
in the launch script seems like a viable way to support both. Confused yet? :)

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>  Labels: Docker
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any log, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests

2018-05-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488002#comment-16488002
 ] 

Giovanni Matteo Fumarola commented on YARN-8348:


Thanks [~elgoiri] for the review. I will open a follow-up JIRA for 
KeyProviderTokenIssuer. As I said before, this patch will bring the failed tests 
from 21 down to 16 on Linux.

As with HDFS-13558, which you and [~huanbang1993] fixed by closing the cluster, 
this patch will also fix the Windows failures in TestHBaseTimelineStorageDomain 
and TestHBaseTimelineStorageSchema.

> Incorrect and missing AfterClass in HBase-tests
> ---
>
> Key: YARN-8348
> URL: https://issues.apache.org/jira/browse/YARN-8348
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8348.v1.patch
>
>
> HBase tests are failing in 
> [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/]
>  for 2 reasons: 
>  * incorrect afterClass;
>  * not defined KeyProviderTokenIssuer.
> While in windows are failing for the previous 2 reasons plus * missing 
> afterClass.
> This Jira tracks the effort to fix part of HBase-tests and reduces the failed 
> tests in Linux.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-23 Thread Hsin-Liang Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488013#comment-16488013
 ] 

Hsin-Liang Huang commented on YARN-8326:


Hi  [~eyang]

   Here is another update. With the suggested setting changed, the simple job I 
ran did improve. However, our unit test suite still took 14 hours compared to 7 
hours in the 2.6 environment, and another sample job run with the changed 
settings still took 15 seconds compared to 6 or 7 seconds on 2.6. So while the 
monitoring setting affects performance, it only plays a small part; the major 
issue still looks like container exit, which is much slower in the 3.0 
environment than in 2.6. Is anyone looking into this area?   Thanks!

 

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml configuration for 3.0 (property = value):
>  
> hadoop.registry.dns.bind-port = 5353
> hadoop.registry.dns.domain-name = hwx.site
> hadoop.registry.dns.enabled = true
> hadoop.registry.dns.zone-mask = 255.255.255.0
> hadoop.registry.dns.zone-subnet = 172.17.0.0
> manage.include.files = false
> yarn.acl.enable = false
> yarn.admin.acl = yarn
> yarn.client.nodemanager-connect.max-wait-ms = 6
> yarn.client.nodemanager-connect.retry-interval-ms = 1
> yarn.http.policy = HTTP_ONLY
> yarn.log-aggregation-enable = false
> yarn.log-aggregation.retain-seconds = 2592000
> yarn.log.server.url = [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
> yarn.log.server.web-service.url = [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
> yarn.node-labels.enabled = false
> yarn.node-labels.fs-store.retry-policy-spec = 2000, 500
> yarn.node-labels.fs-store.root-dir = /system/yarn/node-labels
> yarn.nodemanager.address = 0.0.0.0:45454
> yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
> yarn.nodemanager.aux-services = mapreduce_shuffle,spark2_shuffle,timeline_collector
> yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
> yarn.nodemanager.aux-services.spark2_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.spark2_shuffle.classpath = /usr/spark2/aux/*
> yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.timeline_collector.class = org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
> yarn.nodemanager.bind-host = 0.0.0.0
> yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
> yarn.nodemanager.container-metrics.unregister-delay-ms = 6
> yarn.nodemanager.container-monitor.interval-ms = 3000
> yarn.nodemanager.delete.debug-delay-sec = 0
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
> yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
> yarn.nodemanager.health-checker.interval-ms = 135000
> yarn.nodemanager.health-checker.script.timeout-ms = 6
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage = false
> yarn.nodemanager.linux-container-executor.group = hadoop
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users = false
> yarn.nodemanager.local-dirs = /hadoop/yarn/local
> yarn.nodemanager.log-aggregation.compression-type = gz
> yarn.nodemanager.log-aggregation.debug-enabled = false
> yarn.nodemanager.log-aggregation.num-log-files-per-app = 30
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = 3600
> yarn.nodemanager.log-dirs = /hadoop/yarn/log
> yarn.nodemanager.log.retain-seconds = 604800
> yarn.nodemanager.pmem-check-enabled = false
> yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state
> yarn.nodemanager.recovery.enabled = true
> yarn.nodemanager.recovery.supervised = true
> yarn.nodemanager.remote-app-log-dir = /app-logs
> yarn

[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-23 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7530:
-
Fix Version/s: (was: 3.2.0)

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> is part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-23 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7530:
-
Priority: Blocker  (was: Trivial)

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> is part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


