[jira] [Created] (YARN-8651) We must increase min Resource in FairScheduler after increasing the number of NMs

2018-08-10 Thread stefanlee (JIRA)
stefanlee created YARN-8651:
---

 Summary: We must increase min Resource in FairScheduler after 
increasing the number of NMs
 Key: YARN-8651
 URL: https://issues.apache.org/jira/browse/YARN-8651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0
Reporter: stefanlee


Recently our cluster has shown a strange phenomenon: before we increased the 
number of NodeManagers, resource utilization could reach 100%, but after the 
cluster expansion utilization did not improve, and many queues' used resources 
stayed at their min resource even though they still had plenty of pending 
demand.

When we then increased those queues' min resources dynamically, their 
utilization went up, and the resources of the entire cluster were used again.

So I suspect a bug in *FairSharePolicy#compare*.
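
To illustrate why raising minResources changes which queues get offered 
resources, here is a minimal, self-contained sketch of min-share-first 
ordering, loosely modeled on the idea behind FairSharePolicy#compare. It is a 
hypothetical illustration, not the actual Hadoop comparator; the Queue class 
and its fields are invented for the example.
{code:java}
// Hypothetical, simplified sketch of min-share-first queue ordering, loosely
// modeled on the idea behind FairSharePolicy#compare. NOT the real Hadoop code;
// the Queue class and its fields are invented for the illustration.
import java.util.Comparator;

public final class MinShareFirstOrdering {

  /** Minimal stand-in for a queue: memory in use and configured min share. */
  static final class Queue {
    final String name;
    final long usedMemMb;
    final long minShareMemMb;

    Queue(String name, long usedMemMb, long minShareMemMb) {
      this.name = name;
      this.usedMemMb = usedMemMb;
      this.minShareMemMb = minShareMemMb;
    }

    boolean isNeedy() {               // still below its configured min share
      return usedMemMb < minShareMemMb;
    }

    double minShareRatio() {          // fraction of the min share already satisfied
      return (double) usedMemMb / Math.max(1L, minShareMemMb);
    }
  }

  /** Queues below their min share come first; among them, the least satisfied first. */
  static final Comparator<Queue> MIN_SHARE_FIRST = (q1, q2) -> {
    if (q1.isNeedy() && !q2.isNeedy()) {
      return -1;
    }
    if (!q1.isNeedy() && q2.isNeedy()) {
      return 1;
    }
    if (q1.isNeedy() && q2.isNeedy()) {
      return Double.compare(q1.minShareRatio(), q2.minShareRatio());
    }
    return 0; // neither is needy: the real scheduler falls back to weighted fair share
  };

  public static void main(String[] args) {
    Queue a = new Queue("queueA", 90, 100);   // under its min share -> prioritized
    Queue b = new Queue("queueB", 500, 100);  // past its min share  -> deprioritized
    System.out.println(MIN_SHARE_FIRST.compare(a, b)); // -1: queueA is offered resources first
  }
}
{code}
Under this kind of ordering, a queue that has already passed its min share only 
competes through the weighted fair-share fallback, so if that fallback ranks it 
too low (the suspected bug), bumping minResources would mask the problem, which 
matches what we observed.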



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577021#comment-16577021
 ] 

genericqa commented on YARN-6972:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6972 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935219/YARN-6972.015.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8887c417e31c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a2a8c48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21574/testReport/ |
| Max. process+thread count | 879 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21574/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Adding RM ClusterId in AppInfo
> 

[jira] [Commented] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577015#comment-16577015
 ] 

lujie commented on YARN-8650:
-

I have uploaded the log that contains the exception. Please check!

We also found an NPE bug while the NodeManager is shutting down: 
[YARN-8649|https://issues.apache.org/jira/browse/YARN-8649]

> Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  Invalid event: 
> CONTAINER_LAUNCHED at DONE
> -
>
> Key: YARN-8650
> URL: https://issues.apache.org/jira/browse/YARN-8650
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log, 
> hadoop-hires-nodemanager-hadoop15.log
>
>
> We tested Hadoop while the NodeManager was shutting down and encountered two 
> InvalidStateTransitionExceptions:
> {code:java}
> 2018-08-04 14:29:33,025 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [CONTAINER_KILLED_ON_REQUEST], container: 
> [container_1533364185282_0001_01_01]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_KILLED_ON_REQUEST at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have analyzed these two bugs and found that shutdown sends a kill event, 
> which causes these two exceptions. We have tested our cluster many times and 
> can deterministically reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577015#comment-16577015
 ] 

lujie edited comment on YARN-8650 at 8/11/18 2:20 AM:
--

I have uploaded the log files that contain the exception. Please check!

We also found an NPE bug while the NodeManager is shutting down: YARN-8649


was (Author: xiaoheipangzi):
I have uploaded the log that contains the exception. Please check!

We also find one NPE bug while nodemanager is 
shutdown![YARN-8649|https://issues.apache.org/jira/browse/YARN-8649]

> Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  Invalid event: 
> CONTAINER_LAUNCHED at DONE
> -
>
> Key: YARN-8650
> URL: https://issues.apache.org/jira/browse/YARN-8650
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log, 
> hadoop-hires-nodemanager-hadoop15.log
>
>
> We tested Hadoop while the NodeManager was shutting down and encountered two 
> InvalidStateTransitionExceptions:
> {code:java}
> 2018-08-04 14:29:33,025 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [CONTAINER_KILLED_ON_REQUEST], container: 
> [container_1533364185282_0001_01_01]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_KILLED_ON_REQUEST at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have analyzed these two bugs and found that shutdown sends a kill event, 
> which causes these two exceptions. We have tested our cluster many times and 
> can deterministically reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577017#comment-16577017
 ] 

lujie commented on YARN-8649:
-

I also found another two InvalidStateTransitionExceptions while the NodeManager 
is shutting down:

Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE

Invalid event: CONTAINER_LAUNCHED at DONE

I also created a new issue, 
[YARN-8650|https://issues.apache.org/jira/browse/YARN-8650], to track it.

 

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8650:

Attachment: hadoop-hires-nodemanager-hadoop11.log

> Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  Invalid event: 
> CONTAINER_LAUNCHED at DONE
> -
>
> Key: YARN-8650
> URL: https://issues.apache.org/jira/browse/YARN-8650
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log, 
> hadoop-hires-nodemanager-hadoop15.log
>
>
> We tested Hadoop while the NodeManager was shutting down and encountered two 
> InvalidStateTransitionExceptions:
> {code:java}
> 2018-08-04 14:29:33,025 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [CONTAINER_KILLED_ON_REQUEST], container: 
> [container_1533364185282_0001_01_01]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_KILLED_ON_REQUEST at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have analyzed these two bugs and found that shutdown sends a kill event, 
> which causes these two exceptions. We have tested our cluster many times and 
> can deterministically reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8650:

Attachment: hadoop-hires-nodemanager-hadoop15.log

> Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  Invalid event: 
> CONTAINER_LAUNCHED at DONE
> -
>
> Key: YARN-8650
> URL: https://issues.apache.org/jira/browse/YARN-8650
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop15.log
>
>
> We tested Hadoop while the NodeManager was shutting down and encountered two 
> InvalidStateTransitionExceptions:
> {code:java}
> 2018-08-04 14:29:33,025 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [CONTAINER_KILLED_ON_REQUEST], container: 
> [container_1533364185282_0001_01_01]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_KILLED_ON_REQUEST at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have analyzed these two bugs and found that shutdown sends a kill event, 
> which causes these two exceptions. We have tested our cluster many times and 
> can deterministically reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-10 Thread lujie (JIRA)
lujie created YARN-8650:
---

 Summary: Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  
Invalid event: CONTAINER_LAUNCHED at DONE
 Key: YARN-8650
 URL: https://issues.apache.org/jira/browse/YARN-8650
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: lujie


We tested Hadoop while the NodeManager was shutting down and encountered two 
InvalidStateTransitionExceptions:
{code:java}
2018-08-04 14:29:33,025 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Can't handle this event at current state: Current: [DONE], eventType: 
[CONTAINER_KILLED_ON_REQUEST], container: 
[container_1533364185282_0001_01_01]
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
CONTAINER_KILLED_ON_REQUEST at DONE
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
{code}
{code:java}
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
CONTAINER_LAUNCHED at DONE
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
{code}
We have analyzed these two bugs and found that shutdown sends a kill event, 
which causes these two exceptions. We have tested our cluster many times and 
can deterministically reproduce it.
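
One common way to tolerate such late events is to treat them as no-ops once the 
container has reached DONE. Below is a minimal, self-contained sketch of that 
idea, assuming it is safe to ignore these shutdown-path events in the final 
state; it is a hypothetical illustration, not the real ContainerImpl state 
machine (the enums and the handle() method are stand-ins).
{code:java}
// Hypothetical, self-contained sketch: once a container reaches DONE, late
// lifecycle events from the shutdown path are ignored instead of raising an
// exception. NOT the real ContainerImpl state machine; the enums and handle()
// method are stand-ins for the illustration.
import java.util.EnumSet;

public final class LateEventSketch {

  enum State { RUNNING, DONE }

  enum Event { CONTAINER_LAUNCHED, CONTAINER_KILLED_ON_REQUEST, CONTAINER_EXITED_WITH_SUCCESS }

  // Events that may still arrive after DONE while the NodeManager is shutting down.
  static final EnumSet<Event> IGNORABLE_AFTER_DONE =
      EnumSet.of(Event.CONTAINER_LAUNCHED, Event.CONTAINER_KILLED_ON_REQUEST);

  static State handle(State current, Event event) {
    if (current == State.DONE) {
      if (IGNORABLE_AFTER_DONE.contains(event)) {
        // Swallow the late event; the container already reached its final state.
        return State.DONE;
      }
      // Stand-in for Hadoop's InvalidStateTransitionException.
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    // Normal lifecycle collapsed for the sketch: any terminal event moves to DONE.
    return State.DONE;
  }

  public static void main(String[] args) {
    State s = State.DONE;                              // container already completed
    s = handle(s, Event.CONTAINER_KILLED_ON_REQUEST);  // late kill from NM shutdown: ignored
    s = handle(s, Event.CONTAINER_LAUNCHED);           // late launch notification: ignored
    System.out.println(s);                             // prints DONE, no exception thrown
  }
}
{code}
In the real NodeManager the equivalent change would presumably be to register 
explicit DONE-to-DONE transitions for these event types instead of letting them 
fall through to InvalidStateTransitionException.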



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000
 ] 

lujie edited comment on YARN-8649 at 8/11/18 1:50 AM:
--

Stack trace (I also uploaded the log file generated by our cluster):

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180){code}
The NodeManager is tearing down and will clean up the local resources. So when 
the localizer heartbeat comes in, it does:
{code:java}
//getPathForLocalization
LocalizedResource rsrc = localrsrc.get(req);
rsrc.setLocalPath(localPath);
{code}
rsrc is null, and hence the NPE occurs.

 

I wonder whether adding only a null check is enough? If so, I will upload a patch.
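
For discussion, here is a minimal, self-contained sketch of what such a null 
check could look like, assuming it is acceptable to skip (and log) a 
localization request whose resource has already been removed during teardown. 
Apart from the localrsrc/req/localPath names quoted above, the class and method 
shapes are invented for the example and are not the real 
LocalResourcesTrackerImpl code.
{code:java}
// Minimal, hypothetical sketch of the proposed null check. Apart from the
// localrsrc/req/localPath names quoted above, the class and method shapes are
// invented for the example and are NOT the real LocalResourcesTrackerImpl code.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class NullCheckSketch {

  /** Stand-in for the tracked resource; only the part needed for the sketch. */
  static final class LocalizedResource {
    private Path localPath;
    void setLocalPath(Path p) { this.localPath = p; }
  }

  private final ConcurrentMap<String, LocalizedResource> localrsrc = new ConcurrentHashMap<>();

  /** Returns the chosen path, or null if the resource is gone (e.g. NM teardown). */
  Path getPathForLocalization(String req, Path localPath) {
    LocalizedResource rsrc = localrsrc.get(req);
    if (rsrc == null) {
      // Resource was cleaned up before this heartbeat was processed: skip it
      // instead of letting the NPE propagate out of the heartbeat handler.
      System.err.println("Resource " + req + " is no longer tracked, ignoring request");
      return null;
    }
    rsrc.setLocalPath(localPath);
    return localPath;
  }

  public static void main(String[] args) {
    Path p = new NullCheckSketch().getPathForLocalization(
        "lost-resource", Paths.get("/tmp/filecache/10"));
    System.out.println(p); // null: the request is skipped rather than throwing an NPE
  }
}
{code}
Callers would then need to handle the null return, e.g. by skipping that 
resource in the heartbeat response.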


was (Author: xiaoheipangzi):
Stacktrace(I also upload the log file generated by our cluster):

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180){code}
NodeManager is tearing down and will clean up the local resource. So when the 
heartbeat comes in, it will do :
{code:java}
//getPathForLocalization
LocalizedResource rsrc = localrsrc.get(req);
rsrc.setLocalPath(localPath);
{code}

[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000
 ] 

lujie edited comment on YARN-8649 at 8/11/18 1:48 AM:
--

Stack trace (I also uploaded the log file generated by our cluster):

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180){code}
The NodeManager is tearing down and will clean up the local resources. So when 
the localizer heartbeat comes in, it does:
{code:java}
//getPathForLocalization
LocalizedResource rsrc = localrsrc.get(req);
rsrc.setLocalPath(localPath);
{code}
rsrc is null, and hence the NPE occurs.


was (Author: xiaoheipangzi):
Stacktrace(I also upload the log file generated by our cluster):

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
{code}

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
>  

[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Summary: Similar as YARN-4355:NPE while processing localizer heartbeat  
(was: Same as YARN-4355:NPE while processing localizer heartbeat)

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Description: I have noticed that a nodemanager was getting NPEs while 
tearing down. The reason maybe  similar to YARN-4355 which is reported by [# 
Jason Lowe].   (was: I have noticed that a nodemanager was getting NPEs while 
tearing down. The reason maybe  similar to YARN-4355 which reported by [# Jason 
Lowe]. )

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Attachment: hadoop-hires-nodemanager-hadoop11.log

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000
 ] 

lujie edited comment on YARN-8649 at 8/11/18 1:27 AM:
--

Stacktrace(I also upload the log file generated by our cluster):

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
{code}


was (Author: xiaoheipangzi):
Stacktrace:

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
{code}

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355 

[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Description: I have noticed that a nodemanager was getting NPEs while 
tearing down. The reason maybe  similar to YARN-4355 which reported by [# Jason 
Lowe].   (was: I have noticed that a nodemanager was getting NPEs processing a 
heartbeat. This is  similar to 
[YARN-4355|https://issues.apache.org/jira/browse/YARN-4355 ] which reported by 
[# Jason Lowe] )

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
>
> I have noticed that a NodeManager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000
 ] 

lujie edited comment on YARN-8649 at 8/11/18 1:24 AM:
--

Stacktrace:

 
{code:java}
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
{code}


was (Author: xiaoheipangzi):
Stacktrace:
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
 at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
 at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
 at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
>
> I have noticed that a NodeManager was getting NPEs while processing a 
> heartbeat. This is similar to 
> [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355], which was 
> reported by Jason Lowe.



--
This message was sent by 

[jira] [Commented] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000
 ] 

lujie commented on YARN-8649:
-

Stacktrace:
java.io.IOException: java.lang.NullPointerException: 
java.lang.NullPointerException
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
 at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
 at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
 at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)

> Same as YARN-4355:NPE while processing localizer heartbeat
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Priority: Major
>
> I have noticed that a NodeManager was getting NPEs while processing a 
> heartbeat. This is similar to 
> [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355], which was 
> reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat

2018-08-10 Thread lujie (JIRA)
lujie created YARN-8649:
---

 Summary: Same as YARN-4355:NPE while processing localizer heartbeat
 Key: YARN-8649
 URL: https://issues.apache.org/jira/browse/YARN-8649
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: lujie


I have noticed that a NodeManager was getting NPEs while processing a heartbeat. 
This is similar to [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355], 
which was reported by Jason Lowe.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576995#comment-16576995
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
26s{color} | {color:red} hadoop-yarn-services-core in the patch failed. {color} 
|
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
48s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 34s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.service.ServiceScheduler.timelineServiceEnabled; locked 
50% of time  Unsynchronized access at ServiceScheduler.java:50% of time  
Unsynchronized access at ServiceScheduler.java:[line 270] |
| Failed junit tests | hadoop.yarn.service.TestServiceAM |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935216/YARN-8488.4.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8bc962529026 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a2a8c48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | 

[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576981#comment-16576981
 ] 

genericqa commented on YARN-8160:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
43s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
35s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}113m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8160 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935205/YARN-8160.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 42b2ded5b7ea 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e7951c6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-10 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-6972:
--
Attachment: YARN-6972.015.patch

> Adding RM ClusterId in AppInfo
> --
>
> Key: YARN-6972
> URL: https://issues.apache.org/jira/browse/YARN-6972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, 
> YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, 
> YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, 
> YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, 
> YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, 
> YARN-6972.015.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576978#comment-16576978
 ] 

genericqa commented on YARN-7417:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7417 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935213/YARN-7417.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8fa556e37cf0 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e7951c6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21572/testReport/ |
| Max. process+thread count | 451 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21572/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> re-factory 

[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576964#comment-16576964
 ] 

Suma Shivaprasad commented on YARN-8488:


Rebased with trunk

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, 
> YARN-8488.4.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.4.patch

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, 
> YARN-8488.4.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-7417:

Attachment: YARN-7417.003.patch

> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch, 
> YARN-7417.003.patch
>
>
> This Jira focuses on refactoring code for IndexedFileAggregatedLogsBlock and 
> TFileAggregatedLogsBlock:
>  # We have duplicate code in the current implementations of 
> IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be 
> abstracted into common methods. 
>  # The render method is too long in both of these classes; we want to make it 
> clearer by abstracting some helper methods out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576947#comment-16576947
 ] 

Eric Yang commented on YARN-7129:
-

The failed hdfs unit tests are not related to this patch.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8523) Interactive docker shell

2018-08-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen reassigned YARN-8523:
---

Assignee: Zian Chen

> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API that invokes docker exec to perform unix commands and 
> reports the output back to the application master.  The application master can 
> distribute and aggregate execution of the commands and record the results in 
> the application master log file.
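
A minimal sketch of the idea above (editor's illustration, not part of any patch; 
the helper name and the use of ProcessBuilder are assumptions): the NM side could 
shell out to "docker exec" and relay the command output back to the application 
master.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DockerExecSketch {
  // Runs "docker exec <container> <command...>" and returns combined stdout/stderr.
  static String exec(String dockerContainerId, String... command) throws IOException {
    List<String> cmd = new ArrayList<>();
    Collections.addAll(cmd, "docker", "exec", dockerContainerId);
    Collections.addAll(cmd, command);
    Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    return out.toString();
  }
}
{code}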



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8523) Interactive docker shell

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576945#comment-16576945
 ] 

Zian Chen commented on YARN-8523:
-

Makes sense. I'll work on providing an initial patch for this idea. Thanks [~eyang]

> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API that invokes docker exec to perform unix commands and 
> reports the output back to the application master.  The application master can 
> distribute and aggregate execution of the commands and record the results in 
> the application master log file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-10 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576935#comment-16576935
 ] 

Jonathan Hung commented on YARN-8559:
-

Thx [~cheersyang]! Unit test failures in branch-3.0 also fail locally without 
the patch.

Committed to branch-3.0 and branch-2.

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-8559-branch-2.001.patch, 
> YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-10 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8559:

Fix Version/s: 3.0.4
   2.10.0

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-8559-branch-2.001.patch, 
> YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576934#comment-16576934
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935209/YARN-8488.3.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21571/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.3.patch

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-10 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576924#comment-16576924
 ] 

Chandni Singh commented on YARN-8160:
-

Patch 2 contains the two fixes:
1. Exit code 255 during re-init. This is because cleanup of the docker 
container interferes with the docker inspect. Please see [~eyang]'s comment 
https://issues.apache.org/jira/browse/YARN-8160?focusedCommentId=16570918=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16570918

2. With entry point, yarn service was not using the updated launch command.

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized  yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} API. 
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded 
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change
> - container ID. This is not created by NM. It is provided to it and here RM 
> is not creating another container allocation.
> - {{localizedResources}} this stays the same if the upgrade does *NOT* 
> require additional resources IIUC.
>  
> The following changes with {{reInitializeContainer}}
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional api to NM to pull the images and modify 
> {{reInitializeContainer}} to trigger docker container launch without pulling 
> the image first which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images  on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.
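
A hedged illustration of the client-side call described above (editor's sketch; 
the exact NMClient method signature shown here is an assumption based on this 
description, not taken from the patch). The ContainerId stays the same, only the 
ContainerLaunchContext changes.

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ReInitSketch {
  // The NM kills the running process, cleans up, and relaunches the same
  // container id with the new launch context (new command, artifacts, env).
  static void upgrade(NMClient nmClient, ContainerId containerId,
      ContainerLaunchContext upgradedContext) throws YarnException, IOException {
    nmClient.reInitializeContainer(containerId, upgradedContext, true /* autoCommit */);
  }
}
{code}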



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files

2018-08-10 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576925#comment-16576925
 ] 

Jim Brennan commented on YARN-6495:
---

As part of YARN-8648, I am proposing that we can just remove the code that this 
patch is fixing.  If we are using cgroups, we are passing the {{cgroup-parent}} 
argument to docker, which accomplishes what this code was trying to do in a 
much more deterministic and reliable way.

My proposal would be to remove this code as part of YARN-8648, but if there is 
a preference for doing that in a separate Jira, I can file a new one.  Assuming 
there is agreement, I think we can close out this Jira.

[~Jaeboo], [~ebadger], do you agree?

> check docker container's exit code when writing to cgroup task files
> 
>
> Key: YARN-6495
> URL: https://issues.apache.org/jira/browse/YARN-6495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jaeboo Jeong
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the application 
> fails to complete successfully.
> For example, 
> {code}
> $ yarn  jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -num_containers 1 -timeout 360
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to 
> complete successfully
> {code}
> The error log is like below.
> {code}
> ...
> Failed to write pid to file 
> /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process
> ...
> {code}
> When writing pid to cgroup tasks, container-executor doesn’t check docker 
> container’s status.
> If the container finished very quickly, we can’t write the pid to cgroup tasks, 
> and that is not a problem.
> So container-executor needs to check the docker container’s exit code while 
> writing the pid to cgroup tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576917#comment-16576917
 ] 

Zian Chen commented on YARN-7417:
-

But it looks like we can make AggregatedLogFormat.ContainerLogsReader extend 
InputStream to achieve this. Let me update the patch.
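
A rough sketch of that direction (editor's illustration; the field and delegation 
shown are assumptions, not the actual AggregatedLogFormat code): exposing the 
container log reader as an InputStream would let both log blocks share a single 
streaming path.

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical adapter: present the per-container log data as an InputStream.
public class ContainerLogsInputStream extends InputStream {
  private final DataInputStream valueStream; // underlying aggregated-log value stream

  public ContainerLogsInputStream(DataInputStream valueStream) {
    this.valueStream = valueStream;
  }

  @Override
  public int read() throws IOException {
    return valueStream.read(); // delegate byte reads to the log value stream
  }
}
{code}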

> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>
> This Jira focuses on refactoring code for IndexedFileAggregatedLogsBlock and 
> TFileAggregatedLogsBlock:
>  # We have duplicate code in the current implementations of 
> IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be 
> abstracted into common methods. 
>  # The render method is too long in both of these classes; we want to make it 
> clearer by abstracting some helper methods out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-10 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8160:

Attachment: YARN-8160.002.patch

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized  yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} API. 
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded 
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change
> - container ID. This is not created by NM. It is provided to it and here RM 
> is not creating another container allocation.
> - {{localizedResources}} this stays the same if the upgrade does *NOT* 
> require additional resources IIUC.
>  
> The following changes with {{reInitializeContainer}}
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional api to NM to pull the images and modify 
> {{reInitializeContainer}} to trigger docker container launch without pulling 
> the image first which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images  on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-08-10 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576916#comment-16576916
 ] 

Jim Brennan commented on YARN-8648:
---

One proposal to fix the leaking cgroups is to have docker put its containers 
directly under the 
{{yarn.nodemanager.linux-container-executor.cgroups.hierarchy}} directory. For 
example, instead of using {{cgroup-parent=/hadoop-yarn/container_id-}}, we use 
{{cgroup-parent=/hadoop-yarn}}. This does cause docker to create a 
{{hadoop-yarn}} cgroup under each resource type, and it does not clean those 
up, but that is just one unused cgroup per resource type vs hundreds of 
thousands.

This can be done by just passing an empty string to 
DockerLinuxContainerRuntime.addCGroupParentIfRequired(), or otherwise changing 
it to ignore the containerIdStr. Doing this and removing the code that 
cherry-picks the PID in container-executor does work, but the NM still creates 
the per-container cgroups as well - they're just not used. The other issue with 
this approach is that the cpu.shares is still updated (to reflect the requested 
vcores allotment) in the per-container cgroup, so it is ignored. In our code, 
we addressed this by passing the cpu.shares value in the docker run 
--cpu-shares command line argument.
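
A minimal sketch of that approach, purely illustrative (the class and method names 
here are the editor's, not the actual DockerLinuxContainerRuntime code): keep 
containers directly under the fixed hierarchy via --cgroup-parent and carry the 
vcores weight via --cpu-shares rather than rewriting cpu.shares in a 
per-container cgroup.

{code}
import java.util.ArrayList;
import java.util.List;

public class DockerCgroupArgsSketch {
  // Build the docker run arguments under the proposal: no per-container
  // component in --cgroup-parent, cpu weight passed on the command line.
  static List<String> buildRunArgs(String image, int vcores) {
    List<String> args = new ArrayList<>();
    args.add("run");
    args.add("--cgroup-parent=/hadoop-yarn");    // fixed hierarchy, no container_id
    args.add("--cpu-shares=" + (vcores * 1024)); // e.g. weight derived from vcores
    args.add(image);
    return args;
  }

  public static void main(String[] args) {
    System.out.println(String.join(" ", buildRunArgs("hadoop-docker", 2)));
  }
}
{code}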

I'm still thinking about the best way to address this. Currently most of the 
resourceHandler processing happens at the linuxContainerExecutor level. But 
there is clearly a difference in how cgroups need to be handled for docker vs 
linux cases. In the docker case, we should arguably use docker command line 
arguments instead of directly setting up cgroups.

One option would be to provide a runtime interface useResourceHandlers() which 
for Docker would return false. We could then disable all of the resource 
handling processing that happens in the container executor, and add the 
necessary interfaces to handle cgroup parameters to the docker runtime.

Another option would be to move the resource handler processing down into the 
runtime. This is a bigger change, but may be cleaner. The docker runtime may 
still just ignore those handlers, but that detail would be hidden at the 
container executor level.

cc:, [~ebadger] [~jlowe] [~eyang] [~shaneku...@gmail.com] [~billie.rinaldi]

 

> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup,   All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576900#comment-16576900
 ] 

Zian Chen commented on YARN-8509:
-

Hi [~eepayne], sure, let me address these two questions.

1) The summation is over each user: we calculate the minimum of two 
expressions, one being the pending resource for this user per partition, the 
other being the user limit (which is queue_capacity * user_limit_factor) minus 
the user's used resource per partition.

2) I think there is some misunderstanding here. First of all, after the title 
was changed, this Jira does not intend to only support balancing of queues after 
they are satisfied. It intends to change the general strategy of how the user 
limit is calculated in the preemption scenario.

So the queue capacities I mentioned here for the example are an initial state, 
like this:

 
|| ||queue-a||queue-b||queue-c||queue-d||
|Guaranteed|30|30|30|10|
|Used|10|40|50|0|
|Pending|6|30|30|0|

This configuration should be able to happen if we set user_limit_percent to 50 and 
user_limit_factor to 1.0f, 3.0f, 3.0f and 2.0f respectively. But with the current 
equation, this initial state won't happen.

user_limit = 
          min(max(current_capacity / #active_users, 
                  current_capacity * user_limit_percent), 
              queue_capacity * user_limit_factor)

In the above case, queue-b's queue_capacity * user_limit_factor is 90GB while 
max(current_capacity / #active_users, current_capacity * user_limit_percent) is 
40GB; this means user-limit-factor has no effect at all, and the headroom 
becomes zero for queue-b. 

So the point is, we should let the user limit reach at most queue_capacity * 
user_limit_factor.
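
To make the numbers concrete, here is a small illustrative calculation for 
queue-b (editor's sketch; the 40GB current capacity and the single active user 
are assumptions inferred from the table above, not scheduler code):

{code}
public class UserLimitExample {
  public static void main(String[] args) {
    double queueCapacity = 30;      // queue-b guaranteed (GB)
    double currentCapacity = 40;    // assumed: queue-b current/used capacity (GB)
    int activeUsers = 1;            // assumed: a single active user
    double userLimitPercent = 0.5;  // minimum-user-limit-percent = 50
    double userLimitFactor = 3.0;   // queue-b user-limit-factor

    double byUsers = Math.max(currentCapacity / activeUsers,
                              currentCapacity * userLimitPercent); // 40 GB
    double byFactor = queueCapacity * userLimitFactor;             // 90 GB

    // The current equation takes the minimum, so the factor term never applies
    // and queue-b's user limit stays at 40 GB with zero headroom.
    System.out.println("user limit = " + Math.min(byUsers, byFactor) + " GB");
  }
}
{code}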

 

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
> total pending resource based on user-limit percent and user-limit factor, which 
> caps the pending resource for each user at the minimum of the user-limit pending 
> and the actual pending. This will prevent a queue from taking more pending 
> resource to achieve queue balance after all queues are satisfied with their 
> ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576888#comment-16576888
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935198/YARN-8488.2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21569/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8648) Container cgroups are leaked when using docker

2018-08-10 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576855#comment-16576855
 ] 

Jim Brennan edited comment on YARN-8648 at 8/10/18 9:37 PM:


Another problem we have seen is that container-executor still has code that 
cherry-picks the PID of the launch shell from the docker container and writes 
that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, 
effectively moving it from 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}.   So you end up with one 
process out of the container in the {{container_id}} cgroup, and the rest in 
the {{container_id/docker_container_id}} cgroup.

Since we are passing the {{--cgroup-parent}} to docker, there is no need to 
manually write the pid - we can just remove the code that does this.  


was (Author: jim_brennan):
Another problem we have seen is that container-executor still has code that 
cherry-picks the PID of the launch shell from the docker container and writes 
that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, 
effectively moving it from 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}.   So you end up with one 
process out of the container in the {{container_id}} cgroup, and the rest in 
the {{container_id/docker_container_id}} cgroup.



> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup,   All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576878#comment-16576878
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935185/YARN-8488.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21568/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: (was: YARN-8488.2.patch)

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.2.patch

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576877#comment-16576877
 ] 

Suma Shivaprasad commented on YARN-8488:


Attached a patch which adds a SUCCEEDED state to ServiceState and SUCCEEDED/FAILED 
to ComponentState.

Earlier the ComponentInstance state was marked as STOPPED for all available restart 
policies. Now it is SUCCEEDED/FAILED depending on the exit status. One pending 
issue is that when a graceful stop is sent via Client RPC, the component instance 
state is not marked as STOPPED. Will fix this in a subsequent patch.

For restartPolicy=ON_FAILURE/NEVER, when all component instances terminate, the 
component is marked SUCCEEDED if all component instances succeed, else FAILED.
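
For reference, a rough sketch of the ServiceState addition described above (the 
existing values come from the issue description; only SUCCEEDED is new, and this 
is an illustration rather than the exact patch contents):

{code}
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE,
  SUCCEEDED  // new: terminal state when all components finish successfully
}
{code}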





> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875
 ] 

Yufei Gu edited comment on YARN-8632 at 8/10/18 9:28 PM:
-

Your patch doesn't apply to trunk. You said the bug is in trunk as well; can 
you provide a patch for trunk?
What version does your patch target? 2.7.2?


was (Author: yufeigu):
Your patch doesn't apply to trunk? You said the bug is in trunk as well, can 
you provide a patch for the trunk?
What is the version does your patch target? 2.7.2?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be OK, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread exits 
> because of an NPE:
>  the reason is that "wrapper.getQueueSet()" is still null when executing "String 
> metrics = web.generateRealTimeTrackingMetrics();"
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try block so that the MetricsLogRunnable thread does not exit with an 
> unexpected exception. 
>  My Hadoop version is 2.7.2; it seems the Hadoop trunk branch also has the 
> second problem, and I have made a patch to solve it.
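
A minimal sketch of the fix described above (a fragment only; the surrounding names "web", "writer" and the catch handling are assumptions for illustration, not the actual SLS code):

{code:java}
// Generate the metrics inside the try block so that an early
// NullPointerException (queue set not yet initialized) does not kill the
// MetricsLogRunnable thread; the iteration simply skips this interval.
try {
  String metrics = web.generateRealTimeTrackingMetrics();
  writer.println(metrics);
} catch (Exception e) {
  writer.println("[]");  // skip this interval instead of letting the thread die
}
{code}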



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875
 ] 

Yufei Gu commented on YARN-8632:


Your patch doesn't apply to trunk. You said the bug is in trunk as well; can 
you provide a patch for trunk?
What version does your patch target? 2.7.2?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be OK, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread exits 
> because of an NPE:
>  the reason is that "wrapper.getQueueSet()" is still null when executing "String 
> metrics = web.generateRealTimeTrackingMetrics();"
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try block so that the MetricsLogRunnable thread does not exit with an 
> unexpected exception. 
>  My Hadoop version is 2.7.2; it seems the Hadoop trunk branch also has the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.2.patch

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573997#comment-16573997
 ] 

Zian Chen edited comment on YARN-8509 at 8/10/18 9:21 PM:
--

Hi Eric, thanks for the comments. After discussing with Wangda, the patch uploaded 
before is not correct due to a misunderstanding of the original problem.

I have changed the Jira title. The intention of this Jira is to fix the calculation 
of pending resources, taking the user limit into account, in the preemption scenario. 
Currently, the pending resource calculation in preemption uses the same algorithm as 
scheduling, which is:
{code:java}
user_limit = min(max(current_capacity / #active_users, current_capacity *
    user_limit_percent), queue_capacity * user_limit_factor)
{code}
This is good for scheduling because we want to make sure users can get at least 
"minimum-user-limit-percent" of the resources, which acts more like a lower 
bound on the user limit. However, we should not cap the total pending resources a 
leaf queue can get by minimum-user-limit-percent; instead, we want to use 
user-limit-factor, which is the upper bound, to cap pending resources in 
preemption. If we use minimum-user-limit-percent to cap pending 
resources, resource under-utilization will happen in the preemption scenario. Thus, 
we suggest that the pending resource calculation for preemption use this 
formula:

 
{code:java}
total_pending(partition, queue) = min{ Q_max(partition) - Q_used(partition),
    Σ_users min{ User.ulf(partition) - User.used(partition), User.pending(partition) } }
{code}
Let me give an example,

 
{code:java}
   Root
/  |  \  \
   a   b   c  d
  30  30  30  10


 1) Only one node (n1) in the cluster, it has 100G.

 2) app1 submit to queue-a, asks for 10G used, 6G pending.

 3) app2 submit to queue-b, asks for 40G used, 30G pending.

 4) app3 submit to queue-c, asks for 50G used, 30G pending.
{code}
Here we only have one user, and the user-limit-factor values for the queues are:

 

 
||Queue name|| minimum-user-limit-percent ||user-limit-factor||
|         a|                      50|        1.0 f|
|         b|                      50|        3.0 f|
|         c|                      50|        3.0 f|
|         d|                      50|        2.0 f|

With the old calculation, the user limit for queue-a is 30G, which lets app1 keep its 6G 
pending, but the user limit for queue-b becomes 40G, which makes the headroom become 
zero after subtracting the 40G used, so the 30G of pending resources being asked for 
cannot be accepted; the same happens to queue-c.

However, if we look at this test case from a preemption point of view, we should allow 
queue-b and queue-c to take more pending resources, because even though queue-a 
has 30G of guaranteed capacity configured, it is under-utilized. With the pending resources 
capped by the old algorithm, queue-b and queue-c cannot take the available 
resources through preemption, which means the cluster resources are not used 
effectively. 

To summarize, since user-limit-factor is the hard limit on how much 
resource a user can use, we should calculate pending resources using 
user-limit-factor instead of minimum-user-limit-percent. 
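
To make the difference concrete, here is a hedged walk-through of the proposed formula on the example above (single user; Q_max is assumed to be the full 100G since no queue maximums are configured):

{code:java}
// Illustrative arithmetic only, derived from the example above.
// queue-a: ulf cap = 1.0 * 30G = 30G; 30G - 10G used = 20G; min(20G,  6G pending) =  6G counted
// queue-b: ulf cap = 3.0 * 30G = 90G; 90G - 40G used = 50G; min(50G, 30G pending) = 30G counted
// queue-c: ulf cap = 3.0 * 30G = 90G; 90G - 50G used = 40G; min(40G, 30G pending) = 30G counted
// With the old minimum-user-limit-percent based limit, queue-b and queue-c
// contributed no pending resource, so preemption could not reclaim anything for them.
{code}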

Could you share your opinion on this, [~eepayne]?

 

 


was (Author: zian chen):
Hi Eric, thanks for the comments. Discussed with Wangda, the patch uploaded 
before is not correct due to misunderstand of the original problem.

I have changed the Jira title. The intention of this Jira is to fix calculation 
of pending resource consider user-limit in preemption scenario. Currently, 
pending resource calculation in preemption uses the calculation algorithm in 
scheduling which is this one,
{code:java}
user_limit = min(max(current_capacity)/ #active_users, current_capacity * 
user_limit_percent), queue_capacity * user_limit_factor)
{code}
this is good for scheduling cause we want to make sure users can get at least 
"minimum-user-limit-percent" of resource to use, which is more like a lower 
bound of user-limit. However we should not capture total pending resource a 
leaf queue can get by minimum-user-limit-percent, instead, we want to use 
user-limit-factor which is the upper bound to capture pending resource in 
preemption. Cause if we use minimum-user-limit-percent to capture pending 
resource, resource under-utilization will happen in preemption scenario. Thus, 
we suggest the pending resource calculation for preemption should use this 
formula.

 
{code:java}

total_pending(partition,queue) = min {Q_max(partition) - Q_used(partition), Σ 
(min {
User.ulf(partition) - User.used(partition), User.pending(partition})}
{code}
Let me give an example,

 
{code:java}
   Root
/  |  \  \
   a   b   c  d
  30  30  30  10


 1) Only one node (n1) in the cluster, it has 100G.

 2) app1 submit to queue-a, asks for 10G used, 6G pending.

 3) app2 submit to queue-b, asks for 40G used, 30G pending.

 

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-08-10 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576855#comment-16576855
 ] 

Jim Brennan commented on YARN-8648:
---

Another problem we have seen is that container-executor still has code that 
cherry-picks the PID of the launch shell from the docker container and writes 
that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, 
effectively moving it from 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}.   So you end up with one 
process out of the container in the {{container_id}} cgroup, and the rest in 
the {{container_id/docker_container_id}} cgroup.



> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8648) Container cgroups are leaked when using docker

2018-08-10 Thread Jim Brennan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-8648:
--
Labels: Docker  (was: )

> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8648) Container cgroups are leaked when using docker

2018-08-10 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-8648:
-

 Summary: Container cgroups are leaked when using docker
 Key: YARN-8648
 URL: https://issues.apache.org/jira/browse/YARN-8648
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jim Brennan
Assignee: Jim Brennan


When you run with docker and enable cgroups for cpu, docker creates cgroups for 
all resources on the system, not just for cpu.  For instance, if the 
{{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
the nodemanager will create a cgroup for each container under 
{{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path via 
the {{--cgroup-parent}} command line argument.   Docker then creates a cgroup 
for the docker container under that, for instance: 
{{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.

When the container exits, docker cleans up the {{docker_container_id}} cgroup, 
and the nodemanager cleans up the {{container_id}} cgroup.  All is good under 
{{/sys/fs/cgroup/hadoop-yarn}}.

The problem is that docker also creates that same hierarchy under every 
resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these are: 
blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
perf_event, and systemd.  So for instance, docker creates 
{{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but it 
only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up the 
{{container_id}} cgroups for these other resources.  On one of our busy 
clusters, we found > 100,000 of these leaked cgroups.

I found this in our 2.8-based version of hadoop, but I have been able to repro 
with current hadoop.
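
As a rough illustration of the scale of the problem (a hypothetical probe, not part of any patch), the leftover per-controller cgroups described above could be counted on a node like this:

{code:java}
// Hypothetical helper for illustration only: count container cgroup
// directories left behind under each cgroup v1 controller after the docker
// leaf cgroups have been removed.
import java.io.File;

public class LeakedCgroupScan {
  public static void main(String[] args) {
    String[] controllers = {"blkio", "cpuset", "devices", "freezer", "hugetlb",
        "memory", "net_cls", "net_prio", "perf_event", "systemd"};
    for (String controller : controllers) {
      File parent = new File("/sys/fs/cgroup/" + controller + "/hadoop-yarn");
      File[] leaked = parent.listFiles(f -> f.isDirectory()
          && f.getName().startsWith("container_"));
      if (leaked != null) {
        System.out.println(controller + ": " + leaked.length
            + " leftover container cgroups");
      }
    }
  }
}
{code}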




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576840#comment-16576840
 ] 

Hudson commented on YARN-8520:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14749 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14749/])
YARN-8520. Document best practice for user management. Contributed by (skumpf: 
rev e7951c69cbc85604f72cdd3559122d4e2c1ea127)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md


> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host operating 
> system when external mount points are exposed to the docker container.  This 
> prevents malicious or unauthorized impersonation from occurring.  This task is to 
> document the best practice to ensure user and group membership are consistent 
> across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576838#comment-16576838
 ] 

Eric Yang commented on YARN-8520:
-

Thank you [~shaneku...@gmail.com].

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host operating 
> system when external mount points are exposed to the docker container.  This 
> prevents malicious or unauthorized impersonation from occurring.  This task is to 
> document the best practice to ensure user and group membership are consistent 
> across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576832#comment-16576832
 ] 

Shane Kumpf commented on YARN-8520:
---

Thanks for the contribution, [~eyang]! I committed this to trunk and branch-3.1.

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host operating 
> system when external mount points are exposed to the docker container.  This 
> prevents malicious or unauthorized impersonation from occurring.  This task is to 
> document the best practice to ensure user and group membership are consistent 
> across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576825#comment-16576825
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935185/YARN-8488.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21567/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576815#comment-16576815
 ] 

Zian Chen commented on YARN-7417:
-

Thanks for the review [~eyang]. That was my original plan, to make it reusable, 
but after investigating the logic, it is almost impossible to achieve.

The main reason is that one formal parameter cannot be abstracted into a common 
class type. The "AggregatedLogFormat.ContainerLogsReader logReader" parameter in 
TFileAggregatedLogsBlock is a static class which cannot be converted into any 
parent class of the formal parameter "InputStream in" in 
IndexedFileAggregatedLogsBlock.

> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>
> This Jira focuses on refactoring the code for IndexedFileAggregatedLogsBlock and 
> TFileAggregatedLogsBlock:
>  # We have duplicate code in the current implementation of 
> IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be 
> abstracted into common methods. 
>  # The render method is too long in both of these classes; we want to make it clearer 
> by abstracting some helper methods out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-08-10 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.1.patch

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add "SUCCEEDED" state in order to support long running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-7417:

Description: 
This Jira focuses on refactoring the code for IndexedFileAggregatedLogsBlock and 
TFileAggregatedLogsBlock:
 # We have duplicate code in the current implementation of 
IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be 
abstracted into common methods. 
 # The render method is too long in both of these classes; we want to make it clearer 
by abstracting some helper methods out.

  was:
This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock

We have duplicate code in current implementation of 
IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be 
abstract into common method. 


> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>
> This Jira focuses on refactoring the code for IndexedFileAggregatedLogsBlock and 
> TFileAggregatedLogsBlock:
>  # We have duplicate code in the current implementation of 
> IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be 
> abstracted into common methods. 
>  # The render method is too long in both of these classes; we want to make it clearer 
> by abstracting some helper methods out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-7417:

Description: 
This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock

We have duplicate code in current implementation of 
IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be 
abstract into common method. 

  was:We have duplicate code in current implementation of 
IndexedFileAggregatedLogsBlock and 


> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>
> This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock
> We have duplicate code in current implementation of 
> IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can 
> be abstract into common method. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8523) Interactive docker shell

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576801#comment-16576801
 ] 

Eric Yang edited comment on YARN-8523 at 8/10/18 8:21 PM:
--

[~Zian Chen] 

# Without step 2 session management, the terminal session will terminate with 
Connection Closed when the node manager restarts.  The user can retry with a browser 
reload to obtain a new session.  I think the web socket connection is reliable 
enough to keep the connection alive.  If it drops, the user can always get a new 
docker exec session.
# There is nothing to handle on node manager shutdown or crash because a remote 
connection closed message will be displayed in the browser.



was (Author: eyang):
[~Zian Chen] # Without step 2 session management, the terminal session will 
terminate with Connection Closed when node manager restarts.  User can retry 
with browser reload to obtain a new session.  I think web socket connection is 
reliable enough to keep the connection alive.  If it drops, user can always get 
a new session of docker exec.

# There is nothing to handle on node manager shutdown or crash because remote 
connection closed will be displayed to browser.


> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API to invoke docker exec to perform unix commands and 
> report back the output to application master.  Application master can 
> distribute and aggregate execution of the commands to record in application 
> master log file.
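
As a loose illustration of the idea in the description above (not container-executor code; the container id and command are placeholders), invoking docker exec and capturing its output from Java might look like:

{code:java}
// Illustration only: run a single unix command inside a running container via
// "docker exec" and capture the output that would be reported back to the AM.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class DockerExecProbe {
  public static void main(String[] args) throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "docker", "exec", "container_id", "ls", "-l", "/tmp");
    pb.redirectErrorStream(true);  // merge stderr into stdout
    Process p = pb.start();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    p.waitFor();
  }
}
{code}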



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8523) Interactive docker shell

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576801#comment-16576801
 ] 

Eric Yang commented on YARN-8523:
-

[~Zian Chen] # Without step 2 session management, the terminal session will 
terminate with Connection Closed when node manager restarts.  User can retry 
with browser reload to obtain a new session.  I think web socket connection is 
reliable enough to keep the connection alive.  If it drops, user can always get 
a new session of docker exec.

# There is nothing to handle on node manager shutdown or crash because remote 
connection closed will be displayed to browser.


> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API to invoke docker exec to perform unix commands and 
> report back the output to application master.  Application master can 
> distribute and aggregate execution of the commands to record in application 
> master log file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-7417:

Description: We have duplicate code in current implementation of 
IndexedFileAggregatedLogsBlock and 

> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>
> We have duplicate code in current implementation of 
> IndexedFileAggregatedLogsBlock and 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576795#comment-16576795
 ] 

genericqa commented on YARN-8559:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
39s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 17 unchanged - 0 fixed = 20 total (was 17) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
18s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 |
| JIRA Issue | YARN-8559 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935167/YARN-8559-branch-2.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fdea2c2d6bf7 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 2024260 |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21566/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21566/testReport/ |
| Max. process+thread count | 873 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn |
| 

[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576769#comment-16576769
 ] 

genericqa commented on YARN-7494:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 29s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7494 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935162/YARN-7494.14.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ca679d8c7cbf 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15241c6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21565/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21565/testReport/ |
| Max. process+thread count | 928 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576732#comment-16576732
 ] 

Eric Yang commented on YARN-7417:
-

[~Zian Chen] Thank you for the patch.  Is it possible to reuse 
processContainerLog?  I think they look similar that the method maybe reusable? 
 I think it is safe to assume that logs stored in TFile is also UTF-8 encoding.

> re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to 
> remove duplicate codes
> 
>
> Key: YARN-7417
> URL: https://issues.apache.org/jira/browse/YARN-7417
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7417.001.patch, YARN-7417.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8647) Add a flag to disable move app between queues

2018-08-10 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-8647:
---

Assignee: Abhishek Modi

> Add a flag to disable move app between queues
> -
>
> Key: YARN-8647
> URL: https://issues.apache.org/jira/browse/YARN-8647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Assignee: Abhishek Modi
>Priority: Critical
>
> For large clusters where we have a number of users submitting application, we 
> can result into scenarios where app developers try to move the queues for 
> their applications using something like 
> {code:java}
> yarn application -movetoqueue  -queue {code}
> Today there is no way of disabling the feature if one does not want 
> application developers to use  the feature.
> *Solution:*
> We should probably add an option to disable move queue feature from RM side 
> on the cluster level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8523) Interactive docker shell

2018-08-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576697#comment-16576697
 ] 

Zian Chen commented on YARN-8523:
-

Good point. I think we can make this Jira focus on building this pipeline and 
create a second Jira for persisting docker exec state across NM restart. Two 
more questions here:
 # Should we give the user some kind of notification while the NM restarts and we are 
trying to resume the docker exec? What if we retry the reconnect several times 
and don't succeed? We may need to give the user some friendly reminder to avoid the 
impression that the session is stuck for too long, right?
 # How do we handle an unexpected NM shutdown (like a crash, etc.)?

> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API to invoke docker exec to perform unix commands and 
> report back the output to application master.  Application master can 
> distribute and aggregate execution of the commands to record in application 
> master log file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8647) Add a flag to disable move app between queues

2018-08-10 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8647:
-
Summary: Add a flag to disable move app between queues  (was: Add a flag to 
disable move queue)

> Add a flag to disable move app between queues
> -
>
> Key: YARN-8647
> URL: https://issues.apache.org/jira/browse/YARN-8647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Critical
>
> For large clusters where we have a number of users submitting application, we 
> can result into scenarios where app developers try to move the queues for 
> their applications using something like 
> {code:java}
> yarn application -movetoqueue  -queue {code}
> Today there is no way of disabling the feature if one does not want 
> application developers to use  the feature.
> *Solution:*
> We should probably add an option to disable move queue feature from RM side 
> on the cluster level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8647) Add a flag to disable move queue

2018-08-10 Thread sarun singla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sarun singla updated YARN-8647:
---
Description: 
For large clusters where we have a number of users submitting application, we 
can result into scenarios where app developers try to move the queues for their 
applications using something like 
{code:java}
yarn application -movetoqueue  -queue {code}
Today there is no way of disabling the feature if one does not want application 
developers to use  the feature.

*Solution:*

We should probably add an option to disable move queue feature from RM side on 
the cluster level.

  was:
For large clusters where we have a number of users submitting application, we 
can result into scenarios where app developers try to move the queues for their 
applications using something like 
{code:java}
yarn application -movetoqueue  -queue {code}
Today there is no way of disabling the feature if one does not want application 
developers to use  the feature.

Solution:

We probably add an option to disable move queue feature from RM side on the 
cluster level.


> Add a flag to disable move queue
> 
>
> Key: YARN-8647
> URL: https://issues.apache.org/jira/browse/YARN-8647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Critical
>
> For large clusters where we have a number of users submitting application, we 
> can result into scenarios where app developers try to move the queues for 
> their applications using something like 
> {code:java}
> yarn application -movetoqueue  -queue {code}
> Today there is no way of disabling the feature if one does not want 
> application developers to use  the feature.
> *Solution:*
> We should probably add an option to disable move queue feature from RM side 
> on the cluster level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8647) Add a flag to disable move queue

2018-08-10 Thread sarun singla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sarun singla updated YARN-8647:
---
Description: 
For large clusters where we have a number of users submitting application, we 
can result into scenarios where app developers try to move the queues for their 
applications using something like 
{code:java}
yarn application -movetoqueue  -queue {code}
Today there is no way of disabling the feature if one does not want application 
developers to use  the feature.

Solution:

We probably add an option to disable move queue feature from RM side on the 
cluster level.

  was:
For large clusters where we have a number of users submitting application, we 
can result into scenarios where app developers try to move the queues for their 
applications using something like 

{code}yarn application -movetoqueue  -queue \{/code}

Today there is no way of disabling the feature if one does not want application 
developers to use  the feature.

Solution:

We probably add an option to disable move queue feature from RM side on the 
cluster level.


> Add a flag to disable move queue
> 
>
> Key: YARN-8647
> URL: https://issues.apache.org/jira/browse/YARN-8647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Critical
>
> For large clusters where we have a number of users submitting application, we 
> can result into scenarios where app developers try to move the queues for 
> their applications using something like 
> {code:java}
> yarn application -movetoqueue  -queue {code}
> Today there is no way of disabling the feature if one does not want 
> application developers to use  the feature.
> Solution:
> We probably add an option to disable move queue feature from RM side on the 
> cluster level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576681#comment-16576681
 ] 

Shane Kumpf commented on YARN-8569:
---

Thanks for filing this [~eyang]. I have a use case that could benefit from this 
as well.

When running in containers, one challenging piece is determining how much CPU 
and memory were allocated to the container. Traditional OS tooling shows the 
totals from the host. This is especially problematic for tools like Ambari, 
which use OS tooling to dynamically set configuration. Exposing the resource 
request details via this mechanism could be used to solve this problem.
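
For context, a small illustration of why in-container OS tooling is misleading (assuming a cgroup v1 layout, as on the rhel7 systems mentioned elsewhere on this list): the limit the container was actually given lives in the cgroup filesystem, not in what tools like free report.

{code:java}
// Illustration only: read the memory limit imposed on the current cgroup
// (cgroup v1 path assumed); host-level tools would report the node total.
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerMemoryLimit {
  public static void main(String[] args) throws Exception {
    String limit = new String(Files.readAllBytes(
        Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes"))).trim();
    System.out.println("memory limit (bytes): " + limit);
  }
}
{code}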

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some programs require container hostnames to be known for the application to run. 
>  For example, distributed tensorflow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose the hostnames of the YARN service via a mounted file.  The file 
> content gets updated when a flex command is performed.  This allows application 
> developers to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache and allowing the file to be mounted 
> via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8647) Add a flag to disable move queue

2018-08-10 Thread sarun (JIRA)
sarun created YARN-8647:
---

 Summary: Add a flag to disable move queue
 Key: YARN-8647
 URL: https://issues.apache.org/jira/browse/YARN-8647
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.3
Reporter: sarun


For large clusters where we have a number of users submitting applications, we 
can run into scenarios where app developers try to move their applications to a 
different queue using something like 

{code}yarn application -movetoqueue  -queue {code}

Today there is no way to disable this feature if one does not want application 
developers to use it.

Solution:

We could add an option to disable the move-to-queue feature from the RM side at 
the cluster level.
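
As a rough illustration of what such a cluster-level switch could look like in 
yarn-site.xml (the property name below is hypothetical, not an existing Hadoop 
key):

{code:xml}
<!-- Hypothetical property, for illustration only: reject application move
     requests (e.g. "yarn application -movetoqueue") cluster-wide. -->
<property>
  <name>yarn.resourcemanager.move-application-across-queues.enabled</name>
  <value>false</value>
</property>
{code}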



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-8632:
--

Assignee: Xianghao Lu

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be OK, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread exits 
> because of an NPE:
>  "wrapper.getQueueSet()" is still null when executing "String 
> metrics = web.generateRealTimeTrackingMetrics();"
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try block to avoid the MetricsLogRunnable thread exiting with an 
> unexpected exception. 
>  My Hadoop version is 2.7.2; it seems that the Hadoop trunk branch also has the 
> second problem, and I have made a patch to solve it.
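
To illustrate the proposed change (a sketch only, not the actual SLS source; the 
writer and logger field names are placeholders), the metrics generation moves 
inside the try block so an early NPE no longer kills the scheduled runnable:

{code:java}
// Sketch of MetricsLogRunnable.run(); metricsWriter and LOG are placeholder names.
@Override
public void run() {
  try {
    // wrapper.getQueueSet() can still be null at this point, so generate the
    // metrics inside the try block instead of before it.
    String metrics = web.generateRealTimeTrackingMetrics();
    metricsWriter.write(metrics + ",\n");
    metricsWriter.flush();
  } catch (Exception e) {
    // Log and keep going so the periodic thread does not exit unexpectedly.
    LOG.warn("Failed to generate real-time tracking metrics", e);
  }
}
{code}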



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1657#comment-1657
 ] 

Yufei Gu commented on YARN-8632:


Added you to the contributor list and assigned this to you. Will review later.

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be OK, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread exits 
> because of an NPE:
>  "wrapper.getQueueSet()" is still null when executing "String 
> metrics = web.generateRealTimeTrackingMetrics();"
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try block to avoid the MetricsLogRunnable thread exiting with an 
> unexpected exception. 
>  My Hadoop version is 2.7.2; it seems that the Hadoop trunk branch also has the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2018-08-10 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576342#comment-16576342
 ] 

Eric Payne edited comment on YARN-8509 at 8/10/18 6:12 PM:
---

[~Zian Chen], can I please get a couple of clarifications?
{quote}
total_pending(partition,queue) = min { Q_max(partition) - Q_used(partition),
    Σ ( min { User.ulf(partition) - User.used(partition), User.pending(partition) } ) }
{quote}
1) In the above pseudo-code, what is being summed by the summation?

2) In the above example, queue-a is the only one that's underserved, so the 
first round of preemption should actually preempt 6G from queues b and c. The 
amount preempted from each queue depends on the age of the containers, but you 
could end up with something like queue-b consuming 40G and pending 30G and 
queue-c consuming 44G and pending 36G before the second round of preemption, at 
which point queue-a would be satisfied and only queues b and c have pending 
resource requests. Since this issue is meant to address the balancing of queues 
that are over their capacity, I don't understand why queue-a is involved in the 
above use case. Can you provide a simpler example that only involves the 
balancing of over-served queues?


was (Author: eepayne):
[~Zian Chen], can I please get a couple of clarifications?
{quote}total_pending(partition,queue) = min { Q_max(partition) - Q_used(partition),
    Σ ( min { User.ulf(partition) - User.used(partition), User.pending(partition) } ) }{quote}
1) In the above pseudo-code, what is being summed by the summation?

2) In the above example, queue-a is the only one that's underserved, so the 
first round of preemption should actually preempt 6G from queues b and c. The 
amount preempted from each queue depends on the age of the containers, but you 
could end up with something like queue-b consuming 40G and pending 30G and 
queue-c consuming 44G and pending 36G before the second round of preemption, at 
which point queue-a would be satisfied and only queues b and c have pending 
resource requests. Since this issue is meant to address the balancing of queues 
that are over their capacity, I don't understand why queue-a is involved in the 
above use case. Can you provide a simpler example that only involves the 
balancing of over-served queues?
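
For what it's worth, here is one possible reading of the pseudo-code quoted 
above as a self-contained sketch, assuming the summation runs over the queue's 
users (plain longs stand in for Resource; this is not the actual 
CapacityScheduler code):

{code:java}
// Illustration only: min(Q_max - Q_used, Σ over users of min(ulf_limit - used, pending)).
static long totalPending(long qMax, long qUsed,
    long[] userUlfLimit, long[] userUsed, long[] userPending) {
  long sum = 0;
  for (int i = 0; i < userPending.length; i++) {
    sum += Math.min(userUlfLimit[i] - userUsed[i], userPending[i]);
  }
  return Math.min(qMax - qUsed, sum);
}
{code}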

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor, which will 
> cap the pending resource for each user to the minimum of the user-limit pending 
> and the actual pending. This prevents the queue from taking more pending 
> resource to achieve queue balance after all queues are satisfied with their 
> ideal allocation.
>   
>  We need to change the logic to let a queue's pending resource go beyond the 
> user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8623) Update Docker examples to use image which exists

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655
 ] 

Shane Kumpf edited comment on YARN-8623 at 8/10/18 6:07 PM:


[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for the Spark shell 
example we provide in our docs, with the appropriate spark install/config, that 
would be great, but I don't think it's a requirement to start.

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 


was (Author: shaneku...@gmail.com):
[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for Spark shell example we 
provide in our docs, with the appropriate spark install/config, that would be 
great, but I don't think it's a requirement to start.  
!/jira/images/icons/emoticons/smile.png!

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655
 ] 

Shane Kumpf commented on YARN-8623:
---

[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for the Spark shell example 
we provide in our docs, with the appropriate Spark install/config, that would be 
great, but I don't think it's a requirement to start.  
!/jira/images/icons/emoticons/smile.png!

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-10 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8559:

Attachment: YARN-8559-branch-2.001.patch

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8559-branch-2.001.patch, 
> YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.
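
As a quick illustration of the endpoints being discussed (host and port are 
placeholders, and the exact scheduler-conf path may vary by release):

{code}
curl http://<rm-host>:8088/conf                          # common Hadoop daemon configuration endpoint
curl http://<rm-host>:8088/ws/v1/cluster/scheduler-conf  # mutable scheduler configuration
{code}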



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-10 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8559:

Attachment: (was: YARN-8559-branch-2.001.patch)

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8559-branch-2.001.patch, 
> YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-10 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576613#comment-16576613
 ] 

Sunil Govindan commented on YARN-8561:
--

Thanks [~leftnoteasy], Overall looks good to me

Will create additional jiras as discussed in this ticket. If there are no 
objections, I will commit this patch tomorrow. +1

Thanks

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch, YARN-8561.004.patch, YARN-8561.005.patch
>
>
> Added the following parts:
> 1) New subcomponent of YARN, under the applications/ project. 
> 2) TensorFlow training job submission, including training (single node and 
> distributed). 
> - Support for Docker containers. 
> - Support for GPU isolation. 
> - Support for YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-08-10 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576604#comment-16576604
 ] 

Sunil Govindan commented on YARN-7494:
--

As discussed, removed updating multiNodePolicyName in Queue interface. This is 
changed to CSQueue.

[~cheersyang] pls help to review.

> Add muti node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup 
> based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8635) Container Resource localization fails if umask is 077

2018-08-10 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8635:
---
Summary: Container Resource localization fails if umask is 077  (was: 
Container fails to start if umask is 077)

> Container Resource localization fails if umask is 077
> -
>
> Key: YARN-8635
> URL: https://issues.apache.org/jira/browse/YARN-8635
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> {code}
> java.io.IOException: Application application_1533652359071_0001 
> initialization failed (exitCode=255) with output: main : command provided 0
> main : run as user is mapred
> main : requested yarn user is mapred
> Path 
> /opt/HA/OSBR310/nmlocal/usercache/mapred/appcache/application_1533652359071_0001
>  has permission 700 but needs permission 750.
> Did not create any app directories
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
> ... 1 more
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 2 more
> 2018-08-08 17:43:26,918 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e04_1533652359071_0001_01_27 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2018-08-08 17:43:26,916 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_e04_1533652359071_0001_01_31 startLocalizer is : 
> 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 2 more
> 2018-08-08 17:43:26,923 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed for containe
> {code}
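
A quick shell illustration of why the umask matters here: directories created 
under a 077 umask come out 700, while the localizer path in the log above needs 
group access (750), which a 027 umask would preserve.

{code}
$ umask 077; mkdir d1; stat -c '%a' d1
700
$ umask 027; mkdir d2; stat -c '%a' d2
750
{code}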



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7494) Add muti node lookup support for better placement

2018-08-10 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-7494:
-
Attachment: YARN-7494.14.patch

> Add muti node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup 
> based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576597#comment-16576597
 ] 

Shane Kumpf commented on YARN-8520:
---

Thanks for the updated patch, [~eyang]! +1 on the latest patch. I'll commit 
this later today if there is no additional feedback.

 

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  This 
> prevents malicious or unauthorized impersonation.  This task is to document the 
> best practice to ensure user and group membership are consistent across Docker 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576565#comment-16576565
 ] 

genericqa commented on YARN-8520:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8520 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935150/YARN-8520.005.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux aa253326073f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0a71bf1 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21564/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  This 
> prevents malicious or unauthorized impersonation.  This task is to document the 
> best practice to ensure user and group membership are consistent across Docker 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7935) Expose container's hostname to applications running within the docker container

2018-08-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7935:

Parent Issue: YARN-8472  (was: YARN-3611)

> Expose container's hostname to applications running within the docker 
> container
> ---
>
> Key: YARN-7935
> URL: https://issues.apache.org/jira/browse/YARN-7935
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7935.1.patch, YARN-7935.2.patch, YARN-7935.3.patch, 
> YARN-7935.4.patch
>
>
> Some applications (like Spark) need to bind to the container's hostname, which 
> is different from the NodeManager's hostname (NM_HOST, available as an env 
> variable during container launch) when launched through the Docker runtime. The 
> container's hostname can be exposed to applications via an env variable 
> CONTAINER_HOSTNAME. Another potential candidate is the container's IP, but 
> this can be addressed in a separate jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7994) Add support for network-alias in docker run for user defined networks

2018-08-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7994:

Parent Issue: YARN-8472  (was: YARN-3611)

> Add support for network-alias in docker run for user defined networks 
> --
>
> Key: YARN-7994
> URL: https://issues.apache.org/jira/browse/YARN-7994
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Docker's embedded DNS supports DNS resolution for containers using one or more 
> of their configured {{--network-alias}} values within a user-defined network. 
> DockerRunCommand should support this option so that DNS resolution works 
> through Docker's embedded DNS.
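
For context, this is the plain Docker flag in question (a docker CLI example 
only, not the YARN/DockerRunCommand syntax):

{code}
docker network create yarn-net
docker run -d --name web1 --network yarn-net --network-alias web nginx
docker run --rm --network yarn-net busybox nslookup web   # resolved by Docker's embedded DNS
{code}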



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576501#comment-16576501
 ] 

Eric Yang commented on YARN-8520:
-

[~shaneku...@gmail.com] Thanks for the feedback offline.  Patch 005 includes 
your edits for static user and bind mount /etc/passwd solutions.
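
For readers following along, one common form of the bind mount approach looks 
roughly like this (an illustration of the general technique, not necessarily the 
exact wording in the patch):

{code}
# Run as the calling user and mount the host user/group databases read-only so
# UIDs and GIDs resolve to the same names inside and outside the container.
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  centos:7 id
{code}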

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  This 
> prevents malicious or unauthorized impersonation.  This task is to document the 
> best practice to ensure user and group membership are consistent across Docker 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8520) Document best practice for user management

2018-08-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8520:

Attachment: YARN-8520.005.patch

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  This 
> prevents malicious or unauthorized impersonation.  This task is to document the 
> best practice to ensure user and group membership are consistent across Docker 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application

2018-08-10 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576395#comment-16576395
 ] 

Sunil Govindan commented on YARN-7957:
--

Thanks [~akhilpb] Makes sense to me. Pls help to implement same.

> [UI2] Yarn service delete option disappears after stopping application
> --
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akhil PB
>Priority: Critical
> Attachments: YARN-7957.001.patch
>
>
> Steps:
> 1) Launch yarn service
> 2) Go to service page and click on Setting button->"Stop Service". The 
> application will be stopped.
> 3) Refresh page
> Here, the setting button disappears. Thus, the user cannot delete the service 
> from the UI after stopping the application.
> Expected behavior:
> The setting button should be present on the UI page after the application is 
> stopped. If the application is stopped, the setting button should only have 
> the "Delete Service" action available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-08-10 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576384#comment-16576384
 ] 

Sean Busbey commented on YARN-7190:
---

fix version now updated and filed YARN-8646 for myself to track getting the 
website updated.

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should go onto 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8646) republish 3.0.3 release notes so they include YARN-7190

2018-08-10 Thread Sean Busbey (JIRA)
Sean Busbey created YARN-8646:
-

 Summary: republish 3.0.3 release notes so they include YARN-7190
 Key: YARN-8646
 URL: https://issues.apache.org/jira/browse/YARN-8646
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation
Affects Versions: 3.0.3
Reporter: Sean Busbey
Assignee: Sean Busbey


now that 3.0.3 is listed as a fix version for YARN-7190, figure out what needs 
to happen for the release notes page to include it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-08-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated YARN-7190:
--
Fix Version/s: 3.0.3

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should go onto 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-08-10 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576370#comment-16576370
 ] 

Sean Busbey commented on YARN-7190:
---

Today I dug through the git history and branch-3.0. I can confirm that this fix 
is present in 3.0.3. I've sent an email to yarn-dev@ because I can't edit fix 
versions yet. Once I can I'll update this and figure out republishing the 3.0.3 
release notes.

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should go onto 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2018-08-10 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576342#comment-16576342
 ] 

Eric Payne commented on YARN-8509:
--

[~Zian Chen], can I please get a couple of clarifications?
{quote}total_pending(partition,queue) = min { Q_max(partition) - Q_used(partition),
    Σ ( min { User.ulf(partition) - User.used(partition), User.pending(partition) } ) }{quote}
1) In the above pseudo-code, what is being summed by the summation?

2) In the above example, queue-a is the only one that's underserved, so the 
first round of preemption should actually preempt 6G from queues b and c. The 
amount preempted from each queue depends on the age of the containers, but you 
could end up with something like queue-b consuming 40G and pending 30G and 
queue-c consuming 44G and pending 36G before the second round of preemption, at 
which point queue-a would be satisfied and only queues b and c have pending 
resource requests. Since this issue is meant to address the balancing of queues 
that are over their capacity, I don't understand why queue-a is involved in the 
above use case. Can you provide a simpler example that only involves the 
balancing of over-served queues?

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor, which will 
> cap the pending resource for each user to the minimum of the user-limit pending 
> and the actual pending. This prevents the queue from taking more pending 
> resource to achieve queue balance after all queues are satisfied with their 
> ideal allocation.
>   
>  We need to change the logic to let a queue's pending resource go beyond the 
> user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8640) Restore previous state in container-executor if write_exit_code_file_as_nm fails

2018-08-10 Thread Jim Brennan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-8640:
--
Attachment: YARN-8640.001.patch

> Restore previous state in container-executor if write_exit_code_file_as_nm 
> fails
> 
>
> Key: YARN-8640
> URL: https://issues.apache.org/jira/browse/YARN-8640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8640.001.patch
>
>
> The container-executor function {{write_exit_code_file_as_nm}} has a number 
> of failure conditions where it just returns -1 without restoring the previous 
> state.
> This is not a problem in any of the places where it is currently called, but 
> it could become a problem if future code changes call it before code that 
> depends on the previous state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576216#comment-16576216
 ] 

genericqa commented on YARN-8303:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 16s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 
46s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
36s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineReaderClientImpl 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8303 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935078/YARN-8303.poc.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0d7314950243 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-8644) Make RMAppImpl$FinalTransition more readable + add more test coverage

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576150#comment-16576150
 ] 

genericqa commented on YARN-8644:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
42s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935116/YARN-8644.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a5be5d0d 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0a71bf1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21561/testReport/ |
| Max. process+thread count | 941 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21561/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Make RMAppImpl$FinalTransition more readable + add more test coverage

[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Xianghao Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576121#comment-16576121
 ] 

Xianghao Lu commented on YARN-8632:
---

[~ywskycn] or [~yufeigu], would you like to review the patch? Thanks!

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered 
> some problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be OK, but I only got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread 
> exits because of an NPE: "wrapper.getQueueSet()" is still null when "String 
> metrics = web.generateRealTimeTrackingMetrics();" is executed.
>  So we should move "String metrics = web.generateRealTimeTrackingMetrics();" 
> into the try block, so that the MetricsLogRunnable thread does not exit on 
> an unexpected exception (a minimal sketch of the idea follows after this 
> quoted description).
>  My Hadoop version is 2.7.2; it seems that the Hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.
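
To make the proposed change concrete, here is a minimal sketch of a 
metrics-logging task that keeps the generateRealTimeTrackingMetrics() call 
inside the try/catch. Only the MetricsLogRunnable name and the 
web.generateRealTimeTrackingMetrics() call come from the description above; 
the TrackingMetricsSource interface, the field names, and the writer handling 
are assumptions for illustration and do not reproduce the actual SLS source.

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;

// Minimal sketch (not the actual SLS source) of the fix described above:
// the metrics call stays inside try/catch so an NPE from a not-yet-initialized
// queue set is caught and the periodic logging task keeps running.
class MetricsLogRunnable implements Runnable {

  /** Stand-in for the web app object referenced in the description. */
  interface TrackingMetricsSource {
    String generateRealTimeTrackingMetrics();
  }

  private final TrackingMetricsSource web;
  private final BufferedWriter metricsWriter;
  private boolean firstRecord = true;

  MetricsLogRunnable(TrackingMetricsSource web, BufferedWriter metricsWriter) {
    this.web = web;
    this.metricsWriter = metricsWriter;
  }

  @Override
  public void run() {
    try {
      // Before the fix this call sat outside the try block, so a
      // NullPointerException here (wrapper.getQueueSet() still null)
      // terminated the thread and realtimetrack.json stayed empty ("[]").
      String metrics = web.generateRealTimeTrackingMetrics();
      if (!firstRecord) {
        metricsWriter.write(",");
      }
      metricsWriter.write(metrics);
      metricsWriter.newLine();
      firstRecord = false;
    } catch (IOException | RuntimeException e) {
      // Log and continue; a scheduled task that throws an uncaught exception
      // stops running silently, which is exactly the symptom reported above.
      e.printStackTrace();
    }
  }
}
{code}

Catching RuntimeException rather than only the NPE keeps the tracking file 
well formed even if a different transient failure shows up later.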



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-08-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576105#comment-16576105
 ] 

genericqa commented on YARN-7494:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m  
9s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7494 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935111/YARN-7494.13.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2fa15e46833d 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0a71bf1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21560/testReport/ |
| Max. process+thread count | 914 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21560/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add muti node lookup support for better placement
