[jira] [Created] (YARN-8651) We must increase min Resource in FairScheduler after increase number of NM
stefanlee created YARN-8651: --- Summary: We must increase min Resource in FairScheduler after increase number of NM Key: YARN-8651 URL: https://issues.apache.org/jira/browse/YARN-8651 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: stefanlee Our cluster has recently shown a strange phenomenon. Before we increased the number of NodeManagers, resource utilization could reach 100%; after expanding the cluster, however, utilization did not improve, and many queues' used resources stayed at their min resource even though they still had plenty of pending demand. When we then increased those queues' min resources dynamically, their utilization went up and the resources of the entire cluster were used again. So I suspect a bug in *FairSharePolicy#compare*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
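For context on why raising min resource changes scheduling behavior: FairSharePolicy#compare sorts schedulables so that queues still below their min share are offered resources first, and only falls back to weighted fair share once min share is satisfied. The snippet below is a minimal, self-contained sketch of that ordering idea, assuming a simplified, memory-only Schedulable interface; it is an illustration, not the actual Hadoop FairSharePolicy source.

{code:java}
import java.util.Comparator;

// Simplified stand-in for the real FairScheduler Schedulable interface.
interface Schedulable {
  long getUsedMemory();     // resources currently used by the queue
  long getMinShareMemory(); // configured min resource of the queue
  double getWeight();       // queue weight
}

// Hypothetical, memory-only illustration of the FairSharePolicy idea:
// a queue below its min share sorts ahead of a queue at or above it,
// so once a queue reaches its min share it can stop being offered resources
// even if it still has pending demand.
class MinShareFirstComparator implements Comparator<Schedulable> {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    boolean s1Needy = s1.getUsedMemory() < s1.getMinShareMemory();
    boolean s2Needy = s2.getUsedMemory() < s2.getMinShareMemory();

    if (s1Needy && !s2Needy) {
      return -1;  // s1 is below its min share: serve it first
    }
    if (!s1Needy && s2Needy) {
      return 1;   // s2 is below its min share: serve it first
    }
    if (s1Needy && s2Needy) {
      // Both below min share: the smaller usage/minShare ratio wins.
      double n1 = (double) s1.getUsedMemory() / Math.max(1, s1.getMinShareMemory());
      double n2 = (double) s2.getUsedMemory() / Math.max(1, s2.getMinShareMemory());
      return Double.compare(n1, n2);
    }
    // Neither below min share: fall back to usage normalized by weight.
    double r1 = s1.getUsedMemory() / s1.getWeight();
    double r2 = s2.getUsedMemory() / s2.getWeight();
    return Double.compare(r1, r2);
  }
}
{code}

Under such an ordering, a queue whose usage has reached its min share stops looking "needy" even if it still has pending requests, which matches the reporter's observation that dynamically raising min resources made the queues schedulable again.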
[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577021#comment-16577021 ] genericqa commented on YARN-6972: -

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 33s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 46s | trunk passed |
| +1 | shadedclient | 11m 59s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 29s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 68m 52s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 128m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6972 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935219/YARN-6972.015.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8887c417e31c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a2a8c48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21574/testReport/ |
| Max. process+thread count | 879 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21574/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > Adding RM ClusterId in AppInfo >
[jira] [Commented] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE
[ https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577015#comment-16577015 ] lujie commented on YARN-8650: - I have uploaded the log that contains the exception. Please check! We also find one NPE bug while nodemanager is shutdown![YARN-8649|https://issues.apache.org/jira/browse/YARN-8649] > Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: > CONTAINER_LAUNCHED at DONE > - > > Key: YARN-8650 > URL: https://issues.apache.org/jira/browse/YARN-8650 > Project: Hadoop YARN > Issue Type: Bug >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log, > hadoop-hires-nodemanager-hadoop15.log > > > We have tested the hadoop while nodemanager is shutting down and encounter > two InvalidStateTransitionException: > {code:java} > 2018-08-04 14:29:33,025 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Can't handle this event at current state: Current: [DONE], eventType: > [CONTAINER_KILLED_ON_REQUEST], container: > [container_1533364185282_0001_01_01] > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_KILLED_ON_REQUEST at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_LAUNCHED at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > We have analysis these two 
bugs, and find that shutdown will send kill event > and hence cause these two exception. We have test the our cluster for many > time and can determinately reproduce it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE
[ https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577015#comment-16577015 ] lujie edited comment on YARN-8650 at 8/11/18 2:20 AM: -- I have uploaded the log files that contains the exception. Please check! We also find one NPE bug while nodemanager is shutdown!YARN-8649 was (Author: xiaoheipangzi): I have uploaded the log that contains the exception. Please check! We also find one NPE bug while nodemanager is shutdown![YARN-8649|https://issues.apache.org/jira/browse/YARN-8649] > Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: > CONTAINER_LAUNCHED at DONE > - > > Key: YARN-8650 > URL: https://issues.apache.org/jira/browse/YARN-8650 > Project: Hadoop YARN > Issue Type: Bug >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log, > hadoop-hires-nodemanager-hadoop15.log > > > We have tested the hadoop while nodemanager is shutting down and encounter > two InvalidStateTransitionException: > {code:java} > 2018-08-04 14:29:33,025 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Can't handle this event at current state: Current: [DONE], eventType: > [CONTAINER_KILLED_ON_REQUEST], container: > [container_1533364185282_0001_01_01] > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_KILLED_ON_REQUEST at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_LAUNCHED at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > We have analysis these two bugs, and find that shutdown will send kill event > and hence cause these two exception. We have test the our cluster for many > time and can determinately reproduce it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577017#comment-16577017 ] lujie commented on YARN-8649: - I have also found two more InvalidStateTransitionExceptions while the NodeManager is shutting down: Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE. I have created a new issue, [YARN-8650|https://issues.apache.org/jira/browse/YARN-8650], to track them. > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE
[ https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8650: Attachment: hadoop-hires-nodemanager-hadoop11.log > Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: > CONTAINER_LAUNCHED at DONE > - > > Key: YARN-8650 > URL: https://issues.apache.org/jira/browse/YARN-8650 > Project: Hadoop YARN > Issue Type: Bug >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log, > hadoop-hires-nodemanager-hadoop15.log > > > We have tested the hadoop while nodemanager is shutting down and encounter > two InvalidStateTransitionException: > {code:java} > 2018-08-04 14:29:33,025 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Can't handle this event at current state: Current: [DONE], eventType: > [CONTAINER_KILLED_ON_REQUEST], container: > [container_1533364185282_0001_01_01] > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_KILLED_ON_REQUEST at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_LAUNCHED at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > We have analysis these two bugs, and find that shutdown will send kill event > and hence cause these two exception. We have test the our cluster for many > time and can determinately reproduce it. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE
[ https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8650: Attachment: hadoop-hires-nodemanager-hadoop15.log > Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: > CONTAINER_LAUNCHED at DONE > - > > Key: YARN-8650 > URL: https://issues.apache.org/jira/browse/YARN-8650 > Project: Hadoop YARN > Issue Type: Bug >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop15.log > > > We have tested the hadoop while nodemanager is shutting down and encounter > two InvalidStateTransitionException: > {code:java} > 2018-08-04 14:29:33,025 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Can't handle this event at current state: Current: [DONE], eventType: > [CONTAINER_KILLED_ON_REQUEST], container: > [container_1533364185282_0001_01_01] > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_KILLED_ON_REQUEST at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CONTAINER_LAUNCHED at DONE > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > {code} > We have analysis these two bugs, and find that shutdown will send kill event > and hence cause these two exception. We have test the our cluster for many > time and can determinately reproduce it. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE
lujie created YARN-8650: --- Summary: Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE Key: YARN-8650 URL: https://issues.apache.org/jira/browse/YARN-8650 Project: Hadoop YARN Issue Type: Bug Reporter: lujie

We tested Hadoop while the NodeManager was shutting down and encountered two InvalidStateTransitionExceptions:

{code:java}
2018-08-04 14:29:33,025 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Can't handle this event at current state: Current: [DONE], eventType: [CONTAINER_KILLED_ON_REQUEST], container: [container_1533364185282_0001_01_01]
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:java}
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_LAUNCHED at DONE
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
{code}

We have analyzed these two bugs and found that shutdown sends a kill event, which causes these two exceptions. We have tested our cluster many times and can deterministically reproduce the problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
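Reading the traces above, the ContainerImpl transition table apparently has no arc for CONTAINER_KILLED_ON_REQUEST or CONTAINER_LAUNCHED once a container is already in DONE, so the late events generated during NodeManager shutdown fall through to InvalidStateTransitionException. One possible mitigation, shown below only as a hedged sketch and not as the committed fix for this issue, is to register those events as ignorable self-transitions at DONE; the placement inside ContainerImpl's static StateMachineFactory builder and the exact addTransition overloads used are assumptions.

{code:java}
// Hypothetical fragment of ContainerImpl's transition-table builder:
// stay in DONE and silently drop events that can legitimately arrive
// after the container has already finished (e.g. during NM shutdown).
.addTransition(ContainerState.DONE, ContainerState.DONE,
    ContainerEventType.CONTAINER_KILLED_ON_REQUEST)
.addTransition(ContainerState.DONE, ContainerState.DONE,
    ContainerEventType.CONTAINER_LAUNCHED)
{code}

An alternative would be to avoid dispatching kill events for containers that are already DONE during shutdown; either way the warning above would no longer be raised.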
[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000 ] lujie edited comment on YARN-8649 at 8/11/18 2:20 AM: -- Stack trace (I have also uploaded the log file generated by our cluster):

{code:java}
java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
{code}

The NodeManager is tearing down and will clean up the local resources. So when the localizer heartbeat comes in, it executes:

{code:java}
// getPathForLocalization
LocalizedResource rsrc = localrsrc.get(req);
rsrc.setLocalPath(localPath);
{code}

rsrc is null at this point, and hence the NPE happens. I wonder whether only adding a null check is OK? If so, I will upload a patch.
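A minimal sketch of the null check being proposed, reusing the names from the snippet above (localrsrc, req, localPath); the warning message and the early return are assumptions about how getPathForLocalization could bail out, not an actual patch:

{code:java}
// Hypothetical guard in LocalResourcesTrackerImpl#getPathForLocalization.
// During NM teardown the resource may already have been removed from the
// tracker, so localrsrc.get(req) can return null.
LocalizedResource rsrc = localrsrc.get(req);
if (rsrc == null) {
  // Resource is no longer tracked (e.g. cleaned up during shutdown);
  // skip this localizer heartbeat instead of dereferencing null.
  LOG.warn("Resource " + req + " is no longer tracked, ignoring heartbeat");
  return null;
}
rsrc.setLocalPath(localPath);
{code}

Whether a bare null check is sufficient, or whether the caller in ResourceLocalizationService also needs to handle the missing path, is exactly the open question in the comment.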
was (Author: xiaoheipangzi): Stacktrace(I also upload the log file generated by our cluster): {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180){code} NodeManager is tearing down and will clean up the local resource. So when the heartbeat comes in, it will do : {code:java} //getPathForLocalization LocalizedResource
[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000 ] lujie edited comment on YARN-8649 at 8/11/18 1:48 AM: -- Stacktrace(I also upload the log file generated by our cluster): {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180){code} NodeManager is tearing down and will clean up the local resource. So when the heartbeat comes in, it will do : {code:java} //getPathForLocalization LocalizedResource rsrc = localrsrc.get(req); rsrc.setLocalPath(localPath); {code} rsrc is null and hence NPE happens. 
was (Author: xiaoheipangzi): Stacktrace(I also upload the log file generated by our cluster): {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) {code} > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 >
[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Summary: Similar as YARN-4355:NPE while processing localizer heartbeat (was: Same as YARN-4355:NPE while processing localizer heartbeat) > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Description: I have noticed that a nodemanager was getting NPEs while tearing down. The reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. (was: I have noticed that a nodemanager was getting NPEs while tearing down. The reason maybe similar to YARN-4355 which reported by [# Jason Lowe]. ) > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Attachment: hadoop-hires-nodemanager-hadoop11.log > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000 ] lujie edited comment on YARN-8649 at 8/11/18 1:27 AM: -- Stacktrace(I also upload the log file generated by our cluster): {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) {code} was (Author: xiaoheipangzi): Stacktrace: {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) {code} > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > Attachments: hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355
[jira] [Updated] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Description: I have noticed that a nodemanager was getting NPEs while tearing down. The reason maybe similar to YARN-4355 which reported by [# Jason Lowe]. (was: I have noticed that a nodemanager was getting NPEs processing a heartbeat. This is similar to [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355 ] which reported by [# Jason Lowe] ) > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000 ] lujie edited comment on YARN-8649 at 8/11/18 1:24 AM: -- Stacktrace: {code:java} java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) {code} was (Author: xiaoheipangzi): Stacktrace: java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > > I have noticed that a nodemanager was getting NPEs processing a heartbeat. > This is similar to > [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355 ] which reported > by [# Jason Lowe] -- This message was sent by
[jira] [Commented] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577000#comment-16577000 ] lujie commented on YARN-8649: - Stacktrace: java.io.IOException: java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1187) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:752) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:370) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180) > Same as YARN-4355:NPE while processing localizer heartbeat > -- > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Priority: Major > > I have noticed that a nodemanager was getting NPEs processing a heartbeat. > This is similar to > [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355 ] which reported > by [# Jason Lowe] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8649) Same as YARN-4355:NPE while processing localizer heartbeat
lujie created YARN-8649: --- Summary: Same as YARN-4355:NPE while processing localizer heartbeat Key: YARN-8649 URL: https://issues.apache.org/jira/browse/YARN-8649 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: lujie I have noticed that a NodeManager was getting NPEs while processing a localizer heartbeat. This is similar to [YARN-4355|https://issues.apache.org/jira/browse/YARN-4355], which was reported by Jason Lowe. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576995#comment-16576995 ] genericqa commented on YARN-8488: -

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 55s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 11s | trunk passed |
| +1 | mvnsite | 0m 27s | trunk passed |
| +1 | shadedclient | 10m 18s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 40s | trunk passed |
| +1 | javadoc | 0m 14s | trunk passed |
|| || || || Patch Compile Tests ||
| -1 | mvninstall | 0m 26s | hadoop-yarn-services-core in the patch failed. |
| +1 | compile | 0m 22s | the patch passed |
| +1 | javac | 0m 22s | the patch passed |
| +1 | checkstyle | 0m 8s | the patch passed |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 1s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 48s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 15s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 11m 34s | hadoop-yarn-services-core in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 63m 34s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core |
| | Inconsistent synchronization of org.apache.hadoop.yarn.service.ServiceScheduler.timelineServiceEnabled; locked 50% of time Unsynchronized access at ServiceScheduler.java:[line 270] |
| Failed junit tests | hadoop.yarn.service.TestServiceAM |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935216/YARN-8488.4.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8bc962529026 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a2a8c48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs |
[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576981#comment-16576981 ] genericqa commented on YARN-8160: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 43s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 35s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8160 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935205/YARN-8160.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 42b2ded5b7ea 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e7951c6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results |
[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanuj Nayak updated YARN-6972: -- Attachment: YARN-6972.015.patch > Adding RM ClusterId in AppInfo > -- > > Key: YARN-6972 > URL: https://issues.apache.org/jira/browse/YARN-6972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Tanuj Nayak >Priority: Major > Attachments: YARN-6972.001.patch, YARN-6972.002.patch, > YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, > YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, > YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, > YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, > YARN-6972.015.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576978#comment-16576978 ] genericqa commented on YARN-7417: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 20s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7417 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935213/YARN-7417.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8fa556e37cf0 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e7951c6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21572/testReport/ | | Max. process+thread count | 451 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21572/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > re-factory
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576964#comment-16576964 ] Suma Shivaprasad commented on YARN-8488: Rebased with trunk > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, > YARN-8488.4.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: YARN-8488.4.patch > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, > YARN-8488.4.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Attachment: YARN-7417.003.patch > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch, > YARN-7417.003.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock and > TFileAggregatedLogsBlock > # We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be > abstract into common methods. > # render method is too long in both of these class, we want to make it clear > by abstracting some helper methods out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576947#comment-16576947 ] Eric Yang commented on YARN-7129: - The failed hdfs unit tests are not related to this patch. > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, > YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, > YARN-7129.011.patch > > > YARN native services provide a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-8523: --- Assignee: Zian Chen > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Zian Chen >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576945#comment-16576945 ] Zian Chen commented on YARN-8523: - Makes sense. I'll work on providing an initial patch for this idea. Thanks [~eyang] > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576935#comment-16576935 ] Jonathan Hung commented on YARN-8559: - Thx [~cheersyang]! Unit test failures in branch-3.0 also fail locally without the patch. Committed to branch-3.0 and branch-2. > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 3.1.2 > > Attachments: YARN-8559-branch-2.001.patch, > YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, > YARN-8559.003.patch, YARN-8559.004.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8559: Fix Version/s: 3.0.4 2.10.0 > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 3.1.2 > > Attachments: YARN-8559-branch-2.001.patch, > YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, > YARN-8559.003.patch, YARN-8559.004.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576934#comment-16576934 ] genericqa commented on YARN-8488: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8488 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935209/YARN-8488.3.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21571/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: YARN-8488.3.patch > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576924#comment-16576924 ] Chandni Singh commented on YARN-8160: - Patch 2 contains the two fixes: 1. Exit code 255 during re-init. This is because cleanup of the docker container interferes with the docker inspect. Please see [~eyang]'s comment https://issues.apache.org/jira/browse/YARN-8160?focusedCommentId=16570918=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16570918 2. With entry point, yarn service was not using the updated launch command. > Yarn Service Upgrade: Support upgrade of service that use docker containers > > > Key: YARN-8160 > URL: https://issues.apache.org/jira/browse/YARN-8160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-8160.001.patch, YARN-8160.002.patch, > container_e02_1533231998644_0009_01_03.nm.log > > > Ability to upgrade dockerized yarn native services. > Ref: YARN-5637 > *Background* > Container upgrade is supported by the NM via {{reInitializeContainer}} api. > {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded > container. > NM performs the following steps during {{reInitializeContainer}}: > - kills the existing process > - cleans up the container > - launches another container with the new {{ContainerLaunchContext}} > NOTE: {{ContainerLaunchContext}} holds all the information that needs to > upgrade the container. > With {{reInitializeContainer}}, the following does *NOT* change > - container ID. This is not created by NM. It is provided to it and here RM > is not creating another container allocation. > - {{localizedResources}} this stays the same if the upgrade does *NOT* > require additional resources IIUC. > > The following changes with {{reInitializeContainer}} > - the working directory of the upgraded container changes. It is *NOT* a > relaunch. > *Changes required in the case of docker container* > - {{reInitializeContainer}} seems to not be working with Docker containers. > Investigate and fix this. > - [Future change] Add an additional api to NM to pull the images and modify > {{reInitializeContainer}} to trigger docker container launch without pulling > the image first which could be based on a flag. > -- When the service upgrade is initialized, we can provide the user with > an option to just pull the images on the NMs. > -- When a component instance is upgrade, it calls the > {{reInitializeContainer}} with the flag pull-image set to false, since the NM > will have already pulled the images. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files
[ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576925#comment-16576925 ] Jim Brennan commented on YARN-6495: --- As part of YARN-8648, I am proposing that we can just remove the code that this patch is fixing. If we are using cgroups, we are passing the {{cgroup-parent}} argument to docker, which accomplishes what this code was trying to do in a much more deterministic and reliable way. My proposal would be to remove this code as part of YARN-8648, but if there is a preference for doing that in a separate Jira, I can file a new one. Assuming there is agreement, I think we can close out this Jira. [~Jaeboo], [~ebadger], do you agree? > check docker container's exit code when writing to cgroup task files > > > Key: YARN-6495 > URL: https://issues.apache.org/jira/browse/YARN-6495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jaeboo Jeong >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Attachments: YARN-6495.001.patch, YARN-6495.002.patch > > > If I execute simple command like date on docker container, the application > failed to complete successfully. > for example, > {code} > $ yarn jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -num_containers 1 -timeout 360 > … > 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to > complete successfully > {code} > The error log is like below. > {code} > ... > Failed to write pid to file > /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process > ... > {code} > When writing pid to cgroup tasks, container-executor doesn’t check docker > container’s status. > If the container finished very quickly, we can’t write pid to cgroup tasks, > and it is not problem. > So container-executor needs to check docker container’s exit code during > writing pid to cgroup tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576917#comment-16576917 ] Zian Chen commented on YARN-7417: - But it looks like we can make AggregatedLogFormat.ContainerLogsReader extend InputStream to achieve this. Let me update the patch. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock and > TFileAggregatedLogsBlock > # We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be > abstract into common methods. > # render method is too long in both of these class, we want to make it clear > by abstracting some helper methods out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
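A rough sketch of the idea in the comment above: letting ContainerLogsReader extend InputStream so both log blocks can share one InputStream-based rendering path. The field and constructor below are invented for illustration and are not the actual AggregatedLogFormat code.
{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only; field names and constructor are illustrative.
public class ContainerLogsReader extends InputStream {
  private final DataInputStream valueStream;

  public ContainerLogsReader(DataInputStream valueStream) {
    this.valueStream = valueStream;
  }

  @Override
  public int read() throws IOException {
    // Delegate to the underlying aggregated-log value stream so callers can
    // consume the reader as a plain InputStream.
    return valueStream.read();
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    return valueStream.read(buf, off, len);
  }
}
{code}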
[jira] [Updated] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8160: Attachment: YARN-8160.002.patch > Yarn Service Upgrade: Support upgrade of service that use docker containers > > > Key: YARN-8160 > URL: https://issues.apache.org/jira/browse/YARN-8160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-8160.001.patch, YARN-8160.002.patch, > container_e02_1533231998644_0009_01_03.nm.log > > > Ability to upgrade dockerized yarn native services. > Ref: YARN-5637 > *Background* > Container upgrade is supported by the NM via {{reInitializeContainer}} api. > {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded > container. > NM performs the following steps during {{reInitializeContainer}}: > - kills the existing process > - cleans up the container > - launches another container with the new {{ContainerLaunchContext}} > NOTE: {{ContainerLaunchContext}} holds all the information that needs to > upgrade the container. > With {{reInitializeContainer}}, the following does *NOT* change > - container ID. This is not created by NM. It is provided to it and here RM > is not creating another container allocation. > - {{localizedResources}} this stays the same if the upgrade does *NOT* > require additional resources IIUC. > > The following changes with {{reInitializeContainer}} > - the working directory of the upgraded container changes. It is *NOT* a > relaunch. > *Changes required in the case of docker container* > - {{reInitializeContainer}} seems to not be working with Docker containers. > Investigate and fix this. > - [Future change] Add an additional api to NM to pull the images and modify > {{reInitializeContainer}} to trigger docker container launch without pulling > the image first which could be based on a flag. > -- When the service upgrade is initialized, we can provide the user with > an option to just pull the images on the NMs. > -- When a component instance is upgrade, it calls the > {{reInitializeContainer}} with the flag pull-image set to false, since the NM > will have already pulled the images. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576916#comment-16576916 ] Jim Brennan commented on YARN-8648: --- One proposal to fix the leaking cgroups is to have docker put its containers directly under the {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy}} directory. For example, instead of using {{cgroup-parent=/hadoop-yarn/container_id-}}, we use {{cgroup-parent=/hadoop-yarn}}. This does cause docker to create a {{hadoop-yarn}} cgroup under each resource type, and it does not clean those up, but that is just one unused cgroup per resource type vs hundreds of thousands. This can be done by just passing an empty string to DockerLinuxContainerRuntime.addCGroupParentIfRequired(), or otherwise changing it to ignore the containerIdStr. Doing this and removing the code that cherry-picks the PID in container-executor does work, but the NM still creates the per-container cgroups as well - they're just not used. The other issue with this approach is that the cpu.shares is still updated (to reflect the requested vcores allotment) in the per-container cgroup, so it is ignored. In our code, we addressed this by passing the cpu.shares value in the docker run --cpu-shares command line argument. I'm still thinking about the best way to address this. Currently most of the resourceHandler processing happens at the linuxContainerExecutor level. But there is clearly a difference in how cgroups need to be handled for docker vs linux cases. In the docker case, we should arguably use docker command line arguments instead of directly setting up cgroups. One option would be to provide a runtime interface useResourceHandlers() which for Docker would return false. We could then disable all of the resource handling processing that happens in the container executor, and add the necessary interfaces to handle cgroup parameters to the docker runtime. Another option would be to move the resource handler processing down into the runtime. This is a bigger change, but may be cleaner. The docker runtime may still just ignore those handlers, but that detail would be hidden at the container executor level. cc:, [~ebadger] [~jlowe] [~eyang] [~shaneku...@gmail.com] [~billie.rinaldi] > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. 
On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
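A minimal sketch of the first proposal above, assuming hypothetical helper names (this is not the actual DockerLinuxContainerRuntime code): the per-container id is dropped from {{--cgroup-parent}} so docker only ever sees the fixed hierarchy, and the requested vcores are expressed through {{--cpu-shares}} on the docker command line instead of a per-container cgroup write.
{code:java}
// Illustrative sketch only; method names are hypothetical and the cpu-shares
// arithmetic is simplified.
public final class DockerCgroupArgs {

  // Previously the parent looked like /hadoop-yarn/container_id; dropping the
  // container id keeps docker from creating container_id cgroups under every
  // controller (blkio, cpuset, memory, ...).
  public static String cgroupParentArg(String cgroupsHierarchy) {
    return "--cgroup-parent=" + cgroupsHierarchy;   // e.g. /hadoop-yarn
  }

  // Hand the container's cpu weight directly to docker instead of writing
  // cpu.shares into a per-container cgroup that docker then ignores.
  public static String cpuSharesArg(int vcores) {
    return "--cpu-shares=" + (vcores * 1024);
  }

  public static void main(String[] args) {
    System.out.println(cgroupParentArg("/hadoop-yarn") + " " + cpuSharesArg(2));
    // prints: --cgroup-parent=/hadoop-yarn --cpu-shares=2048
  }
}
{code}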
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576900#comment-16576900 ] Zian Chen commented on YARN-8509: - Hi [~eepayne], sure, let me address these two questions: 1) The summation is over each user; for each user we take the minimum of two expressions: the user's pending resource for the partition, and the user limit (which is queue_capacity * user_limit_factor) minus the user's used resource for the partition. 2) I think there is some misunderstanding here. First of all, after the title was changed, this Jira does not intend to only support balancing of queues after they are satisfied; it intends to change the general strategy of how the user limit is calculated in the preemption scenario. So the queue capacities I mentioned here for the example are an initial state, like this:
|| ||queue-a||queue-b||queue-c||queue-d||
|Guaranteed|30|30|30|10|
|Used|10|40|50|0|
|Pending|6|30|30|0|
This configuration should be able to happen if we set user_limit_percent to 50 and user_limit_factor to 1.0f, 3.0f, 3.0f and 2.0f respectively. But with the current equation, user_limit = min(max(current_capacity / #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor), this initial state won't happen: in the above case, queue-b's queue_capacity * user_limit_factor is 90GB while max(current_capacity / #active_users, current_capacity * user_limit_percent) is 40GB, so user-limit-factor has no effect at all and the headroom becomes zero for queue-b. So the point is, we should let the user limit reach at most queue_capacity * user_limit_factor. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
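To make the calculation in point 1) concrete, here is a small illustrative sketch of the summation described above, simplified to plain longs rather than the real Resource/LeafQueue APIs; all names are made up for the example, and the queue max of 100G is just the single-node capacity assumed from the scenario discussed in this issue.
{code:java}
// Sketch of: total_pending = min(Q_max - Q_used,
//     sum over users of min(user_limit_resource - user_used, user_pending))
// where user_limit_resource = queue_capacity * user_limit_factor.
public final class PendingForPreemption {

  public static long totalPending(long queueMax, long queueUsed,
      long userLimitResource, long[] userUsed, long[] userPending) {
    long sum = 0;
    for (int i = 0; i < userPending.length; i++) {
      long headroom = Math.max(0, userLimitResource - userUsed[i]);
      sum += Math.min(headroom, userPending[i]);
    }
    return Math.min(queueMax - queueUsed, sum);
  }

  public static void main(String[] args) {
    // Single user in queue-b: capacity 30G * factor 3.0 => limit 90G,
    // used 40G, pending 30G, assumed queue max 100G.
    System.out.println(totalPending(100, 40, 90, new long[]{40}, new long[]{30}));
    // prints 30: the full pending is counted once user-limit-factor is honored.
  }
}
{code}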
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576888#comment-16576888 ] genericqa commented on YARN-8488: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8488 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935198/YARN-8488.2.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21569/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576855#comment-16576855 ] Jim Brennan edited comment on YARN-8648 at 8/10/18 9:37 PM: Another problem we have seen is that container-executor still has code that cherry-picks the PID of the launch shell from the docker container and writes that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, effectively moving it from {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}. So you end up with one process out of the container in the {{container_id}} cgroup, and the rest in the {{container_id/docker_container_id}} cgroup. Since we are passing the {{--cgroup-parent}} to docker, there is no need to manually write the pid - we can just remove the code that does this. was (Author: jim_brennan): Another problem we have seen is that container-executor still has code that cherry-picks the PID of the launch shell from the docker container and writes that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, effectively moving it from {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}. So you end up with one process out of the container in the {{container_id}} cgroup, and the rest in the {{container_id/docker_container_id}} cgroup. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576878#comment-16576878 ] genericqa commented on YARN-8488: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8488 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935185/YARN-8488.1.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21568/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: (was: YARN-8488.2.patch) > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: YARN-8488.2.patch > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576877#comment-16576877 ] Suma Shivaprasad commented on YARN-8488: - Attached a patch which adds the SUCCEEDED state to ServiceState and SUCCEEDED/FAILED to ComponentState. Earlier, the ComponentInstance state was marked as STOPPED for all available restart policies; now it is SUCCEEDED/FAILED depending on the exit status. One pending issue: when a graceful stop is sent via the Client RPC, the component instance state is not marked as STOPPED. Will fix this in a subsequent patch. For restartPolicy=ON_FAILURE/NEVER, when all component instances terminate, the component is marked SUCCEEDED if all component instances succeed, else FAILED. > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
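To illustrate the shape of the change described in the comment above, a sketch of ServiceState with the new terminal value appended; the existing values are quoted from the issue description, while the actual ordering in the patch and the corresponding ComponentState additions may differ.
{code:java}
// Existing values plus the proposed SUCCEEDED terminal state for services whose
// components are allowed to finish (e.g. restart policy NEVER / ON_FAILURE).
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE, SUCCEEDED;
}
{code}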
[jira] [Comment Edited] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875 ] Yufei Gu edited comment on YARN-8632 at 8/10/18 9:28 PM: - Your patch doesn't apply to trunk. You said the bug is in trunk as well, so can you provide a patch for trunk? What version does your patch target? 2.7.2? was (Author: yufeigu): Your patch doesn't apply to trunk? You said the bug is in trunk as well, can you provide a patch for the trunk? What is the version does your patch target? 2.7.2? > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875 ] Yufei Gu commented on YARN-8632: Your patch doesn't apply to trunk? You said the bug is in trunk as well, can you provide a patch for the trunk? What is the version does your patch target? 2.7.2? > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
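A minimal sketch of the fix described in the issue text above, assuming invented surrounding names (the writer, the stand-in interface, and the class shape are illustrative; only {{web.generateRealTimeTrackingMetrics()}} is taken from the report): the metrics string is generated inside the try block so an early NPE from an uninitialized queue set no longer kills the MetricsLogRunnable thread.
{code:java}
import java.io.BufferedWriter;

// Hypothetical sketch; not the actual SLS code.
final class MetricsLogRunnable implements Runnable {

  // Stand-in for the SLS web app object that exposes the real method.
  interface TrackingMetricsSource {
    String generateRealTimeTrackingMetrics() throws Exception;
  }

  private final TrackingMetricsSource web;
  private final BufferedWriter metricsLogWriter;

  MetricsLogRunnable(TrackingMetricsSource web, BufferedWriter metricsLogWriter) {
    this.web = web;
    this.metricsLogWriter = metricsLogWriter;
  }

  @Override
  public void run() {
    try {
      // The queue set backing the metrics may still be null early in the
      // simulation, so the generation itself sits inside the guarded section.
      String metrics = web.generateRealTimeTrackingMetrics();
      metricsLogWriter.write(metrics + ",\n");
      metricsLogWriter.flush();
    } catch (Exception e) {
      // Log and keep the periodic thread alive instead of letting it exit.
      System.err.println("Failed to write real-time tracking metrics: " + e);
    }
  }
}
{code}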
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: YARN-8488.2.patch > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch, YARN-8488.2.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
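For reference, a minimal sketch of the enum with the proposed state added; the existing values come from the issue description quoted above, and the real patch also has to define when a service transitions into the new state.

{code:java}
// Sketch only: SUCCEEDED is the proposed addition for finite-run services
// (e.g. Tensorflow jobs); the other values match the enum quoted above.
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE, SUCCEEDED;
}
{code}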
[jira] [Comment Edited] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573997#comment-16573997 ] Zian Chen edited comment on YARN-8509 at 8/10/18 9:21 PM: -- Hi Eric, thanks for the comments. After discussing with Wangda, I realized the patch uploaded before is not correct due to a misunderstanding of the original problem. I have changed the Jira title. The intention of this Jira is to fix the calculation of pending resources with respect to user-limit in the preemption scenario. Currently, the pending resource calculation in preemption uses the same algorithm as scheduling, which is: {code:java} user_limit = min(max(current_capacity / #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor) {code} This is good for scheduling because we want to make sure users can get at least "minimum-user-limit-percent" of the resources, which acts as a lower bound on user-limit. However, we should not cap the total pending resource a leaf queue can get by minimum-user-limit-percent; instead, we want to use user-limit-factor, which is the upper bound, to cap pending resources in preemption. If we use minimum-user-limit-percent to cap pending resources, resource under-utilization will happen in the preemption scenario. Thus, we suggest the pending resource calculation for preemption should use this formula: {code:java} total_pending(partition, queue) = min{ Q_max(partition) - Q_used(partition), Σ over users of min{ User.ulf(partition) - User.used(partition), User.pending(partition) } } {code} Let me give an example, {code:java} Root / | \ \ a b c d 30 30 30 10 1) Only one node (n1) in the cluster, it has 100G. 2) app1 submit to queue-a, asks for 10G used, 6G pending. 3) app2 submit to queue-b, asks for 40G used, 30G pending. 4) app3 submit to queue-c, asks for 50G used, 30G pending. {code} Here we only have one user, and the user-limit settings for the queues are ||Queue name|| minimum-user-limit-percent ||user-limit-factor|| | a| 50| 1.0 f| | b| 50| 3.0 f| | c| 50| 3.0 f| | d| 50| 2.0 f| With the old calculation, the user-limit for queue-a is 30G, which lets app1 keep its 6G pending, but the user-limit for queue-b becomes 40G, which makes the headroom zero after subtracting the 40G used, so the 30G of pending resources being asked for cannot be accepted; the same thing happens with queue-c. However, if we look at this test case from the preemption point of view, we should allow queue-b and queue-c to take more pending resources, because even though queue-a has 30G guaranteed configured, it is under-utilized. With pending resources capped by the old algorithm, queue-b and queue-c cannot take the available resources through preemption, which means the cluster resources are not used effectively. To summarize, since user-limit-factor maintains the hard limit on how much resource can be used by a user, we should calculate pending resources considering user-limit-factor instead of minimum-user-limit-percent. Could you share your opinion on this, [~eepayne]? was (Author: zian chen): Hi Eric, thanks for the comments. Discussed with Wangda, the patch uploaded before is not correct due to misunderstand of the original problem. I have changed the Jira title. The intention of this Jira is to fix calculation of pending resource consider user-limit in preemption scenario. 
Currently, pending resource calculation in preemption uses the calculation algorithm in scheduling which is this one, {code:java} user_limit = min(max(current_capacity)/ #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor) {code} this is good for scheduling cause we want to make sure users can get at least "minimum-user-limit-percent" of resource to use, which is more like a lower bound of user-limit. However we should not capture total pending resource a leaf queue can get by minimum-user-limit-percent, instead, we want to use user-limit-factor which is the upper bound to capture pending resource in preemption. Cause if we use minimum-user-limit-percent to capture pending resource, resource under-utilization will happen in preemption scenario. Thus, we suggest the pending resource calculation for preemption should use this formula. {code:java} total_pending(partition,queue) = min {Q_max(partition) - Q_used(partition), Σ (min { User.ulf(partition) - User.used(partition), User.pending(partition})} {code} Let me give an example, {code:java} Root / | \ \ a b c d 30 30 30 10 1) Only one node (n1) in the cluster, it has 100G. 2) app1 submit to queue-a, asks for 10G used, 6G pending. 3) app2 submit to queue-b, asks for 40G used, 30G pending.
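To make the proposed formula above concrete, here is a small worked sketch over a single resource dimension in plain Java. The class and field names are illustrative and do not correspond to the actual CapacityScheduler types, and the queue maximum in the example is assumed for illustration.

{code:java}
import java.util.Arrays;
import java.util.List;

/** Worked sketch of the proposed per-queue pending calculation for preemption. */
public class PendingPreemptionSketch {

  static class UserUsage {
    final long ulfCap;   // upper bound from user-limit-factor for this partition
    final long used;     // resources the user currently holds
    final long pending;  // resources the user is still asking for

    UserUsage(long ulfCap, long used, long pending) {
      this.ulfCap = ulfCap;
      this.used = used;
      this.pending = pending;
    }
  }

  /**
   * total_pending(partition, queue) =
   *   min( Q_max - Q_used, sum over users of min( ulfCap - used, pending ) )
   */
  static long totalPending(long queueMax, long queueUsed, List<UserUsage> users) {
    long sum = 0;
    for (UserUsage u : users) {
      // Clamp at zero so a user already above the cap contributes nothing.
      long headroomUnderUlf = Math.max(0, u.ulfCap - u.used);
      sum += Math.min(headroomUnderUlf, u.pending);
    }
    return Math.min(Math.max(0, queueMax - queueUsed), sum);
  }

  public static void main(String[] args) {
    // Single user in queue-b from the example above: ulf cap 90G (30G * 3.0),
    // 40G used, 30G pending; queue max assumed to be 100G with 40G used.
    System.out.println(totalPending(100, 40,
        Arrays.asList(new UserUsage(90, 40, 30)))); // prints 30
  }
}
{code}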
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576855#comment-16576855 ] Jim Brennan commented on YARN-8648: --- Another problem we have seen is that container-executor still has code that cherry-picks the PID of the launch shell from the docker container and writes that into the {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/tasks}} file, effectively moving it from {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}} to {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id}}. So you end up with one process out of the container in the {{container_id}} cgroup, and the rest in the {{container_id/docker_container_id}} cgroup. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-8648: -- Labels: Docker (was: ) > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8648) Container cgroups are leaked when using docker
Jim Brennan created YARN-8648: - Summary: Container cgroups are leaked when using docker Key: YARN-8648 URL: https://issues.apache.org/jira/browse/YARN-8648 Project: Hadoop YARN Issue Type: Bug Reporter: Jim Brennan Assignee: Jim Brennan When you run with docker and enable cgroups for cpu, docker creates cgroups for all resources on the system, not just for cpu. For instance, if the {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, the nodemanager will create a cgroup for each container under {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path via the {{--cgroup-parent}} command line argument. Docker then creates a cgroup for the docker container under that, for instance: {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. When the container exits, docker cleans up the {{docker_container_id}} cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is good under {{/sys/fs/cgroup/hadoop-yarn}}. The problem is that docker also creates that same hierarchy under every resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, perf_event, and systemd. So, for instance, docker creates {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up the {{container_id}} cgroups for these other resources. On one of our busy clusters, we found > 100,000 of these leaked cgroups. I found this in our 2.8-based version of hadoop, but I have been able to repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
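As a rough illustration of the cleanup this implies, the sketch below walks the other cgroup controllers and removes the per-container directory that docker leaves behind. The controller list and hierarchy name follow the description above; an actual fix would hook into the NodeManager/container-executor container-cleanup path rather than live in a standalone class.

{code:java}
import java.io.File;

/** Illustrative sweep for the leaked per-container cgroups described above. */
public class LeakedCgroupCleanupSketch {
  // Controllers observed on the rhel7 system in the report.
  private static final String[] CONTROLLERS = {
      "blkio", "cpuset", "devices", "freezer", "hugetlb", "memory",
      "net_cls", "net_prio", "perf_event", "systemd"};

  public static void cleanup(String containerId) {
    for (String controller : CONTROLLERS) {
      File cgroup = new File(
          "/sys/fs/cgroup/" + controller + "/hadoop-yarn/" + containerId);
      // Removing a cgroup directory only succeeds once docker's leaf cgroup is
      // gone and no tasks remain, which is exactly the leaked state described.
      if (cgroup.isDirectory() && !cgroup.delete()) {
        System.err.println("Could not remove " + cgroup
            + " (tasks may still be attached)");
      }
    }
  }
}
{code}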
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576840#comment-16576840 ] Hudson commented on YARN-8520: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14749 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14749/]) YARN-8520. Document best practice for user management. Contributed by (skumpf: rev e7951c69cbc85604f72cdd3559122d4e2c1ea127) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576838#comment-16576838 ] Eric Yang commented on YARN-8520: - Thank you [~shaneku...@gmail.com]. > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576832#comment-16576832 ] Shane Kumpf commented on YARN-8520: --- Thanks for the contribution, [~eyang]! I committed this to trunk and branch-3.1. > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576825#comment-16576825 ] genericqa commented on YARN-8488: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8488 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8488 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935185/YARN-8488.1.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21567/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576815#comment-16576815 ] Zian Chen commented on YARN-7417: - Thanks for the review, [~eyang]. That was my original plan, to make it reusable, but after investigating the logic, it's almost impossible to achieve. The main reason is that one formal parameter cannot be abstracted into a common class type: the "AggregatedLogFormat.ContainerLogsReader logReader" parameter in TFileAggregatedLogsBlock is a static class which cannot be converted into any of the parent classes of the formal parameter "InputStream in" in IndexedFileAggregatedLogsBlock. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock and > TFileAggregatedLogsBlock > # We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be > abstract into common methods. > # render method is too long in both of these class, we want to make it clear > by abstracting some helper methods out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8488: --- Attachment: YARN-8488.1.patch > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8488.1.patch > > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock # We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be abstract into common methods. # render method is too long in both of these class, we want to make it clear by abstracting some helper methods out. was: This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be abstract into common method. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock and > TFileAggregatedLogsBlock > # We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be > abstract into common methods. > # render method is too long in both of these class, we want to make it clear > by abstracting some helper methods out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be abstract into common method. was:We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock > We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can > be abstract into common method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576801#comment-16576801 ] Eric Yang edited comment on YARN-8523 at 8/10/18 8:21 PM: -- [~Zian Chen] # Without step 2 session management, the terminal session will terminate with Connection Closed when node manager restarts. User can retry with browser reload to obtain a new session. I think web socket connection is reliable enough to keep the connection alive. If it drops, user can always get a new session of docker exec. # There is nothing to handle on node manager shutdown or crash because remote connection closed will be displayed to browser. was (Author: eyang): [~Zian Chen] # Without step 2 session management, the terminal session will terminate with Connection Closed when node manager restarts. User can retry with browser reload to obtain a new session. I think web socket connection is reliable enough to keep the connection alive. If it drops, user can always get a new session of docker exec. # There is nothing to handle on node manager shutdown or crash because remote connection closed will be displayed to browser. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576801#comment-16576801 ] Eric Yang commented on YARN-8523: - [~Zian Chen] # Without step 2 session management, the terminal session will terminate with Connection Closed when node manager restarts. User can retry with browser reload to obtain a new session. I think web socket connection is reliable enough to keep the connection alive. If it drops, user can always get a new session of docker exec. # There is nothing to handle on node manager shutdown or crash because remote connection closed will be displayed to browser. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
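For illustration only, the plumbing being discussed ultimately runs docker exec against a live container and streams the output back. A minimal Java sketch is below; the container id and command are placeholders, and the real feature would route this through container-executor and a web socket session rather than a direct process call.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/** Illustrative only: run a command in a running container via docker exec. */
public class DockerExecSketch {
  public static String exec(String containerId, String... command)
      throws IOException, InterruptedException {
    String[] argv = new String[command.length + 3];
    argv[0] = "docker";
    argv[1] = "exec";
    argv[2] = containerId;
    System.arraycopy(command, 0, argv, 3, command.length);

    Process p = new ProcessBuilder(argv).redirectErrorStream(true).start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    p.waitFor();
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Placeholder container id; an AM could aggregate this output into its log.
    System.out.print(exec("container_example", "ls", "-l", "/"));
  }
}
{code}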
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576795#comment-16576795 ] genericqa commented on YARN-8559: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 20s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 39s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 17 unchanged - 0 fixed = 20 total (was 17) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m 28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 18s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 | | JIRA Issue | YARN-8559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935167/YARN-8559-branch-2.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux fdea2c2d6bf7 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 2024260 | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_181 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21566/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21566/testReport/ | | Max. process+thread count | 873 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn | |
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576769#comment-16576769 ] genericqa commented on YARN-7494: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7494 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935162/YARN-7494.14.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ca679d8c7cbf 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 15241c6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21565/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21565/testReport/ | | Max. process+thread count | 928 (vs. ulimit of 1) | | modules | C:
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576732#comment-16576732 ] Eric Yang commented on YARN-7417: - [~Zian Chen] Thank you for the patch. Is it possible to reuse processContainerLog? The two look similar enough that the method may be reusable. I think it is safe to assume that logs stored in TFile are also UTF-8 encoded. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8647) Add a flag to disable move app between queues
[ https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-8647: --- Assignee: Abhishek Modi > Add a flag to disable move app between queues > - > > Key: YARN-8647 > URL: https://issues.apache.org/jira/browse/YARN-8647 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: sarun singla >Assignee: Abhishek Modi >Priority: Critical > > For large clusters where we have a number of users submitting application, we > can result into scenarios where app developers try to move the queues for > their applications using something like > {code:java} > yarn application -movetoqueue -queue {code} > Today there is no way of disabling the feature if one does not want > application developers to use the feature. > *Solution:* > We should probably add an option to disable move queue feature from RM side > on the cluster level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576697#comment-16576697 ] Zian Chen commented on YARN-8523: - Good point. I think we can make this Jira focus on building this pipeline and create a second Jira for persisting docker exec state across NM restarts. Two more questions here, # Should we give the user some kind of notification while the NM restarts and we are trying to resume the docker exec? What if we retry the reconnect several times and don't succeed? We may need to give the user a friendly reminder so they don't mistake the session for being stuck, right? # How do we handle an unexpected NM shutdown (crash, etc.)? > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some application might require interactive unix commands executions to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report back the output to application master. Application master can > distribute and aggregate execution of the commands to record in application > master log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8647) Add a flag to disable move app between queues
[ https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8647: - Summary: Add a flag to disable move app between queues (was: Add a flag to disable move queue) > Add a flag to disable move app between queues > - > > Key: YARN-8647 > URL: https://issues.apache.org/jira/browse/YARN-8647 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: sarun singla >Priority: Critical > > For large clusters where we have a number of users submitting application, we > can result into scenarios where app developers try to move the queues for > their applications using something like > {code:java} > yarn application -movetoqueue -queue {code} > Today there is no way of disabling the feature if one does not want > application developers to use the feature. > *Solution:* > We should probably add an option to disable move queue feature from RM side > on the cluster level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8647) Add a flag to disable move queue
[ https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sarun singla updated YARN-8647: --- Description: For large clusters where we have a number of users submitting application, we can result into scenarios where app developers try to move the queues for their applications using something like {code:java} yarn application -movetoqueue -queue {code} Today there is no way of disabling the feature if one does not want application developers to use the feature. *Solution:* We should probably add an option to disable move queue feature from RM side on the cluster level. was: For large clusters where we have a number of users submitting application, we can result into scenarios where app developers try to move the queues for their applications using something like {code:java} yarn application -movetoqueue -queue {code} Today there is no way of disabling the feature if one does not want application developers to use the feature. Solution: We probably add an option to disable move queue feature from RM side on the cluster level. > Add a flag to disable move queue > > > Key: YARN-8647 > URL: https://issues.apache.org/jira/browse/YARN-8647 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: sarun singla >Priority: Critical > > For large clusters where we have a number of users submitting application, we > can result into scenarios where app developers try to move the queues for > their applications using something like > {code:java} > yarn application -movetoqueue -queue {code} > Today there is no way of disabling the feature if one does not want > application developers to use the feature. > *Solution:* > We should probably add an option to disable move queue feature from RM side > on the cluster level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
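A minimal sketch of such a cluster-level switch on the RM side is below. The configuration key is hypothetical (made up for illustration); the idea is simply a boolean guard that the RM checks before honoring a move-to-queue request.

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Sketch of a cluster-level guard for "yarn application -movetoqueue". */
public class MoveToQueueGuardSketch {
  // Hypothetical key, not an existing YARN configuration property.
  public static final String MOVE_ENABLED_KEY =
      "yarn.resourcemanager.application-move.enabled";

  /** Throws if the cluster administrator has disabled application moves. */
  public static void checkMoveAllowed(Configuration conf) {
    if (!conf.getBoolean(MOVE_ENABLED_KEY, true)) {
      throw new UnsupportedOperationException(
          "Moving applications between queues is disabled on this cluster");
    }
  }
}
{code}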
[jira] [Updated] (YARN-8647) Add a flag to disable move queue
[ https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sarun singla updated YARN-8647: --- Description: For large clusters where we have a number of users submitting application, we can result into scenarios where app developers try to move the queues for their applications using something like {code:java} yarn application -movetoqueue -queue {code} Today there is no way of disabling the feature if one does not want application developers to use the feature. Solution: We probably add an option to disable move queue feature from RM side on the cluster level. was: For large clusters where we have a number of users submitting application, we can result into scenarios where app developers try to move the queues for their applications using something like {code}yarn application -movetoqueue -queue \{/code} Today there is no way of disabling the feature if one does not want application developers to use the feature. Solution: We probably add an option to disable move queue feature from RM side on the cluster level. > Add a flag to disable move queue > > > Key: YARN-8647 > URL: https://issues.apache.org/jira/browse/YARN-8647 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: sarun singla >Priority: Critical > > For large clusters where we have a number of users submitting application, we > can result into scenarios where app developers try to move the queues for > their applications using something like > {code:java} > yarn application -movetoqueue -queue {code} > Today there is no way of disabling the feature if one does not want > application developers to use the feature. > Solution: > We probably add an option to disable move queue feature from RM side on the > cluster level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576681#comment-16576681 ] Shane Kumpf commented on YARN-8569: --- Thanks for filing this [~eyang]. I have a use case that could benefit from this as well. When running in containers, one challenging piece is determining how much CPU and memory was allocated to the container. Traditional os tooling shows the totals from the host. This is especially problematic for tools like Ambari, which use os tooling to dynamically set configuration. Exposing the resource request details via this mechanism could be used to solve this problem. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8647) Add a flag to disable move queue
sarun created YARN-8647: --- Summary: Add a flag to disable move queue Key: YARN-8647 URL: https://issues.apache.org/jira/browse/YARN-8647 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.3 Reporter: sarun For large clusters where we have a number of users submitting application, we can result into scenarios where app developers try to move the queues for their applications using something like {code}yarn application -movetoqueue -queue \{/code} Today there is no way of disabling the feature if one does not want application developers to use the feature. Solution: We probably add an option to disable move queue feature from RM side on the cluster level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-8632: -- Assignee: Xianghao Lu > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1657#comment-1657 ] Yufei Gu commented on YARN-8632: Added you to the contributor list and assigned this to you. Will review later. > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576342#comment-16576342 ] Eric Payne edited comment on YARN-8509 at 8/10/18 6:12 PM: --- [~Zian Chen], can I please get a couple of clarifications? {quote} total_pending(partition,queue) = min {Q_max(partition) - Q_used(partition), Σ (min \{ User.ulf\(partition\) - User.used\(partition\), User.pending\(partition\}\)\} {quote} 1) In the above pseudo-code, what is being summed by the summation? 2) In the above example, queue-a is the only one that's underserved, so the the first round of preemption should actually preempt 6G from queues b and c. The amount preempted from each queue depends on the age of the containers, but you could end up with something like queue-b consuming 40G and pending 30G and queue-c consuming 44G and pending 36G before the second round of preemption, at which point queue-a would be satisfied and only queues b and c have pending resource requests. Since this issue is meant to address the balancing of queues that are over their capacity, I don't understand why queue-a is involved in the above use case. Can you provide a simpler example that only involves the balancing of over-served queues? was (Author: eepayne): [~Zian Chen], can I please get a couple of clarifications? {quote}total_pending(partition,queue) = min {Q_max(partition) - Q_used(partition), Σ (min Unknown macro: \{User.ulf(partition) - User.used(partition), User.pending(partition})}{quote} 1) In the above pseudo-code, what is being summed by the summation? 2) In the above example, queue-a is the only one that's underserved, so the the first round of preemption should actually preempt 6G from queues b and c. The amount preempted from each queue depends on the age of the containers, but you could end up with something like queue-b consuming 40G and pending 30G and queue-c consuming 44G and pending 36G before the second round of preemption, at which point queue-a would be satisfied and only queues b and c have pending resource requests. Since this issue is meant to address the balancing of queues that are over their capacity, I don't understand why queue-a is involved in the above use case. Can you provide a simpler example that only involves the balancing of over-served queues? > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8623) Update Docker examples to use image which exists
[ https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655 ] Shane Kumpf edited comment on YARN-8623 at 8/10/18 6:07 PM: [~ccondit-target] - thanks for looking into this. I see what you mean about the challenge with using that image. I think you are correct that the existing apache/hadoop-runner image serves a different type of use case than we need here. IMO, our target should be an image capable of running MapReduce pi, as that's the example we provide in the docs. If it also works for the Spark shell example we provide in our docs, with the appropriate spark install/config, that would be great, but I don't think it's a requirement to start. Thinking about what we need to meet that goal, I think a majority of the users we would be targeting with this guide will have all of Hadoop installed on the nodes where these containers are running. Instead of trying to package the latest version of Apache Hadoop as an image, I think our example would be easier to maintain if we guide the user towards bind mounting the Hadoop binaries and configuration from the NodeManager hosts. If we take that approach, I believe the image should only need to include a JDK and set up JAVA_HOME. We might even be able to use an existing openjdk image. Assuming we can't leverage an existing image, one question I'm unsure about is the process of creating an "official" image under the apache docker hub namespace. [~elek] - can you share any insights around this process? was (Author: shaneku...@gmail.com): [~ccondit-target] - thanks for looking into this. I see what you mean about the challenge with using that image. I think you are correct that the existing apache/hadoop-runner image serves a different type of use case than we need here. IMO, our target should be an image capable of running MapReduce pi, as that's the example we provide in the docs. If it also works for Spark shell example we provide in our docs, with the appropriate spark install/config, that would be great, but I don't think it's a requirement to start. !/jira/images/icons/emoticons/smile.png! Thinking about what we need to meet that goal, I think a majority of the users we would be targeting with this guide will have all of Hadoop installed on the nodes where these containers are running. Instead of trying to package the latest version of Apache Hadoop as an image, I think our example would be easier to maintain if we guide the user towards bind mounting the Hadoop binaries and configuration from the NodeManager hosts. If we take that approach, I believe the image should only need to include a JDK and set up JAVA_HOME. We might even be able to use an existing openjdk image. Assuming we can't leverage an existing image, one question I'm unsure about is the process of creating an "official" image under the apache docker hub namespace. [~elek] - can you share any insights around this process? > Update Docker examples to use image which exists > > > Key: YARN-8623 > URL: https://issues.apache.org/jira/browse/YARN-8623 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Condit >Priority: Minor > Labels: Docker > > The example Docker image given in the documentation > (images/hadoop-docker:latest) does not exist. We could change > images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. > We'd need to do a quick sanity test to see if the image works with YARN. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists
[ https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655 ] Shane Kumpf commented on YARN-8623: --- [~ccondit-target] - thanks for looking into this. I see what you mean about the challenge with using that image. I think you are correct that the existing apache/hadoop-runner image serves a different type of use case than we need here. IMO, our target should be an image capable of running MapReduce pi, as that's the example we provide in the docs. If it also works for Spark shell example we provide in our docs, with the appropriate spark install/config, that would be great, but I don't think it's a requirement to start. !/jira/images/icons/emoticons/smile.png! Thinking about what we need to meet that goal, I think a majority of the users we would be targeting with this guide will have all of Hadoop installed on the nodes where these containers are running. Instead of trying to package the latest version of Apache Hadoop as an image, I think our example would be easier to maintain if we guide the user towards bind mounting the Hadoop binaries and configuration from the NodeManager hosts. If we take that approach, I believe the image should only need to include a JDK and set up JAVA_HOME. We might even be able to use an existing openjdk image. Assuming we can't leverage an existing image, one question I'm unsure about is the process of creating an "official" image under the apache docker hub namespace. [~elek] - can you share any insights around this process? > Update Docker examples to use image which exists > > > Key: YARN-8623 > URL: https://issues.apache.org/jira/browse/YARN-8623 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Condit >Priority: Minor > Labels: Docker > > The example Docker image given in the documentation > (images/hadoop-docker:latest) does not exist. We could change > images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. > We'd need to do a quick sanity test to see if the image works with YARN. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8559: Attachment: YARN-8559-branch-2.001.patch > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8559-branch-2.001.patch, > YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, > YARN-8559.003.patch, YARN-8559.004.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8559: Attachment: (was: YARN-8559-branch-2.001.patch) > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8559-branch-2.001.patch, > YARN-8559-branch-3.0.001.patch, YARN-8559.001.patch, YARN-8559.002.patch, > YARN-8559.003.patch, YARN-8559.004.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576613#comment-16576613 ] Sunil Govindan commented on YARN-8561: -- Thanks [~leftnoteasy], Overall looks good to me Will create additional jiras as discussed in this ticket. If there are no objections, I will commit this patch tomorrow. +1 Thanks > [Submarine] Add initial implementation: training job submission and job > history retrieve. > - > > Key: YARN-8561 > URL: https://issues.apache.org/jira/browse/YARN-8561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8561.001.patch, YARN-8561.002.patch, > YARN-8561.003.patch, YARN-8561.004.patch, YARN-8561.005.patch > > > Added following parts: > 1) New subcomponent of YARN, under applications/ project. > 2) Tensorflow training job submission, including training (single node and > distributed). > - Supported Docker container. > - Support GPU isolation. > - Support YARN registry DNS. > 3) Retrieve job history. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576604#comment-16576604 ] Sunil Govindan commented on YARN-7494: -- As discussed, removed updating multiNodePolicyName in Queue interface. This is changed to CSQueue. [~cheersyang] pls help to review. > Add muti node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8635) Container Resource localization fails if umask is 077
[ https://issues.apache.org/jira/browse/YARN-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8635: --- Summary: Container Resource localization fails if umask is 077 (was: Container fails to start if umask is 077) > Container Resource localization fails if umask is 077 > - > > Key: YARN-8635 > URL: https://issues.apache.org/jira/browse/YARN-8635 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > > {code} > java.io.IOException: Application application_1533652359071_0001 > initialization failed (exitCode=255) with output: main : command provided 0 > main : run as user is mapred > main : requested yarn user is mapred > Path > /opt/HA/OSBR310/nmlocal/usercache/mapred/appcache/application_1533652359071_0001 > has permission 700 but needs permission 750. > Did not create any app directories > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229) > Caused by: > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402) > ... 1 more > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > ... 2 more > 2018-08-08 17:43:26,918 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_e04_1533652359071_0001_01_27 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2018-08-08 17:43:26,916 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_e04_1533652359071_0001_01_31 startLocalizer is : > 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > ... 
2 more > 2018-08-08 17:43:26,923 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed for containe > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
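The failing permission check above is enforced by the native container-executor, so the following Java snippet is only an illustration of the underlying pattern (hypothetical path, not the actual fix): create the application directory and then set the expected 750 mode explicitly, so the result does not depend on the process umask.
{code}
// Illustration only (POSIX file systems): force rwxr-x--- (750) on a directory
// after creation, because the mode passed at creation time is masked by umask.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class AppDirPermissions {
  public static void main(String[] args) throws IOException {
    // Hypothetical stand-in for the usercache/appcache application directory.
    Path appDir = Paths.get(args.length > 0 ? args[0] : "/tmp/appcache-demo");
    Files.createDirectories(appDir);                      // effective mode is umask-dependent
    Files.setPosixFilePermissions(appDir,                 // set 750 explicitly afterwards
        PosixFilePermissions.fromString("rwxr-x---"));
    System.out.println(
        PosixFilePermissions.toString(Files.getPosixFilePermissions(appDir)));
  }
}
{code}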
[jira] [Updated] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-7494: - Attachment: YARN-7494.14.patch > Add muti node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576597#comment-16576597 ] Shane Kumpf commented on YARN-8520: --- Thanks for the updated patch, [~eyang]! +1 on the latest patch. I'll commit this later today if there is no additional feedback. > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576565#comment-16576565 ] genericqa commented on YARN-8520: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8520 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935150/YARN-8520.005.patch | | Optional Tests | asflicense mvnsite | | uname | Linux aa253326073f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0a71bf1 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21564/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. 
This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7935) Expose container's hostname to applications running within the docker container
[ https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7935: Parent Issue: YARN-8472 (was: YARN-3611) > Expose container's hostname to applications running within the docker > container > --- > > Key: YARN-7935 > URL: https://issues.apache.org/jira/browse/YARN-7935 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > Attachments: YARN-7935.1.patch, YARN-7935.2.patch, YARN-7935.3.patch, > YARN-7935.4.patch > > > Some applications have a need to bind to the container's hostname (like > Spark) which is different from the NodeManager's hostname(NM_HOST which is > available as an env during container launch) when launched through Docker > runtime. The container's hostname can be exposed to applications via an env > CONTAINER_HOSTNAME. Another potential candidate is the container's IP but > this can be addressed in a separate jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
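On the application side, consuming the proposed variable would be a one-liner. The snippet below is a hypothetical illustration (assuming CONTAINER_HOSTNAME is injected into the container environment as described, with the existing NM_HOST as a fallback); it is not part of the patch.
{code}
// Hypothetical application-side snippet: prefer the container's own hostname
// when launched through the Docker runtime, otherwise fall back to the
// NodeManager host exposed via NM_HOST.
public class BindHost {
  public static void main(String[] args) {
    String host = System.getenv("CONTAINER_HOSTNAME");
    if (host == null || host.isEmpty()) {
      host = System.getenv("NM_HOST");
    }
    System.out.println("binding to " + (host != null ? host : "localhost"));
  }
}
{code}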
[jira] [Updated] (YARN-7994) Add support for network-alias in docker run for user defined networks
[ https://issues.apache.org/jira/browse/YARN-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7994: Parent Issue: YARN-8472 (was: YARN-3611) > Add support for network-alias in docker run for user defined networks > -- > > Key: YARN-7994 > URL: https://issues.apache.org/jira/browse/YARN-7994 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Docker Embedded DNS supports DNS resolution for containers by one or more of > its configured {{--network-alias}} within a user-defined network. > DockerRunCommand should support this option for DNS resolution to work > through docker embedded DNS -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576501#comment-16576501 ] Eric Yang commented on YARN-8520: - [~shaneku...@gmail.com] Thanks for the feedback offline. Patch 005 includes your edits for static user and bind mount /etc/passwd solutions. > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8520) Document best practice for user management
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8520: Attachment: YARN-8520.005.patch > Document best practice for user management > -- > > Key: YARN-8520 > URL: https://issues.apache.org/jira/browse/YARN-8520 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation, yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8520.001.patch, YARN-8520.002.patch, > YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch > > > Docker container must have consistent username and groups with host operating > system when external mount points are exposed to docker container. This > prevents malicious or unauthorized impersonation to occur. This task is to > document the best practice to ensure user and group membership are consistent > across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application
[ https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576395#comment-16576395 ] Sunil Govindan commented on YARN-7957: -- Thanks [~akhilpb] Makes sense to me. Pls help to implement same. > [UI2] Yarn service delete option disappears after stopping application > -- > > Key: YARN-7957 > URL: https://issues.apache.org/jira/browse/YARN-7957 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Akhil PB >Priority: Critical > Attachments: YARN-7957.001.patch > > > Steps: > 1) Launch yarn service > 2) Go to service page and click on Setting button->"Stop Service". The > application will be stopped. > 3) Refresh page > Here, setting button disappears. Thus, user can not delete service from UI > after stopping application > Expected behavior: > Setting button should be present on UI page after application is stopped. If > application is stopped, setting button should only have "Delete Service" > action available. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath
[ https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576384#comment-16576384 ] Sean Busbey commented on YARN-7190: --- fix version now updated and filed YARN-8646 for myself to track getting the website updated. > Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user > classpath > > > Key: YARN-7190 > URL: https://issues.apache.org/jira/browse/YARN-7190 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineclient, timelinereader, timelineserver >Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x >Reporter: Vrushali C >Assignee: Varun Saxena >Priority: Major > Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3 > > Attachments: YARN-7190-YARN-5355_branch2.01.patch, > YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, > YARN-7190.01.patch, YARN-7190.02.patch > > > [~jlowe] had a good observation about the user classpath getting extra jars > in hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2,x's > version of HBase jars instead of the ones they shipped with their job, it > could be a problem. > So when TSv2 is to be used in 2,x, the hbase related jars should come into > only the NM classpath not the user classpath. > Here is a list of some jars > {code} > commons-csv-1.0.jar > commons-el-1.0.jar > commons-httpclient-3.1.jar > disruptor-3.3.0.jar > findbugs-annotations-1.3.9-1.jar > hbase-annotations-1.2.6.jar > hbase-client-1.2.6.jar > hbase-common-1.2.6.jar > hbase-hadoop2-compat-1.2.6.jar > hbase-hadoop-compat-1.2.6.jar > hbase-prefix-tree-1.2.6.jar > hbase-procedure-1.2.6.jar > hbase-protocol-1.2.6.jar > hbase-server-1.2.6.jar > htrace-core-3.1.0-incubating.jar > jamon-runtime-2.4.1.jar > jasper-compiler-5.5.23.jar > jasper-runtime-5.5.23.jar > jcodings-1.0.8.jar > joni-2.1.2.jar > jsp-2.1-6.1.14.jar > jsp-api-2.1-6.1.14.jar > jsr311-api-1.1.1.jar > metrics-core-2.2.0.jar > servlet-api-2.5-6.1.14.jar > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8646) republish 3.0.3 release notes so they include YARN-7190
Sean Busbey created YARN-8646: - Summary: republish 3.0.3 release notes so they include YARN-7190 Key: YARN-8646 URL: https://issues.apache.org/jira/browse/YARN-8646 Project: Hadoop YARN Issue Type: Task Components: documentation Affects Versions: 3.0.3 Reporter: Sean Busbey Assignee: Sean Busbey now that 3.0.3 is listed as a fix version for YARN-7190, figure out what needs to happen for the release notes page to include it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath
[ https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated YARN-7190: -- Fix Version/s: 3.0.3 > Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user > classpath > > > Key: YARN-7190 > URL: https://issues.apache.org/jira/browse/YARN-7190 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineclient, timelinereader, timelineserver >Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x >Reporter: Vrushali C >Assignee: Varun Saxena >Priority: Major > Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3 > > Attachments: YARN-7190-YARN-5355_branch2.01.patch, > YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, > YARN-7190.01.patch, YARN-7190.02.patch > > > [~jlowe] had a good observation about the user classpath getting extra jars > in hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2,x's > version of HBase jars instead of the ones they shipped with their job, it > could be a problem. > So when TSv2 is to be used in 2,x, the hbase related jars should come into > only the NM classpath not the user classpath. > Here is a list of some jars > {code} > commons-csv-1.0.jar > commons-el-1.0.jar > commons-httpclient-3.1.jar > disruptor-3.3.0.jar > findbugs-annotations-1.3.9-1.jar > hbase-annotations-1.2.6.jar > hbase-client-1.2.6.jar > hbase-common-1.2.6.jar > hbase-hadoop2-compat-1.2.6.jar > hbase-hadoop-compat-1.2.6.jar > hbase-prefix-tree-1.2.6.jar > hbase-procedure-1.2.6.jar > hbase-protocol-1.2.6.jar > hbase-server-1.2.6.jar > htrace-core-3.1.0-incubating.jar > jamon-runtime-2.4.1.jar > jasper-compiler-5.5.23.jar > jasper-runtime-5.5.23.jar > jcodings-1.0.8.jar > joni-2.1.2.jar > jsp-2.1-6.1.14.jar > jsp-api-2.1-6.1.14.jar > jsr311-api-1.1.1.jar > metrics-core-2.2.0.jar > servlet-api-2.5-6.1.14.jar > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath
[ https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576370#comment-16576370 ] Sean Busbey commented on YARN-7190: --- Today I dug through the git history and branch-3.0. I can confirm that this fix is present in 3.0.3. I've sent an email to yarn-dev@ because I can't edit fix versions yet. Once I can I'll update this and figure out republishing the 3.0.3 release notes. > Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user > classpath > > > Key: YARN-7190 > URL: https://issues.apache.org/jira/browse/YARN-7190 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineclient, timelinereader, timelineserver >Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3, 3.0.x >Reporter: Vrushali C >Assignee: Varun Saxena >Priority: Major > Fix For: YARN-5355_branch2, 3.1.0, 2.9.1 > > Attachments: YARN-7190-YARN-5355_branch2.01.patch, > YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, > YARN-7190.01.patch, YARN-7190.02.patch > > > [~jlowe] had a good observation about the user classpath getting extra jars > in hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2,x's > version of HBase jars instead of the ones they shipped with their job, it > could be a problem. > So when TSv2 is to be used in 2,x, the hbase related jars should come into > only the NM classpath not the user classpath. > Here is a list of some jars > {code} > commons-csv-1.0.jar > commons-el-1.0.jar > commons-httpclient-3.1.jar > disruptor-3.3.0.jar > findbugs-annotations-1.3.9-1.jar > hbase-annotations-1.2.6.jar > hbase-client-1.2.6.jar > hbase-common-1.2.6.jar > hbase-hadoop2-compat-1.2.6.jar > hbase-hadoop-compat-1.2.6.jar > hbase-prefix-tree-1.2.6.jar > hbase-procedure-1.2.6.jar > hbase-protocol-1.2.6.jar > hbase-server-1.2.6.jar > htrace-core-3.1.0-incubating.jar > jamon-runtime-2.4.1.jar > jasper-compiler-5.5.23.jar > jasper-runtime-5.5.23.jar > jcodings-1.0.8.jar > joni-2.1.2.jar > jsp-2.1-6.1.14.jar > jsp-api-2.1-6.1.14.jar > jsr311-api-1.1.1.jar > metrics-core-2.2.0.jar > servlet-api-2.5-6.1.14.jar > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576342#comment-16576342 ] Eric Payne commented on YARN-8509: -- [~Zian Chen], can I please get a couple of clarifications? {quote}total_pending(partition, queue) = min{ Q_max(partition) - Q_used(partition), Σ ( min{ User.ulf(partition) - User.used(partition), User.pending(partition) } ) }{quote} 1) In the above pseudo-code, what is being summed by the summation? 2) In the above example, queue-a is the only one that's underserved, so the first round of preemption should actually preempt 6G from queues b and c. The amount preempted from each queue depends on the age of the containers, but you could end up with something like queue-b consuming 40G and pending 30G and queue-c consuming 44G and pending 36G before the second round of preemption, at which point queue-a would be satisfied and only queues b and c have pending resource requests. Since this issue is meant to address the balancing of queues that are over their capacity, I don't understand why queue-a is involved in the above use case. Can you provide a simpler example that only involves the balancing of over-served queues? > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic to let queue pending go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8640) Restore previous state in container-executor if write_exit_code_file_as_nm fails
[ https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-8640: -- Attachment: YARN-8640.001.patch > Restore previous state in container-executor if write_exit_code_file_as_nm > fails > > > Key: YARN-8640 > URL: https://issues.apache.org/jira/browse/YARN-8640 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8640.001.patch > > > The container-executor function {{write_exit_code_file_as_nm}} had a number > of failure conditions where it just returns -1 without restoring previous > state. > This is not a problem in any of the places where it is currently called, but > it could be a problem if future code changes call it before code that depends > on the previous state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report
[ https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576216#comment-16576216 ] genericqa commented on YARN-8303: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 16s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 46s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}120m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineReaderClientImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8303 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935078/YARN-8303.poc.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0d7314950243 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-8644) Make RMAppImpl$FinalTransition more readable + add more test coverage
[ https://issues.apache.org/jira/browse/YARN-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576150#comment-16576150 ] genericqa commented on YARN-8644: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 4s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 42s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 58s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935116/YARN-8644.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a5be5d0d 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0a71bf1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21561/testReport/ | | Max. process+thread count | 941 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21561/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Make RMAppImpl$FinalTransition more readable +
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576121#comment-16576121 ] Xianghao Lu commented on YARN-8632: --- [~ywskycn] or [~yufeigu] Would you like to review the patch? Thanks! > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have been using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler, and I encountered some > problems. > Firstly, I fixed an NPE bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, everything seemed to be OK, but I only got "[]" in the file > realtimetrack.json. Finally, I found that the MetricsLogRunnable thread exits > because of an NPE: > "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > inside the try block so the MetricsLogRunnable thread does not exit on an unexpected > exception. > My Hadoop version is 2.7.2; the Hadoop trunk branch also seems to have the > second problem, and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576105#comment-16576105 ] genericqa commented on YARN-7494: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 53s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 9s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}120m 19s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7494 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935111/YARN-7494.13.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2fa15e46833d 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0a71bf1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21560/testReport/ | | Max. process+thread count | 914 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21560/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add muti node lookup support for better placement