[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service
[ https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168565#comment-15168565 ] Jian He commented on YARN-4117: --- [~giovanni.fumarola], thanks for working on this, a few comments and questions : 1. could you add comments at the header of each test to explain what the test is doing so that it can be understood more easily? 2. this test is based on timing, can we check that AllocateResponse#getAMRMToken actually returns a different token than previous one ? {code} for (int i = 0; i < 5; i++) { client.allocate(request); request.setResponseId(request.getResponseId() + 1); // Time slot to be sure the RM renew the token Thread.sleep(1500); } {code} 3. why are the changes in MiniYarnCluster needed ? 4. what is the purpose of testE2ETokenSwap? > End to end unit test with mini YARN cluster for AMRMProxy Service > - > > Key: YARN-4117 > URL: https://issues.apache.org/jira/browse/YARN-4117 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Kishore Chaliparambil >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4117.v0.patch > > > YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to > end unit test using mini YARN cluster to the AMRMProxy service. This test > will validate register, allocate and finish application and token renewal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable
[ https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168562#comment-15168562 ] Ray Chiang commented on YARN-4579: -- Thanks for the review and the commit! > Allow DefaultContainerExecutor container log directory permissions to be > configurable > - > > Key: YARN-4579 > URL: https://issues.apache.org/jira/browse/YARN-4579 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Fix For: 2.9.0 > > Attachments: YARN-4579.001.patch, YARN-4579.002.patch, > YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, > YARN-4579.006.patch > > > By default, container directory permissions are hardcoded to this member in > DefaultContainerExecutor: > static final short LOGDIR_PERM = (short)0710; > There are some cases where less restrictive permissions are desired. Make > this configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168548#comment-15168548 ] Inigo Goiri commented on YARN-4718: --- The latest errors seem consistent across runs. It seems related to the tokens, any ideas? > Rename variables in SchedulerNode to reduce ambiguity post YARN-1011 > > > Key: YARN-4718 > URL: https://issues.apache.org/jira/browse/YARN-4718 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, > YARN-4718-v002.patch, YARN-4718-v003.patch, YARN-4718-v004.patch > > > As discussed in YARN-1011, we are planning to add a few more variables > regarding utilization to SchedulerNode. To ensure the new variables are > descriptive and don't lead to confusion, the existing variables could use > some renaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168544#comment-15168544 ] Hadoop QA commented on YARN-4718: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 14 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 0 new + 366 unchanged - 11 fixed = 366 total (was 377) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 150m 13s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4711: Description: After YARN-3367, while testing the latest 2928 branch came across few NPEs due to which NM is shutting down. {code} 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} {code} java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} On analysis found that the there was delay in processing of events, as after YARN-3367 all the events were getting processed by a single thread inside the timeline client. Additionally found one scenario where there is possibility of NPE: * TimelineEntity.toString() when {{real}} is not null was: While testing the latest 2928 branch came across a NPE which is shutting down the NM {code} 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} Seems to be a race condition ... Few other NPE's faced : {code} java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} * Also there is possibility of NPE in TimelineEntity.toString() when real is not null > NM is going down with NPE's due to single thread processing of events by > Timeline client > > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Labels: yarn-2928-1st-milestone > > After YARN-3367, while testing the latest 2928 branch came across few NPEs > due to which NM is shutting down. > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread >
[jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4711: Summary: NM is going down with NPE's due to single thread processing of events by Timeline client (was: NPE in NMTimelinePublisher) > NM is going down with NPE's due to single thread processing of events by > Timeline client > > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Labels: yarn-2928-1st-milestone > > While testing the latest 2928 branch came across a NPE which is shutting down > the NM > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > Seems to be a race condition ... > Few other NPE's faced : > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > * Also there is possibility of NPE in TimelineEntity.toString() when real is > not null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168524#comment-15168524 ] Varun Saxena commented on YARN-4712: [~Naganarasimha], We had decided in YARN-4053 that we wont support doubles/floats as of now because of the complexities involved due to its implementation. Moreover, we found that most of the times floats would come for some metric which is represented as a percentage(%) or a ratio. In this case the denominator is fixed so you can easily normalize the value and store it as an integer. And divide it by 100(if its a %) when you read it back. Moreover, in case of percentages a precision higher than 2 decimal places, wont be mighty important. We thought if user/client wanted to store a value/ratio where denominator is not fixed, he can merely store numerator and denominator separately and do the necessary computation after reading it back. We had decided until and unless there was a strong use case, we will go with only longs for now. And we could not think of any use cases for current metrics as of now. And there is a way around it as well if somebody wants to store a float by normalizing the value and storing numerator and denominator separately. > CPU Usage Metric is not captured properly in YARN-2928 > -- > > Key: YARN-4712 > URL: https://issues.apache.org/jira/browse/YARN-4712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > There are 2 issues with CPU usage collection > * I was able to observe that that many times CPU usage got from > {{pTree.getCpuUsagePercent()}} is > ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do > the calculation i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore > /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE > check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not > encountered. so proper checks needs to be handled > * {{EntityColumnPrefix.METRIC}} uses always LongConverter but > ContainerMonitor is publishing decimal values for the CPU usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168517#comment-15168517 ] Hadoop QA commented on YARN-4734: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 1s {color} | {color:blue} Shelldocs was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 17s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 16s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 2s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 56s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 5m 1s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-ui in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 7m 40s {color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 34s {color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 34s {color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 45s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 45s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s {color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 45s {color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} shellcheck {color} | {color:red} 0m 13s {color} | {color:red} The applied patch generated 669 new + 98 unchanged - 0 fixed = 767 total (was 98) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 50 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s {color} | {color:red} The patch has 242 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 2s {color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 5s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 8s {color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 37s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 39s {color} | {color:red} Patch
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168515#comment-15168515 ] Varun Saxena commented on YARN-4700: [~Naganarasimha], I remember there was a JIRA with you regarding duplicate events being sent from RM(issue is not specific to this branch). What happened to that ? > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4740) container complete msg may lost while AM restart in race condition
[ https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4740: --- Attachment: YARN-4740.01.patch put containers in finishedContainersSentToAM back to justFinishedContainer if AM restart. > container complete msg may lost while AM restart in race condition > -- > > Key: YARN-4740 > URL: https://issues.apache.org/jira/browse/YARN-4740 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4740.01.patch > > > 1, container completed, and the msg is store in > RMAppAttempt.justFinishedContainers > 2, AM allocate and before allocateResponse came to AM, AM crashed > 3, AM restart and couldn't get the container complete msg. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4740) container complete msg may lost while AM restart in race condition
sandflee created YARN-4740: -- Summary: container complete msg may lost while AM restart in race condition Key: YARN-4740 URL: https://issues.apache.org/jira/browse/YARN-4740 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee Assignee: sandflee 1, container completed, and the msg is store in RMAppAttempt.justFinishedContainers 2, AM allocate and before allocateResponse came to AM, AM crashed 3, AM restart and couldn't get the container complete msg. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168443#comment-15168443 ] Naganarasimha G R commented on YARN-4700: - Oops My mistake, i dint chk the default values set in the code just went by the discussion we had. Yes we resend the started and stopped events on recovery but with the same set of data, will try to verify whats making it enter as a new row with diff values. > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358
[ https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168420#comment-15168420 ] Hadoop QA commented on YARN-4359: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 15 new + 27 unchanged - 1 fixed = 42 total (was 28) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 44s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 39s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 44s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 152m 18s {color} | {color:black} {color} | \\ \\ || Reason || Tests
[jira] [Commented] (YARN-4671) There is no need to acquire CS lock when completing a container
[ https://issues.apache.org/jira/browse/YARN-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168390#comment-15168390 ] Jian He commented on YARN-4671: --- [~mding], the patch does not apply now.. would you mind updating ? > There is no need to acquire CS lock when completing a container > --- > > Key: YARN-4671 > URL: https://issues.apache.org/jira/browse/YARN-4671 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4671.1.patch > > > In YARN-4519, we discovered that there is no need to acquire CS lock in > CS#completedContainerInternal, because: > * Access to critical section are already guarded by queue lock. > * It is not essential to guard {{schedulerHealth}} with cs lock in > completedContainerInternal. All maps in schedulerHealth are concurrent maps. > Even if schedulerHealth is not consistent at the moment, it will be > eventually consistent. > With this fix, we can truly claim that CS#allocate doesn't require CS lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization
[ https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168386#comment-15168386 ] Brahma Reddy Battula commented on YARN-4739: Yes, It is dupe of YARN-4566.. Thanks. > TestCase Failure : > TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization > > > Key: YARN-4739 > URL: https://issues.apache.org/jira/browse/YARN-4739 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > {noformat} > 1 tests failed. > FAILED: > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization > Error Message: > null > Stack Trace: > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217) > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116) > {noformat} > Ref : > https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/ > https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization
[ https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved YARN-4739. Resolution: Duplicate > TestCase Failure : > TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization > > > Key: YARN-4739 > URL: https://issues.apache.org/jira/browse/YARN-4739 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > {noformat} > 1 tests failed. > FAILED: > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization > Error Message: > null > Stack Trace: > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217) > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116) > {noformat} > Ref : > https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/ > https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation
[ https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168379#comment-15168379 ] Hadoop QA commented on YARN-4720: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 18 unchanged - 1 fixed = 19 total (was 19) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 30s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 8s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790051/YARN-4720.05.patch | | JIRA Issue | YARN-4720 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b196f627fb17 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization
[ https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168371#comment-15168371 ] Rohith Sharma K S commented on YARN-4739: - Is it dupe of YARN-4566? > TestCase Failure : > TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization > > > Key: YARN-4739 > URL: https://issues.apache.org/jira/browse/YARN-4739 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > {noformat} > 1 tests failed. > FAILED: > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization > Error Message: > null > Stack Trace: > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217) > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116) > {noformat} > Ref : > https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/ > https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization
Brahma Reddy Battula created YARN-4739: -- Summary: TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization Key: YARN-4739 URL: https://issues.apache.org/jira/browse/YARN-4739 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula {noformat} 1 tests failed. FAILED: org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization Error Message: null Stack Trace: java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217) at org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116) {noformat} Ref : https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/ https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168339#comment-15168339 ] Hadoop QA commented on YARN-4686: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 3s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 15s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 6s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 8s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 167m 2s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests |
[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-4718: -- Attachment: YARN-4718-v004.patch Removing all changes in the user side. > Rename variables in SchedulerNode to reduce ambiguity post YARN-1011 > > > Key: YARN-4718 > URL: https://issues.apache.org/jira/browse/YARN-4718 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, > YARN-4718-v002.patch, YARN-4718-v003.patch, YARN-4718-v004.patch > > > As discussed in YARN-1011, we are planning to add a few more variables > regarding utilization to SchedulerNode. To ensure the new variables are > descriptive and don't lead to confusion, the existing variables could use > some renaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168285#comment-15168285 ] Rohith Sharma K S commented on YARN-4624: - Triggered jenkins manually > NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI > --- > > Key: YARN-4624 > URL: https://issues.apache.org/jira/browse/YARN-4624 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, > YARN-4624-003.patch, YARN-4624.patch > > > Scenario: > === > Configure nodelables and add to cluster > Start the cluster > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168281#comment-15168281 ] Rohith Sharma K S commented on YARN-4624: - +1 LGTM, pending jenkins > NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI > --- > > Key: YARN-4624 > URL: https://issues.apache.org/jira/browse/YARN-4624 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, > YARN-4624-003.patch, YARN-4624.patch > > > Scenario: > === > Configure nodelables and add to cluster > Start the cluster > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation
[ https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168274#comment-15168274 ] Jun Gong commented on YARN-4720: Thanks [~mingma] for suggestions. I attached a wrong version patch(YARN-4720.04.patch)... The new patch fixed the problem and added a new test. > Skip unnecessary NN operations in log aggregation > - > > Key: YARN-4720 > URL: https://issues.apache.org/jira/browse/YARN-4720 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Jun Gong > Attachments: YARN-4720.01.patch, YARN-4720.02.patch, > YARN-4720.03.patch, YARN-4720.04.patch, YARN-4720.05.patch > > > Log aggregation service could have unnecessary NN operations in the following > scenarios: > * No new local log has been created since the last upload for the long > running service scenario. > * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for > certain containers. > In the following code snippet, even though {{pendingContainerInThisCycle}} is > empty, it still creates the writer and then removes the file later. Thus it > introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't > aggregate logs for an app. > > {noformat} > AppLogAggregatorImpl.java > .. > writer = > new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp, > this.userUgi); > .. > for (ContainerId container : pendingContainerInThisCycle) { > .. > } > .. > if (remoteFS.exists(remoteNodeTmpLogFileForApp)) { > if (rename) { > remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath); > } else { > remoteFS.delete(remoteNodeTmpLogFileForApp, false); > } > } > .. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4720) Skip unnecessary NN operations in log aggregation
[ https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-4720: --- Attachment: YARN-4720.05.patch > Skip unnecessary NN operations in log aggregation > - > > Key: YARN-4720 > URL: https://issues.apache.org/jira/browse/YARN-4720 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Jun Gong > Attachments: YARN-4720.01.patch, YARN-4720.02.patch, > YARN-4720.03.patch, YARN-4720.04.patch, YARN-4720.05.patch > > > Log aggregation service could have unnecessary NN operations in the following > scenarios: > * No new local log has been created since the last upload for the long > running service scenario. > * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for > certain containers. > In the following code snippet, even though {{pendingContainerInThisCycle}} is > empty, it still creates the writer and then removes the file later. Thus it > introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't > aggregate logs for an app. > > {noformat} > AppLogAggregatorImpl.java > .. > writer = > new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp, > this.userUgi); > .. > for (ContainerId container : pendingContainerInThisCycle) { > .. > } > .. > if (remoteFS.exists(remoteNodeTmpLogFileForApp)) { > if (rename) { > remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath); > } else { > remoteFS.delete(remoteNodeTmpLogFileForApp, false); > } > } > .. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4734: - Attachment: YARN-4734.3.patch Attached a WIP ver.3 patch, integrated build to existing Hadoop maven build. Some items still pending, for example, add manual steps for user to load builded war package and run in Tomcat. bq. I don't think hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui the correct place given this appears to be a strictly server-side component. It should be moved to be under hadoop-yarn-server minimally. Since this is JS library running in client side, I would suggest to keep it here, it could be used to create client side tool, for example, we can create a tool using the framework running on client node to analysis application logs in local machine. Will address rest comments in following patches. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled
[ https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168241#comment-15168241 ] Wangda Tan commented on YARN-4465: -- Thanks [~bibinchundatt], Generally looks good, one minor comment: Could you avoid show all cluster labels in exception when label doesn't exist in cluster? {code} 337 if (nlm != null && !nlm.containsNodeLabel(labelExpression)) { 338 throw new InvalidLabelResourceRequestException( 339 "Invalid label resource request, cluster do not contain " 340 + ", label= " + labelExpression + " Cluster labels= " 341 + StringUtils.join(nlm.getClusterNodeLabels(), ',')); 315 } 342 } {code} A cluster could have many labels. And could you explain why change of TestKillApplicationWithRMHA is needed? > SchedulerUtils#validateRequest for Label check should happen only when > nodelabel enabled > > > Key: YARN-4465 > URL: https://issues.apache.org/jira/browse/YARN-4465 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, > 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch > > > Disable label from rm side yarn.nodelabel.enable=false > Capacity scheduler label configuration for queue is available as below > default label for queue = b1 as 3 and accessible labels as 1,3 > Submit application to queue A . > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): > Invalid resource request, queue=b1 doesn't have permission to access all > labels in resource request. labelExpression of resource request=3. Queue > labels=1,3 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247) > {noformat} > # Ignore default label expression when label is disabled *or* > # NormalizeResourceRequest we can set label expression to > when node label is not enabled *or* > # Improve message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable
[ https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168225#comment-15168225 ] Hudson commented on YARN-4579: -- FAILURE: Integrated in Hadoop-trunk-Commit #9372 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9372/]) YARN-4579. Allow DefaultContainerExecutor container log directory (rkanter: rev d7fdec1e6b465395d2faca6e404e329d20f6c3d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt > Allow DefaultContainerExecutor container log directory permissions to be > configurable > - > > Key: YARN-4579 > URL: https://issues.apache.org/jira/browse/YARN-4579 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Fix For: 2.9.0 > > Attachments: YARN-4579.001.patch, YARN-4579.002.patch, > YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, > YARN-4579.006.patch > > > By default, container directory permissions are hardcoded to this member in > DefaultContainerExecutor: > static final short LOGDIR_PERM = (short)0710; > There are some cases where less restrictive permissions are desired. Make > this configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358
[ https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishai Menache updated YARN-4359: Attachment: (was: YARN-4359.2.patch) > Update LowCost agents logic to take advantage of YARN-4358 > -- > > Key: YARN-4359 > URL: https://issues.apache.org/jira/browse/YARN-4359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Ishai Menache > Attachments: YARN-4359.0.patch, YARN-4359.3.patch > > > Given the improvements of YARN-4358, the LowCost agent should be improved to > leverage this, and operate on RLESparseResourceAllocation (ideally leveraging > the improvements of YARN-3454 to compute avaialable resources) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358
[ https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishai Menache updated YARN-4359: Attachment: YARN-4359.3.patch > Update LowCost agents logic to take advantage of YARN-4358 > -- > > Key: YARN-4359 > URL: https://issues.apache.org/jira/browse/YARN-4359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Ishai Menache > Attachments: YARN-4359.0.patch, YARN-4359.3.patch > > > Given the improvements of YARN-4358, the LowCost agent should be improved to > leverage this, and operate on RLESparseResourceAllocation (ideally leveraging > the improvements of YARN-3454 to compute avaialable resources) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4711) NPE in NMTimelinePublisher
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168183#comment-15168183 ] Sangjin Lee commented on YARN-4711: --- As discussed offline, probably we should re-characterize this JIRA to address the low throughput issue with single-threaded writes (of sync entities). > NPE in NMTimelinePublisher > -- > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Labels: yarn-2928-1st-milestone > > While testing the latest 2928 branch came across a NPE which is shutting down > the NM > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > Seems to be a race condition ... > Few other NPE's faced : > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > * Also there is possibility of NPE in TimelineEntity.toString() when real is > not null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168173#comment-15168173 ] Sangjin Lee commented on YARN-4700: --- I see. This is regarding the flow activity table, right? Writes to the flow activity table are done when an APPLICATION_CREATED_EVENT is handled. Does a RM restart trigger a duplicate APPLICATION_CREATED_EVENT for existing apps? > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168159#comment-15168159 ] Sangjin Lee commented on YARN-4736: --- This could be a bug in HBase. It seems like the HBase cluster was already shut down, but the flush operation took a long time to finally error out (36 minutes): {noformat} 2016-02-26 00:02:28,270 INFO org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager: The collector service for application_1456425026132_0001 was removed 2016-02-26 00:39:03,879 ERROR org.apache.hadoop.hbase.client.AsyncProcess: Failed to get region location org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Fri Feb 26 00:39:03 IST 2016, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68065: row 'timelineservice.entity,naga!yarn_cluster!flow_1456425026132_1!���!M�����!YARN_CONTAINER!container_1456425026132_0001_01_01,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,16201,1456365764939, seqNum=0 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:264) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:215) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211) at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109) at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369) at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183) at org.apache.hadoop.yarn.server.timelineservice.storage.common.BufferedMutatorDelegator.flush(BufferedMutatorDelegator.java:66) at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:457) at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager$WriterFlushTask.run(TimelineCollectorManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketTimeoutException: callTimeout=6, callDuration=68065: row 'timelineservice.entity,naga!yarn_cluster!flow_1456425026132_1!���!M�����!YARN_CONTAINER!container_1456425026132_0001_01_01,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,16201,1456365764939, seqNum=0 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:310) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:291) at java.util.concurrent.FutureTask.run(FutureTask.java:262) ... 3 more Caused by: java.net.ConnectException: Connection refused {noformat} But this seems to have put the HBase client stuck in the flush state. That thread seems to be stuck: {noformat} "pool-14-thread-1" prio=10 tid=0x7f4215268000 nid=0x46e6 waiting on condition [0x7f41fe75d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xeeb5a010> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at
[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
[ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4686: -- Attachment: YARN-4686.001.patch Transition 1 RM to active in MiniYarnCluster start. Also improve MiniYarnCluster start by waiting for Scheduler to see all NMs. > MiniYARNCluster.start() returns before cluster is completely started > > > Key: YARN-4686 > URL: https://issues.apache.org/jira/browse/YARN-4686 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Eric Badger > Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch > > > TestRMNMInfo fails intermittently. Below is trace for the failure > {noformat} > testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo) Time elapsed: 0.28 > sec <<< FAILURE! > java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but > was:<3> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168145#comment-15168145 ] Hadoop QA commented on YARN-4718: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 17 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 7 new + 488 unchanged - 21 fixed = 495 total (was 509) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 32s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 19s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 150m 39s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | |
[jira] [Created] (YARN-4738) Notify the RM about the status of OPPORTUNISTIC containers
Konstantinos Karanasos created YARN-4738: Summary: Notify the RM about the status of OPPORTUNISTIC containers Key: YARN-4738 URL: https://issues.apache.org/jira/browse/YARN-4738 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Assignee: Konstantinos Karanasos When an OPPORTUNISTIC container finishes its execution (either successfully or because it failed/got killed), the RM needs to be notified. This way the AM also gets notified in turn about the successfully completed tasks, as well as for rescheduling failed/killed tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2883) Queuing of container requests in the NM
[ https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2883: - Attachment: YARN-2883-yarn-2877.001.patch I am attaching an initial version of the patch for this JIRA. Please let me know how it looks. > Queuing of container requests in the NM > --- > > Key: YARN-2883 > URL: https://issues.apache.org/jira/browse/YARN-2883 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2883-yarn-2877.001.patch > > > We propose to add a queue in each NM, where queueable container requests can > be held. > Based on the available resources in the node and the containers in the queue, > the NM will decide when to allow the execution of a queued container. > In order to ensure the instantaneous start of a guaranteed-start container, > the NM may decide to pre-empt/kill running queueable containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168104#comment-15168104 ] Li Lu commented on YARN-4700: - [~sjlee0] I checked the flowrun activity table and I can see the row keys for the same flow is like this: {code} Current count: 1, row: yarn_cluster!\x7F\xFF\xFE\xAC\xEFl\xBB\xFF!llu!flow_1445894691726_1 ... Current count: 10, row: yarn_cluster!\x7F\xFF\xFE\xAD7\x85\xC3\xFF!llu!flow_1445894691726_1 ... Current count: 19, row: yarn_cluster!\x7F\xFF\xFE\xAD<\xAC\x1F\xFF!llu!flow_1445894691726_1 ... {code} >From FlowActivityTable, I can see the row key contains inv_top_of_day, which >is the reason we have duplicated flows. However, for each of the flows, I only >ran it for once, and never touch them again. By design this should not >regenerate any new flow activities? Is this related to RM restart? > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168096#comment-15168096 ] Hadoop QA commented on YARN-4723: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 45s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} branch-2.7 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in branch-2.7 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1486 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 41s {color} | {color:red} The patch has 99 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 4s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 44m 56s {color} | {color:red} Patch generated 77 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 163m 19s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | |
[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168092#comment-15168092 ] Kuhu Shukla commented on YARN-4723: --- {{TestSystemMetricsPublisher}} and {{TestZKRMStateStore}} run fine locally and I don't see correlation between the change and the test failure. Requesting [~jlowe] for more comments/review. Thank you very much! > NodesListManager$UnknownNodeId ClassCastException > - > > Key: YARN-4723 > URL: https://issues.apache.org/jira/browse/YARN-4723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Jason Lowe >Assignee: Kuhu Shukla >Priority: Critical > Attachments: YARN-4723-branch-2.7.001.patch, YARN-4723.001.patch, > YARN-4723.002.patch > > > Saw the following in an RM log: > {noformat} > 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC > Server handler 5 on 8030, call > org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff > java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server.call(Server.java:2267) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4712: -- Labels: yarn-2928-1st-milestone (was: ) > CPU Usage Metric is not captured properly in YARN-2928 > -- > > Key: YARN-4712 > URL: https://issues.apache.org/jira/browse/YARN-4712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > There are 2 issues with CPU usage collection > * I was able to observe that that many times CPU usage got from > {{pTree.getCpuUsagePercent()}} is > ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do > the calculation i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore > /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE > check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not > encountered. so proper checks needs to be handled > * {{EntityColumnPrefix.METRIC}} uses always LongConverter but > ContainerMonitor is publishing decimal values for the CPU usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4736: -- Labels: yarn-2928-1st-milestone (was: ) > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: hbaseException.log, threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4700: -- Labels: yarn-2928-1st-milestone (was: ) > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4711) NPE in NMTimelinePublisher
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4711: -- Labels: yarn-2928-1st-milestone (was: ) > NPE in NMTimelinePublisher > -- > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Labels: yarn-2928-1st-milestone > > While testing the latest 2928 branch came across a NPE which is shutting down > the NM > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > Seems to be a race condition ... > Few other NPE's faced : > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > * Also there is possibility of NPE in TimelineEntity.toString() when real is > not null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable
[ https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168087#comment-15168087 ] Robert Kanter commented on YARN-4579: - +1 > Allow DefaultContainerExecutor container log directory permissions to be > configurable > - > > Key: YARN-4579 > URL: https://issues.apache.org/jira/browse/YARN-4579 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Attachments: YARN-4579.001.patch, YARN-4579.002.patch, > YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, > YARN-4579.006.patch > > > By default, container directory permissions are hardcoded to this member in > DefaultContainerExecutor: > static final short LOGDIR_PERM = (short)0710; > There are some cases where less restrictive permissions are desired. Make > this configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168080#comment-15168080 ] Sangjin Lee commented on YARN-4700: --- These are the current default values that are being set as the context info (which is then used as part of the row key): {code:title=AppLevelTimelineCollector.java} context.setClusterId(conf.get(YarnConfiguration.RM_CLUSTER_ID, YarnConfiguration.DEFAULT_RM_CLUSTER_ID)); // Set the default values, which will be updated with an RPC call to get the // context info from NM. // Current user usually is not the app user, but keep this field non-null context.setUserId(UserGroupInformation.getCurrentUser().getShortUserName()); // Use app ID to generate a default flow name for orphan app context.setFlowName( TimelineUtils.generateDefaultFlowNameBasedOnAppId(appId)); // Set the flow version to string 1 if it's an orphan app context.setFlowVersion("1"); // Set the flow run ID to 1 if it's an orphan app context.setFlowRunId(1L); context.setAppId(appId.toString()); {code} The flow name, version, and the run id may be overridden if the application sets the YARN tag. > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168069#comment-15168069 ] Hadoop QA commented on YARN-4723: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 149m 55s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168063#comment-15168063 ] Sangjin Lee commented on YARN-4700: --- [~gtCarrera9], can you provide the exact row keys in this problem scenario? From what I can see in the code, if you're not setting the cluster id in the configuration (under key "yarn.resourcemanager.cluster-id"), it should be the default value ("yarn_cluster"). I'm not sure where the duplicate keys are coming from. Once we see the duplicate row keys for the same app, it should become clearer. Are you seeing this with a DS app or a MR app? > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358
[ https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168055#comment-15168055 ] Hadoop QA commented on YARN-4359: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 15 new + 25 unchanged - 1 fixed = 40 total (was 26) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 47s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_72 with JDK v1.8.0_72 generated 3 new + 100 unchanged - 0 fixed = 103 total (was 100) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 25s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95 with JDK v1.7.0_95 generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168029#comment-15168029 ] Allen Wittenauer commented on YARN-4734: Also: bq. Since now we haven't integrated web UI building to Hadoop's build system, it is not required for now. Why hasn't this been integrated? It makes me extremely uneasy that you're proposing to merge a branch where that branch's JIRAs are still mostly open and has had zero integration work done, especially given that a vote was called on an obviously broken code base. Something else: I don't think hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui the correct place given this appears to be a strictly server-side component. It should be moved to be under hadoop-yarn-server minimally. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168008#comment-15168008 ] Allen Wittenauer commented on YARN-4734: bq. license and notice issues on dependencies Specifically, hadoop/LICENSE and hadoop/NOTICE need to have the appropriate updates made in order to comply with ASF and your new dependencies' legal requirements. bq. My understanding is Dockerfile will be used by Yetus to run build, correct? Since now we haven't integrated web UI building to Hadoop's build system, it is not required for now. The Dockerfile's primary purpose in life is to support start-build-env.sh as part of HADOOP-11843 which predates Yetus by months. Anyway, the Dockerfile should have *every* dependency needed for someone to build _and develop_ Apache Hadoop. Given the content of the README.MD, this is most definitely required, otherwise HADOOP-11843 is no longer true. Speaking of, why is this a separate README.MD here and not part of BUILDING.txt or something less hidden? bq. Not entirely sure about it, I have found there're a couple of .gitignore files under different source folders. Argh. bq. We can easier manage sub-project's ignore files by placing .gitignore file, could you let me know why you don't want it? Because it becomes a management nightmare when one is looking at the project in it's entirety. Is this file or dir ignored here? Well, first we have to find the appropriate .gitignore > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168004#comment-15168004 ] Hadoop QA commented on YARN-4734: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 4s {color} | {color:blue} Shelldocs was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 23s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 8s {color} | {color:green} There were no new shellcheck issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 50 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s {color} | {color:red} The patch has 242 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 8 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 14s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790008/YARN-4734.2.patch | | JIRA Issue | YARN-4734 | | Optional Tests | asflicense mvnsite shellcheck shelldocs xml | | uname | Linux 3dfc29ba8ba4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c4d4df8 | | shellcheck | v0.4.3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/whitespace-eol.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/whitespace-tabs.txt | | xml | https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/xml.txt | | asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/patch-asflicense-problems.txt | | modules | C: hadoop-yarn-project/hadoop-yarn . U: . | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10640/console | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358
[ https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishai Menache updated YARN-4359: Attachment: YARN-4359.2.patch > Update LowCost agents logic to take advantage of YARN-4358 > -- > > Key: YARN-4359 > URL: https://issues.apache.org/jira/browse/YARN-4359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Ishai Menache > Attachments: YARN-4359.0.patch, YARN-4359.2.patch > > > Given the improvements of YARN-4358, the LowCost agent should be improved to > leverage this, and operate on RLESparseResourceAllocation (ideally leveraging > the improvements of YARN-3454 to compute avaialable resources) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167973#comment-15167973 ] Jonathan Maron commented on YARN-4737: -- Thank you! > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron >Assignee: Jonathan Maron > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167969#comment-15167969 ] Wangda Tan commented on YARN-4737: -- [~jmaron] added you to contributor and assigned ticket to you. > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron >Assignee: Jonathan Maron > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4737: - Assignee: Jonathan Maron > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron >Assignee: Jonathan Maron > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167965#comment-15167965 ] Jonathan Maron commented on YARN-4737: -- Could this be reassigned to me ([~jmaron]? > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167943#comment-15167943 ] Wangda Tan commented on YARN-4734: -- Attached ver.2 patch. - Fixed ASF license issues - Removed unnecessary files (such as .gitkeep) [~aw], for your comments, I've addressed some of them, but some questions: bq. license and notice issues on dependencies Not sure what is it, could you explain? bq. Dockerfile support for dependencies My understanding is Dockerfile will be used by Yetus to run build, correct? Since now we haven't integrated web UI building to Hadoop's build system, it is not required for now. bq. dot-files in a subdir whose content should be in root (e.g., .gitignore) Not entirely sure about it, I have found there're a couple of .gitignore files under different source folders. We can easier manage sub-project's ignore files by placing .gitignore file, could you let me know why you don't want it? bq. dot-files that shouldn't be there at all Removed some of them, only left dot-files which are configuration files for package management. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4737) Use CSRF Filter in YARN
Jonathan Maron created YARN-4737: Summary: Use CSRF Filter in YARN Key: YARN-4737 URL: https://issues.apache.org/jira/browse/YARN-4737 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Reporter: Jonathan Maron A CSRF filter was added to hadoop common (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA is to come up with a mechanism to integrate this filter into the webapps for which it is applicable (web apps that may establish an authenticated identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4734: - Attachment: YARN-4734.2.patch > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4731: - Affects Version/s: 2.9.0 Target Version/s: 2.9.0 Priority: Blocker (was: Critical) Hadoop Flags: Reviewed Thanks for the report, Bibin, and the patch, Varun! This was triggered by YARN-4594. My apologies for missing it during that review, as I had forgotten about the symlinks in the container directory. +1 patch looks good to me. Pinging [~cmccabe] in case he has time to take a look as well. bq. Signal to container is throwing throwing exception in LCE I don't believe that is related since YARN-4594 didn't modify the signal path IIRC. I think we should address that as a separate JIRA. Is the signal issue something happening in 2.8 or earlier? > Linux container executor fails to delete nmlocal folders > > > Key: YARN-4731 > URL: https://issues.apache.org/jira/browse/YARN-4731 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Varun Vasudev >Priority: Blocker > Attachments: YARN-4731.001.patch > > > Enable LCE and CGroups > Submit a mapreduce job > {noformat} > 2016-02-24 18:56:46,889 INFO > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting > absolute path : > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > 2016-02-24 18:56:46,894 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 255. Privileged Execution Operation > Output: > main : command provided 3 > main : run as user is dsperf > main : requested yarn user is dsperf > failed to rmdir job.jar: Not a directory > Error while deleting > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01: > 20 (Not a directory) > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > dsperf, dsperf, 3, > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01] > 2016-02-24 18:56:46,894 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: > DeleteAsUser for > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > returned with exit code: 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) > at org.apache.hadoop.util.Shell.run(Shell.java:838) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) > ... 10 more > {noformat} > As a result nodemanager-local directory are not getting deleted for each > application > {noformat} > total 36 > drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./ > drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../ > -rw--- 1 hdfs hadoop 340 Feb 25 08:25 container_tokens
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167889#comment-15167889 ] Bikas Saha commented on YARN-1040: -- Vinod, the plan you are suggesting has merits. But my initial impression is that reworking allocations and containers is a much bigger change than whats proposed earlier in this jira. Not only internally in YARN but also externally in terms of thinking about the whole larger flow of allocations and containers for users of YARN. The proposal discussed earlier is of much smaller scope and I believe sufficient to take us where we need to go. And it does not need reworking the RM related flow of allocations and containers. E.g. it may not be necessary for the RM to understand single use allocations vs multi-use vs concurrent use allocations. But for the RM level changes you are suggesting we may be on the path of convergence. At this point, the discussion is complex enough that we may want to gather interested people and do it as a group outside jira comments and then post it back. > De-link container life cycle from the process and add ability to execute > multiple processes in the same long-lived container > > > Key: YARN-1040 > URL: https://issues.apache.org/jira/browse/YARN-1040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran > > The AM should be able to exec >1 process in a container, rather than have the > NM automatically release the container when the single process exits. > This would let an AM restart a process on the same container repeatedly, > which for HBase would offer locality on a restarted region server. > We may also want the ability to exec multiple processes in parallel, so that > something could be run in the container while a long-lived process was > already running. This can be useful in monitoring and reconfiguring the > long-lived process, as well as shutting it down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-4718: -- Attachment: YARN-4718-v003.patch Fixed javadocs and checkstyles > Rename variables in SchedulerNode to reduce ambiguity post YARN-1011 > > > Key: YARN-4718 > URL: https://issues.apache.org/jira/browse/YARN-4718 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, > YARN-4718-v002.patch, YARN-4718-v003.patch > > > As discussed in YARN-1011, we are planning to add a few more variables > regarding utilization to SchedulerNode. To ensure the new variables are > descriptive and don't lead to confusion, the existing variables could use > some renaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042
[ https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167858#comment-15167858 ] Hudson commented on YARN-4701: -- FAILURE: Integrated in Hadoop-trunk-Commit #9370 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9370/]) YARN-4701. When task logs are not available, port 8041 is referenced (rkanter: rev c4d4df8de09ee0c89ea8176bd8149900becd3c0c) * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/dao/AMAttemptInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java > When task logs are not available, port 8041 is referenced instead of port 8042 > -- > > Key: YARN-4701 > URL: https://issues.apache.org/jira/browse/YARN-4701 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4701.001.patch, yarn4701.002.patch, > yarn4701.003.patch, yarn4701.004.patch > > > Accessing logs for an task attempt in the workflow tool in Hue shows "Logs > not available for attempt_1433822010707_0001_m_00_0. Aggregation may not > be complete, Check back later or try the nodemanager at > quickstart.cloudera:8041" > If the user follows that link, he/she will get "It looks like you are making > an HTTP request to a Hadoop IPC port. This is not the correct port for the > web interface on this daemon." > We should update the message to use the correct HTTP port. We could also make > it more convenient by providing the application's specific page at NM as well > instead of just NM's main page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042
[ https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167837#comment-15167837 ] Hadoop QA commented on YARN-4701: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 28s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 48s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 28s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 47s {color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 31s {color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s {color} | {color:red} Patch generated 14 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 103m 13s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || |
[jira] [Updated] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4723: -- Attachment: YARN-4723-branch-2.7.001.patch Attaching patch for branch-2.7. > NodesListManager$UnknownNodeId ClassCastException > - > > Key: YARN-4723 > URL: https://issues.apache.org/jira/browse/YARN-4723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Jason Lowe >Assignee: Kuhu Shukla >Priority: Critical > Attachments: YARN-4723-branch-2.7.001.patch, YARN-4723.001.patch, > YARN-4723.002.patch > > > Saw the following in an RM log: > {noformat} > 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC > Server handler 5 on 8030, call > org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff > java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server.call(Server.java:2267) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167794#comment-15167794 ] Vinod Kumar Vavilapalli commented on YARN-1040: --- Still catching up on some of your discussion, but quick comments on a few things that I care We should really call them instead as {{Allocation}} and {{Container}}. That's the nomenclature I used at YARN-4726. Till now, YARN has combined the notion of Allocation and Container, which is the main de-linking that we need to do here. Process has an OS level connotation, and doesn't work well with more things in the picture like process-trees / multiple-processes / Docker (YARN-2466). Taking this further, here's how the overall picture can look like the following *ResourceManager* - RM only does allocations in the scheduling path. ResourceManager does all scheduling based on AllocationRequests and tracks Allocations.. - RM receives AllocationRequests and returns fulfilled Allocations (and AllocationTokens) to AMs. *Applications* - AM can in turn use the Allocations (and AllocationTokens) to launch multiple Containers on the NM. -- Simple case: AM only launches containers one-after-another. It's up to the app to do this. -- General case: AM launches multiple containers at the same time. This is essentially container-groups - we should keep this option open. - AMs can specify *single-use* AllocationRequests, at which point RM can simply return Containers and Container-Tokens (today's code-path). - Each Container exits when the process-tree / linux-container exits. - Each Container has an Identifier. -- For single-use allocation-requests, RM generates ContainerIDs -- For multi-use allocation-requests, apps could optionally specify a container-name that is scoped under the allocation. NM always returns a (generated or app-specified) ContainerID based off the allocation-ID. Essentially, allocationID + containerID is unique *NodeManagers* - NodeManager also understands incoming Allocations and ties them to Container groups: it deals with Allocation activation/deactivation and Container start/stop. but does -- the following *decoupled from both allocations and containers*: localizations / re-localizations. This means local-resources should now have more scopes: container, allocation, application etc. -- *per allocation*: enforcement of resource-limits -- all of the following *per container*: (a) process/OS-container activation / deactivation, (b) process/OS-container auto-restart (YARN-4725) log-aggregation > De-link container life cycle from the process and add ability to execute > multiple processes in the same long-lived container > > > Key: YARN-1040 > URL: https://issues.apache.org/jira/browse/YARN-1040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran > > The AM should be able to exec >1 process in a container, rather than have the > NM automatically release the container when the single process exits. > This would let an AM restart a process on the same container repeatedly, > which for HBase would offer locality on a restarted region server. > We may also want the ability to exec multiple processes in parallel, so that > something could be run in the container while a long-lived process was > already running. This can be useful in monitoring and reconfiguring the > long-lived process, as well as shutting it down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4723: -- Attachment: YARN-4723.002.patch Updated patch with checkstyle and test comment rectified. > NodesListManager$UnknownNodeId ClassCastException > - > > Key: YARN-4723 > URL: https://issues.apache.org/jira/browse/YARN-4723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Jason Lowe >Assignee: Kuhu Shukla >Priority: Critical > Attachments: YARN-4723.001.patch, YARN-4723.002.patch > > > Saw the following in an RM log: > {noformat} > 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC > Server handler 5 on 8030, call > org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff > java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server.call(Server.java:2267) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException
[ https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167784#comment-15167784 ] Jason Lowe commented on YARN-4723: -- Thanks, Kuhu! Patch looks pretty good, just a couple of nits: - In the test NODE_USABLE should be NODE_UNUSABLE for the assert message - Please look into the checkstyle issues. > NodesListManager$UnknownNodeId ClassCastException > - > > Key: YARN-4723 > URL: https://issues.apache.org/jira/browse/YARN-4723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Jason Lowe >Assignee: Kuhu Shukla >Priority: Critical > Attachments: YARN-4723.001.patch > > > Saw the following in an RM log: > {noformat} > 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC > Server handler 5 on 8030, call > org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff > java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server.call(Server.java:2267) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4711) NPE in NMTimelinePublisher
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167776#comment-15167776 ] Naganarasimha G R commented on YARN-4711: - As per initial analysis i am able to reproduce these issues when the containers run for little longer duration (due to which NM can report more container resource usages). So possible cause would be that YARN-3367 is making all the timeline entity Publishes in a single threaded and in NM all are published as Sync events. Tried changing container resource usages as asyncs events and was able to avoid these exceptions. Will test more and confirm. > NPE in NMTimelinePublisher > -- > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > > While testing the latest 2928 branch came across a NPE which is shutting down > the NM > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > Seems to be a race condition ... > Few other NPE's faced : > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > * Also there is possibility of NPE in TimelineEntity.toString() when real is > not null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167771#comment-15167771 ] Allen Wittenauer commented on YARN-4734: This is so not ready to be merged. A very strong -1 from me. * ASF license issues on files * license and notice issues on dependencies * Dockerfile support for dependencies * dot-files in a subdir whose content should be in root (e.g., .gitignore) * dot-files that shouldn't be there at all (e.g., travis-ci support) * files that will result in packaging cruft (e.g., all those .gitkeep files) > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4711) NPE in NMTimelinePublisher
[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4711: Description: While testing the latest 2928 branch came across a NPE which is shutting down the NM {code} 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} Seems to be a race condition ... Few other NPE's faced : {code} java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} * Also there is possibility of NPE in TimelineEntity.toString() when real is not null was: While testing the latest 2928 branch came across a NPE which is shutting down the NM {code} 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} Seems to be a race condition ... > NPE in NMTimelinePublisher > -- > > Key: YARN-4711 > URL: https://issues.apache.org/jira/browse/YARN-4711 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > > While testing the latest 2928 branch came across a NPE which is shutting down > the NM > {code} > 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > Seems to be a race condition ... > Few other NPE's faced : > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289) > at > org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280) > at >
[jira] [Assigned] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-4736: Assignee: Vrushali C > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Attachments: hbaseException.log, threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167741#comment-15167741 ] Naganarasimha G R commented on YARN-4700: - Any suggestions for the above approach [~gtCarrera9]/[~sjlee0] ? > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4736: Attachment: hbaseException.log threaddump.log For Issue 1 : threaddump.log For Issue 2 : hbaseException.log cc/ [~vrushalic]. Not sure its a setup issue, please check once ... > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Priority: Critical > Attachments: hbaseException.log, threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167738#comment-15167738 ] Bikas Saha commented on YARN-1040: -- My guess is that YARN-4725 may be redundant after we do this work because then we would have exposed primitives to apps to make that happen. The arguments for YARN not doing it by itself would be the same. If it can be done easily by the app and is very likely app dependent without one-size-fits-all then let the app do it. Coming back to this jira. Yes, lets please track any first-class support of the notion of upgrades separately which can be done as a follow up. Perhaps we can put the design in a document and look at the next level of details. We can send email to the dev list after adding a more detailed document to this jira. Then, based on +ve feedback, we could go ahead with jiras/code. The devil is in the details :) This would be a significant change and we could use more eyes for reviews. For startProcess identifier, it may be useful for the app to provide the identifier in startProcess and then use it later to refer to the process. E.g. stopProcess. vs YARN trying to come up with identifiers. This may make the apps life easier because it could use meaningful terms based on its own logic. We can discuss such details in the design document. > De-link container life cycle from the process and add ability to execute > multiple processes in the same long-lived container > > > Key: YARN-1040 > URL: https://issues.apache.org/jira/browse/YARN-1040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran > > The AM should be able to exec >1 process in a container, rather than have the > NM automatically release the container when the single process exits. > This would let an AM restart a process on the same container repeatedly, > which for HBase would offer locality on a restarted region server. > We may also want the ability to exec multiple processes in parallel, so that > something could be run in the container while a long-lived process was > already running. This can be useful in monitoring and reconfiguring the > long-lived process, as well as shutting it down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4736) Issues with HBaseTimelineWriterImpl
Naganarasimha G R created YARN-4736: --- Summary: Issues with HBaseTimelineWriterImpl Key: YARN-4736 URL: https://issues.apache.org/jira/browse/YARN-4736 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Naganarasimha G R Priority: Critical Faced some issues while running ATSv2 in single node Hadoop cluster and in the same node had launched Hbase with embedded zookeeper. # Due to some NPE issues i was able to see NM was trying to shutdown, but the NM daemon process was not completed due to the locks. # Got some exception related to Hbase after application finished execution successfully. will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167722#comment-15167722 ] Wangda Tan commented on YARN-4734: -- Thanks [~djp], [~aw] mentioned the same thing, will update the patch soon. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation
[ https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167718#comment-15167718 ] Ming Ma commented on YARN-4720: --- Even though all tests passed by jenkins, but if you run the new {{testSkipUnnecessaryNNOperations}} individually, it failed. Based on how the test is written, {{this.context.getLogAggregationStatusForApps().size()}} should be 4. The reason it passes if all tests run together could be due to the fact that second {{doLogAggregationOutOfBand()}} and the {{LogHandlerAppFinishedEvent}} right after that only cause one notification. After fixing the size check to 4, additional sleep after the second {{doLogAggregationOutOfBand()}} fixes the issue, or you can wait and verify {{getLogAggregationStatusForApps.size()}} after each {{doLogAggregationOutOfBand()}} call. In addition, it might be useful to add another test case without long running service but with FailedOrKilledContainerLogAggregationPolicy; For that case, no log aggregation should happen. That will help to verify the other scenario. What do you think, [~hex108]? > Skip unnecessary NN operations in log aggregation > - > > Key: YARN-4720 > URL: https://issues.apache.org/jira/browse/YARN-4720 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Jun Gong > Attachments: YARN-4720.01.patch, YARN-4720.02.patch, > YARN-4720.03.patch, YARN-4720.04.patch > > > Log aggregation service could have unnecessary NN operations in the following > scenarios: > * No new local log has been created since the last upload for the long > running service scenario. > * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for > certain containers. > In the following code snippet, even though {{pendingContainerInThisCycle}} is > empty, it still creates the writer and then removes the file later. Thus it > introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't > aggregate logs for an app. > > {noformat} > AppLogAggregatorImpl.java > .. > writer = > new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp, > this.userUgi); > .. > for (ContainerId container : pendingContainerInThisCycle) { > .. > } > .. > if (remoteFS.exists(remoteNodeTmpLogFileForApp)) { > if (rename) { > remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath); > } else { > remoteFS.delete(remoteNodeTmpLogFileForApp, false); > } > } > .. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167716#comment-15167716 ] Varun Saxena commented on YARN-3863: Sorry, added the comment by mistake. > Support complex filters in TimelineReader > - > > Key: YARN-3863 > URL: https://issues.apache.org/jira/browse/YARN-3863 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3863-YARN-2928.v2.01.patch, > YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, > YARN-3863-feature-YARN-2928.wip.01.patch, > YARN-3863-feature-YARN-2928.wip.02.patch, > YARN-3863-feature-YARN-2928.wip.04.patch, > YARN-3863-feature-YARN-2928.wip.05.patch > > > Currently filters in timeline reader will return an entity only if all the > filter conditions hold true i.e. only AND operation is supported. We can > support OR operation for the filters as well. Additionally as primary backend > implementation is HBase, we can design our filters in a manner, where they > closely resemble HBase Filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167712#comment-15167712 ] Varun Saxena commented on YARN-3863: To aid in review, I will jot down what has been done in this JIRA. # The intention is to convert filters which were represented as maps or sets to TimelineFilterList which would help in complex filters being supported. i.e. Let me take example of config filters in the format {{cfg1=value1, cfg2=value2, cfg3=value3, cfg4=value4}} which means all the key value pairs should be matched for an entity. With work in this JIRA,we can support complex filters such as {{(cfg1=value1 OR cfg2=value2) AND (cfg3 !=value3 AND cfg4!=value4)}}. > Support complex filters in TimelineReader > - > > Key: YARN-3863 > URL: https://issues.apache.org/jira/browse/YARN-3863 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3863-YARN-2928.v2.01.patch, > YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, > YARN-3863-feature-YARN-2928.wip.01.patch, > YARN-3863-feature-YARN-2928.wip.02.patch, > YARN-3863-feature-YARN-2928.wip.04.patch, > YARN-3863-feature-YARN-2928.wip.05.patch > > > Currently filters in timeline reader will return an entity only if all the > filter conditions hold true i.e. only AND operation is supported. We can > support OR operation for the filters as well. Additionally as primary backend > implementation is HBase, we can design our filters in a manner, where they > closely resemble HBase Filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167700#comment-15167700 ] Hadoop QA commented on YARN-4718: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 17 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 15 new + 492 unchanged - 16 fixed = 507 total (was 508) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 28s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 51s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 150m 50s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | |
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167685#comment-15167685 ] Junping Du commented on YARN-4734: -- Hi [~leftnoteasy], it sounds like the code doesn't include Apache license header. I think we should fix this. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167674#comment-15167674 ] Arun Suresh commented on YARN-1040: --- Thanks for clarifying [~bikassaha] I propose we break this down into sub-jiras : # New APIs specific to delinking container life-cycle from the process : This will include 4 of new APIs specified above (excluding localization) but will assume single process (and this the startProcess does not need to return a processId) # Add support for clubbing APIs into a single RPC ** Might have to think a bit about validating the order and multiplicity of the API calls in each command (which I expect might be different for single process / multiple processes) # Add support for localize API # Add support for Multiple processes ** A processId will be returned for a startProcess. Might have to think thru this further. for eg. how does this integrate with YARN-4725 For the purpose of Application Upgrades (for which this JIRA is marked as a sub-task of... also why im calling it out specifically) : Add support for Container Upgrades # Expose a canned NMCommand that has the list of APIs to upgrade based on some policy If folks are fine with this, I will ahead and open JIRAs and link this issue to each of the above JIRAs (Since I don't think I can create subtasks for this JIRA) so that we can start work on the same.. > De-link container life cycle from the process and add ability to execute > multiple processes in the same long-lived container > > > Key: YARN-1040 > URL: https://issues.apache.org/jira/browse/YARN-1040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran > > The AM should be able to exec >1 process in a container, rather than have the > NM automatically release the container when the single process exits. > This would let an AM restart a process on the same container repeatedly, > which for HBase would offer locality on a restarted region server. > We may also want the ability to exec multiple processes in parallel, so that > something could be run in the container while a long-lived process was > already running. This can be useful in monitoring and reconfiguring the > long-lived process, as well as shutting it down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042
[ https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167598#comment-15167598 ] Robert Kanter commented on YARN-4701: - Kicked off Jenkins because it doesn't seem to have noticed this JIRA. > When task logs are not available, port 8041 is referenced instead of port 8042 > -- > > Key: YARN-4701 > URL: https://issues.apache.org/jira/browse/YARN-4701 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4701.001.patch, yarn4701.002.patch, > yarn4701.003.patch, yarn4701.004.patch > > > Accessing logs for an task attempt in the workflow tool in Hue shows "Logs > not available for attempt_1433822010707_0001_m_00_0. Aggregation may not > be complete, Check back later or try the nodemanager at > quickstart.cloudera:8041" > If the user follows that link, he/she will get "It looks like you are making > an HTTP request to a Hadoop IPC port. This is not the correct port for the > web interface on this daemon." > We should update the message to use the correct HTTP port. We could also make > it more convenient by providing the application's specific page at NM as well > instead of just NM's main page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167556#comment-15167556 ] Varun Saxena commented on YARN-3863: bq. l.532: This is an interesting point. Should we categorically disallow any multi-entity reads without a filter? Is it an obvious requirement? I understand we already set some default values (e.g. created time, etc.) so this might be a moot point, but do we need to check for it when some defaults are set anyway? It is more like that we expect a filter list for filters from REST layer, even if empty. Otherwise I will have to introduce null checks everywhere. We can do that as well. I agree this might indicate that filters are necessary for multi entity reads. That was not the intention though. > Support complex filters in TimelineReader > - > > Key: YARN-3863 > URL: https://issues.apache.org/jira/browse/YARN-3863 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3863-YARN-2928.v2.01.patch, > YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, > YARN-3863-feature-YARN-2928.wip.01.patch, > YARN-3863-feature-YARN-2928.wip.02.patch, > YARN-3863-feature-YARN-2928.wip.04.patch, > YARN-3863-feature-YARN-2928.wip.05.patch > > > Currently filters in timeline reader will return an entity only if all the > filter conditions hold true i.e. only AND operation is supported. We can > support OR operation for the filters as well. Additionally as primary backend > implementation is HBase, we can design our filters in a manner, where they > closely resemble HBase Filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167520#comment-15167520 ] Hadoop QA commented on YARN-4731: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 8m 58s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72 with JDK v1.8.0_72 generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 39s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 13s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 28m 23s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12789913/YARN-4731.001.patch | | JIRA Issue | YARN-4731 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux a92ee9de4749 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6979cbf | | Default Java | 1.7.0_95 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 | | cc | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72: https://builds.apache.org/job/PreCommit-YARN-Build/10635/artifact/patchprocess/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72.txt | | JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/10635/testReport/ | | modules | C:
[jira] [Commented] (YARN-4735) Remove stale LogAggregationReport from NM's context
[ https://issues.apache.org/jira/browse/YARN-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167456#comment-15167456 ] Ming Ma commented on YARN-4735: --- I just read the code again. {{NodeStatusUpdaterImpl}} 's {{getLogAggregationReportsForApps}} actually calls {{lastestLogAggregationStatus.poll()}} and add it to another newly created list to be sent to RM. So this isn't an issue? > Remove stale LogAggregationReport from NM's context > --- > > Key: YARN-4735 > URL: https://issues.apache.org/jira/browse/YARN-4735 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > > {quote} > All LogAggregationReport(current and previous) are only added to > *context.getLogAggregationStatusForApps*, and never removed. > So for long running service, the LogAggregationReport list NM sends to RM > will grow over time. > {quote} > Per discussion in YARN-4720, we need remove stale LogAggregationReport from > NM's context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation
[ https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167435#comment-15167435 ] Ming Ma commented on YARN-4720: --- +1 on the latest patch. I will wait until tomorrow to commit it in case others have more input. > Skip unnecessary NN operations in log aggregation > - > > Key: YARN-4720 > URL: https://issues.apache.org/jira/browse/YARN-4720 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Jun Gong > Attachments: YARN-4720.01.patch, YARN-4720.02.patch, > YARN-4720.03.patch, YARN-4720.04.patch > > > Log aggregation service could have unnecessary NN operations in the following > scenarios: > * No new local log has been created since the last upload for the long > running service scenario. > * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for > certain containers. > In the following code snippet, even though {{pendingContainerInThisCycle}} is > empty, it still creates the writer and then removes the file later. Thus it > introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't > aggregate logs for an app. > > {noformat} > AppLogAggregatorImpl.java > .. > writer = > new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp, > this.userUgi); > .. > for (ContainerId container : pendingContainerInThisCycle) { > .. > } > .. > if (remoteFS.exists(remoteNodeTmpLogFileForApp)) { > if (rename) { > remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath); > } else { > remoteFS.delete(remoteNodeTmpLogFileForApp, false); > } > } > .. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart
[ https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167132#comment-15167132 ] Junping Du commented on YARN-1489: -- Another big problem is we don't actually notify a restarted AM the "finished-in-the-interim" containers. In YARN-1041, RM only report the running containers to AM new attempt, but if container get finished during this time, new AM attempt has no ways to know it - that make different behaviors for AMs - MR will rely on job history log to recover tasks, while Distributed Shell will launch these (finished) containers again. I think we should enhance this. Will file a separated JIRA if we don't have it yet. > [Umbrella] Work-preserving ApplicationMaster restart > > > Key: YARN-1489 > URL: https://issues.apache.org/jira/browse/YARN-1489 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: Work preserving AM restart.pdf > > > Today if AMs go down, > - RM kills all the containers of that ApplicationAttempt > - New ApplicationAttempt doesn't know where the previous containers are > running > - Old running containers don't know where the new AM is running. > We need to fix this to enable work-preserving AM restart. The later two > potentially can be done at the app level, but it is good to have a common > solution for all apps where-ever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167113#comment-15167113 ] Brahma Reddy Battula commented on YARN-4624: As doing null check at {{capacities.getMaxAMLimitPercentage() == null}} changed to wrapper which was not used anywhere. and couldn't be checked null in {{PartitionQueueCapacitiesInfo#getMaxAMLimitPercentage}} (REST API call will fail). > NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI > --- > > Key: YARN-4624 > URL: https://issues.apache.org/jira/browse/YARN-4624 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, > YARN-4624-003.patch, YARN-4624.patch > > > Scenario: > === > Configure nodelables and add to cluster > Start the cluster > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167106#comment-15167106 ] Bibin A Chundatt commented on YARN-4731: [~vvasudev] I have check the patch attached *Issue 1* # Container localization files are getting deleted properly *Issues that still exists* # Signal to container is throwing throwing exception in LCE {noformat} 2016-02-25 13:08:20,442 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: Using container runtime: DefaultLinuxContainerRuntime 2016-02-25 13:08:20,447 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 9. Privileged Execution Operation Output: main : command provided 2 main : run as user is yarn main : requested yarn user is yarn Full command array for failed execution: [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, yarn, yarn, 2, 23524, 9] 2016-02-25 13:08:20,447 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Signal container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=9: at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor$DelayedProcessKiller.run(ContainerExecutor.java:532) Caused by: ExitCodeException exitCode=9: at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) at org.apache.hadoop.util.Shell.run(Shell.java:838) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) ... 4 more {noformat} # Container initalization error was thrown {noformat} 2016-02-25 13:08:20,183 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: Using container runtime: DefaultLinuxContainerRuntime 2016-02-25 13:08:20,191 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Output: main : command provided 1 main : run as user is yarn main : requested yarn user is yarn Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... Full command array for failed execution: [nice, -n, 0, /opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, yarn, yarn, 1, application_1456385661741_0001, container_1456385661741_0001_01_03, /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/yarn/appcache/application_1456385661741_0001/container_1456385661741_0001_01_03, /opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/launch_container.sh, /opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.tokens, /opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.pid, /opt/bibin/dsperf/HAINSTALL/nmlocal, /opt/bibin/dsperf/HAINSTALL/nmlog, cgroups=/cgroups/cpu/hadoop-yarn/container_1456385661741_0001_01_03/tasks] 2016-02-25 13:08:20,191 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Launch container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143: at
[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167078#comment-15167078 ] Bibin A Chundatt commented on YARN-4731: [~vvasudev] Thank you for attaching patch for the same. Will try patch attached > Linux container executor fails to delete nmlocal folders > > > Key: YARN-4731 > URL: https://issues.apache.org/jira/browse/YARN-4731 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4731.001.patch > > > Enable LCE and CGroups > Submit a mapreduce job > {noformat} > 2016-02-24 18:56:46,889 INFO > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting > absolute path : > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > 2016-02-24 18:56:46,894 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 255. Privileged Execution Operation > Output: > main : command provided 3 > main : run as user is dsperf > main : requested yarn user is dsperf > failed to rmdir job.jar: Not a directory > Error while deleting > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01: > 20 (Not a directory) > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > dsperf, dsperf, 3, > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01] > 2016-02-24 18:56:46,894 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: > DeleteAsUser for > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > returned with exit code: 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) > at org.apache.hadoop.util.Shell.run(Shell.java:838) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) > ... 10 more > {noformat} > As a result nodemanager-local directory are not getting deleted for each > application > {noformat} > total 36 > drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./ > drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../ > -rw--- 1 hdfs hadoop 340 Feb 25 08:25 container_tokens > lrwxrwxrwx 1 hdfs hadoop 111 Feb 25 08:25 job.jar -> > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/ > lrwxrwxrwx 1 hdfs hadoop 111 Feb 25 08:25 job.xml -> > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml* > drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/ > -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh* > drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/ > {noformat} -- This message was sent by Atlassian JIRA
[jira] [Updated] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4731: Attachment: YARN-4731.001.patch Root cause here is that we are using fstat on an open fd. The open call follows the symlink and we stat the directory pointed to by the symlink instead of the actual symlink. As a result rmdir fails because it doesn't delete symlinks. The other problem with following symlinks in our case is that we end up deleting public resources because we use symlinks in the container work dir to point to the actual resources. I've attached a patch to not follow symlinks and just call unlink on the symlink itself. [~jlowe] - can you please take a look? > Linux container executor fails to delete nmlocal folders > > > Key: YARN-4731 > URL: https://issues.apache.org/jira/browse/YARN-4731 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4731.001.patch > > > Enable LCE and CGroups > Submit a mapreduce job > {noformat} > 2016-02-24 18:56:46,889 INFO > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting > absolute path : > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > 2016-02-24 18:56:46,894 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 255. Privileged Execution Operation > Output: > main : command provided 3 > main : run as user is dsperf > main : requested yarn user is dsperf > failed to rmdir job.jar: Not a directory > Error while deleting > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01: > 20 (Not a directory) > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > dsperf, dsperf, 3, > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01] > 2016-02-24 18:56:46,894 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: > DeleteAsUser for > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > returned with exit code: 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) > at org.apache.hadoop.util.Shell.run(Shell.java:838) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) > ... 10 more > {noformat} > As a result nodemanager-local directory are not getting deleted for each > application > {noformat} > total 36 > drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./ > drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../ > -rw--- 1 hdfs hadoop 340 Feb 25 08:25 container_tokens > lrwxrwxrwx 1 hdfs hadoop 111 Feb 25 08:25 job.jar -> >
[jira] [Assigned] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-4731: --- Assignee: Varun Vasudev > Linux container executor fails to delete nmlocal folders > > > Key: YARN-4731 > URL: https://issues.apache.org/jira/browse/YARN-4731 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Varun Vasudev >Priority: Critical > > Enable LCE and CGroups > Submit a mapreduce job > {noformat} > 2016-02-24 18:56:46,889 INFO > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting > absolute path : > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > 2016-02-24 18:56:46,894 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 255. Privileged Execution Operation > Output: > main : command provided 3 > main : run as user is dsperf > main : requested yarn user is dsperf > failed to rmdir job.jar: Not a directory > Error while deleting > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01: > 20 (Not a directory) > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > dsperf, dsperf, 3, > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01] > 2016-02-24 18:56:46,894 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: > DeleteAsUser for > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > returned with exit code: 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) > at org.apache.hadoop.util.Shell.run(Shell.java:838) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) > ... 10 more > {noformat} > As a result nodemanager-local directory are not getting deleted for each > application > {noformat} > total 36 > drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./ > drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../ > -rw--- 1 hdfs hadoop 340 Feb 25 08:25 container_tokens > lrwxrwxrwx 1 hdfs hadoop 111 Feb 25 08:25 job.jar -> > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/ > lrwxrwxrwx 1 hdfs hadoop 111 Feb 25 08:25 job.xml -> > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml* > drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/ > -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh* > drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders
[ https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166956#comment-15166956 ] Bibin A Chundatt commented on YARN-4731: *On Signal operation for container* also the below exception is thrown {noformat} 2016-02-25 10:49:57,704 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: Using container runtime: DefaultLinuxContainerRuntime 2016-02-25 10:49:57,709 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 9. Privileged Execution Operation Output: main : command provided 2 main : run as user is hdfs main : requested yarn user is hdfs Full command array for failed execution: [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, hdfs, hdfs, 2, 4850, 15] 2016-02-25 10:49:57,710 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Signal container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=9: at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) at java.lang.Thread.run(Thread.java:745) Caused by: ExitCodeException exitCode=9: at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) at org.apache.hadoop.util.Shell.run(Shell.java:838) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150) ... 9 more {noformat} > Linux container executor fails to delete nmlocal folders > > > Key: YARN-4731 > URL: https://issues.apache.org/jira/browse/YARN-4731 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Priority: Critical > > Enable LCE and CGroups > Submit a mapreduce job > {noformat} > 2016-02-24 18:56:46,889 INFO > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting > absolute path : > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > 2016-02-24 18:56:46,894 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 255. Privileged Execution Operation > Output: > main : command provided 3 > main : run as user is dsperf > main : requested yarn user is dsperf > failed to rmdir job.jar: Not a directory > Error while deleting > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01: > 20 (Not a directory) > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > dsperf, dsperf, 3, > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01] > 2016-02-24 18:56:46,894 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: > DeleteAsUser for > /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01 > returned with exit code: 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: >
[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166923#comment-15166923 ] sandflee commented on YARN-4673: Hi, [~ozawa], in ResourceTrackService we may concurrently process nodeHeartBeat() with same nodeId and responseId, they may both pass the lastResonseId check, this will cause the lost of RM message. With the Nodelock, we could process one by one, and the above exception could be catched. {code} if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse .getResponseId()) { LOG.info("Received duplicate heartbeat from node " + rmNode.getNodeAddress() + " responseId=" + remoteNodeStatus.getResponseId()); return lastNodeHeartbeatResponse; } {code} actually I have not encounter the bug caused by this, but this may be a risk. > race condition in ResourceTrackerService#nodeHeartBeat while processing > deduplicated msg > > > Key: YARN-4673 > URL: https://issues.apache.org/jira/browse/YARN-4673 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4673.01.patch > > > we could add a lock like ApplicationMasterService#allocate -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
[ https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-4718: -- Attachment: YARN-4718-v002.patch Taking latest changes in trunk. > Rename variables in SchedulerNode to reduce ambiguity post YARN-1011 > > > Key: YARN-4718 > URL: https://issues.apache.org/jira/browse/YARN-4718 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, > YARN-4718-v002.patch > > > As discussed in YARN-1011, we are planning to add a few more variables > regarding utilization to SchedulerNode. To ensure the new variables are > descriptive and don't lead to confusion, the existing variables could use > some renaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)