[jira] [Commented] (YARN-433) When RM is catching up with node updates then it should not expire acquired containers
[ https://issues.apache.org/jira/browse/YARN-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632320#comment-14632320 ] zhihai xu commented on YARN-433: Thanks for the patch [~xgong]! The patch looks good to me apart from two nits: # It may be better to use {{containerId}} instead of {{remoteContainer.getContainerId()}} as the parameter of {{containerAllocationExpirer.unregister}}, since they are equivalent. # Since we already do {{containerAllocationExpirer.unregister}} for the completed containers earlier in {{RMNodeImpl#handleContainerStatus}}, maybe we can remove the redundant {{containerAllocationExpirer.unregister}} call in {{RMContainerImpl#ContainerFinishedAtAcquiredState}}. Once that call is removed, ContainerFinishedAtAcquiredState becomes the same as FinishedTransition, so we can replace ContainerFinishedAtAcquiredState with FinishedTransition and delete ContainerFinishedAtAcquiredState:
{code}
.addTransition(RMContainerState.ACQUIRED, RMContainerState.COMPLETED,
    RMContainerEventType.FINISHED, new FinishedTransition())
{code}
When RM is catching up with node updates then it should not expire acquired containers -- Key: YARN-433 URL: https://issues.apache.org/jira/browse/YARN-433 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-433.1.patch, YARN-433.2.patch, YARN-433.3.patch The RM expires containers that are not launched within some time of being allocated; the default is 10 minutes. When an RM is not keeping up with node updates, it may not be aware of newly launched containers. If the expiry thread fires for such containers, the RM can expire them even though they may have launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
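To make nit #2 concrete, here is a minimal sketch of where the single remaining unregister call would live; the fragment assumes the {{containerId}} local variable, plus a {{completedContainers}} list from the surrounding method, so it is illustrative rather than the actual patch:
{code}
// Sketch only: inside RMNodeImpl#handleContainerStatus, while processing the
// container statuses reported by the node. containerId is the local variable
// already extracted from remoteContainer, so passing it avoids the equivalent
// remoteContainer.getContainerId() call (nit #1).
if (remoteContainer.getState() == ContainerState.COMPLETE) {
  // Unregistering here makes the extra unregister in the ACQUIRED ->
  // COMPLETED transition of RMContainerImpl redundant, which is why
  // ContainerFinishedAtAcquiredState can collapse into FinishedTransition.
  containerAllocationExpirer.unregister(containerId);
  completedContainers.add(remoteContainer);
}
{code}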
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0024-YARN-2003.patch Hi [~leftnoteasy], I have done some more analysis on the test failure. As per YARN-3533, the fix was made in {{launchAM}} of MockRM. Earlier I was using the MockRM APIs in the following way:
{code}
RMApp app1 = rm.submitApp(1 * GB, appPriority1);
nm1.nodeHeartbeat(true);
MockAM am1 = rm.sendAMLaunched(app1.getCurrentAppAttempt()
    .getAppAttemptId());
{code}
{{launchAM}} already wraps nm.nodeHeartbeat and sendAMLaunched, and after YARN-3533 it is ensured that the attempt state is SCHEDULED before nodeHeartbeat is called. If we use the code segment above directly, it is sometimes possible for the NM heartbeat to arrive before the app attempt is SCHEDULED, while the attempt is still ALLOCATED. Hence I changed my test cases to use the wrapped {{launchAM}} API from MockRM (see the sketch below), and I am uploading a patch with the same. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch, 0024-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
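For illustration, the reworked call site would look roughly like the sketch below; {{MockRM.launchAM}} is the helper referenced above, and its exact signature here is an assumption based on the surrounding discussion:
{code}
// Sketch: launchAM wraps the NM heartbeat and sendAMLaunched, and (after
// YARN-3533) waits for the attempt to reach SCHEDULED before heartbeating,
// avoiding the race where the attempt is still ALLOCATED.
RMApp app1 = rm.submitApp(1 * GB, appPriority1);
MockAM am1 = MockRM.launchAM(app1, rm, nm1);
am1.registerAppAttempt();
{code}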
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632353#comment-14632353 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 33s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 35s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 22s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 62m 6s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 106m 53s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745940/0024-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 419c51d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8580/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8580/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8580/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8580/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8580/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8580/console | This message was automatically generated. 
Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch, 0024-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632366#comment-14632366 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #260 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/260/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632365#comment-14632365 ] Hudson commented on YARN-3844: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #260 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/260/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632367#comment-14632367 ] Hudson commented on YARN-3905: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #260 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/260/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632372#comment-14632372 ] Hudson commented on YARN-3905: -- FAILURE: Integrated in Hadoop-Yarn-trunk #990 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/990/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632370#comment-14632370 ] Hudson commented on YARN-3844: -- FAILURE: Integrated in Hadoop-Yarn-trunk #990 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/990/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632371#comment-14632371 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Yarn-trunk #990 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/990/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632451#comment-14632451 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #249 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/249/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632450#comment-14632450 ] Hudson commented on YARN-3844: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #249 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/249/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632452#comment-14632452 ] Hudson commented on YARN-3905: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #249 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/249/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632383#comment-14632383 ] Sunil G commented on YARN-2003: --- There are no failures for TestNodeLabelContainerAllocation when running locally. These test failures are unrelated. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch, 0024-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632406#comment-14632406 ] Sunil G commented on YARN-3784: --- Thank you [~leftnoteasy] for sharing the comments. I have tried to do the same in the second patch; please correct me if this is not the way you had in mind.
{code}
+  public synchronized void addPreemptContainer(ContainerId cont, long timeout) {
     // ignore already completed containers
     if (liveContainers.containsKey(cont)) {
+      containersToPreempt.put(cont, timeout);
+      containersWithFirstNotifyTime.put(cont, clock.getTime());
{code}
Here, along with the maxWaitTime, I also put the current time in another map keyed by container ID. When the AM heartbeat comes, {{getAllocation}} will be called, and the code below is added in {{getAllocation}}:
{code}
+    Long currTime = clock.getTime();
+    for (ContainerId c : containersToPreempt.keySet()) {
+      if (containersWithFirstNotifyTime.containsKey(c)) {
+        Long timeout = containersToPreempt.get(c);
+        containersToPreempt.put(c, timeout
+            - (currTime - containersWithFirstNotifyTime.get(c)));
+      }
+    }
{code}
The heartbeat may arrive in a different time frame, hence I compute the {{delta}} duration that elapses between the time a container is reported for preemption and the time the AM fetches it (see the standalone sketch below). This {{delta}} is subtracted from the real {{maxWaitTime}}, which gives the maximum time the container will get before it is forcefully released by the RM. This is calculated for each container. Could you please check this and let me know if I missed any point. Indicate preemption timeout along with the list of containers to AM (preemption message) --- Key: YARN-3784 URL: https://issues.apache.org/jira/browse/YARN-3784 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch Currently during preemption, the AM is notified with a list of containers that are marked for preemption. Introducing a timeout duration along with this container list lets the AM know how much time it has to do a graceful shutdown of its containers (assuming a preemption policy is loaded in the AM). This will help in NM decommissioning scenarios, where the NM is decommissioned after a timeout (also killing the containers on it). The timeout indicates to the AM that those containers can be killed forcefully by the RM after it expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
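As a self-contained illustration of the bookkeeping described above: the two maps mirror the patch excerpt, while the String container IDs, the method names, and the use of System.currentTimeMillis() in place of the scheduler's clock are simplifications for the sketch.
{code}
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the delta computation: each container remembers when
// it was first marked for preemption, and the timeout advertised to the AM
// shrinks by the time elapsed until the AM actually fetches it.
public class PreemptTimeoutSketch {
  private final Map<String, Long> containersToPreempt = new HashMap<String, Long>();
  private final Map<String, Long> firstNotifyTime = new HashMap<String, Long>();

  public synchronized void addPreemptContainer(String containerId, long timeoutMs) {
    containersToPreempt.put(containerId, timeoutMs);
    firstNotifyTime.put(containerId, System.currentTimeMillis());
  }

  // Called when the AM fetches its allocation: reduce each container's
  // timeout by the delta since first notification, clamping at zero.
  public synchronized Map<String, Long> getRemainingTimeouts() {
    long now = System.currentTimeMillis();
    Map<String, Long> remaining = new HashMap<String, Long>();
    for (Map.Entry<String, Long> e : containersToPreempt.entrySet()) {
      long elapsed = now - firstNotifyTime.get(e.getKey());
      remaining.put(e.getKey(), Math.max(0L, e.getValue() - elapsed));
    }
    return remaining;
  }
}
{code}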
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632442#comment-14632442 ] Hudson commented on YARN-3844: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2187 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2187/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632444#comment-14632444 ] Hudson commented on YARN-3905: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2187 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2187/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632443#comment-14632443 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2187 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2187/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632477#comment-14632477 ] Hudson commented on YARN-3844: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/257/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632478#comment-14632478 ] Hudson commented on YARN-3905: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/257/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632483#comment-14632483 ] Hudson commented on YARN-3844: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2206 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2206/]) YARN-3844. Make hadoop-yarn-project Native code -Wall-clean (Alan Burlison via Colin P. McCabe) (cmccabe: rev 419c51d233bd124eadb38ff013693576ec02c4f1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/CHANGES.txt Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632484#comment-14632484 ] Hudson commented on YARN-3905: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2206 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2206/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632603#comment-14632603 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-trunk-Commit #8182 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8182/]) Pulling in YARN-3535 to branch 2.7.x (Arun Suresh: rev 176131f12bc0d467e9caaa6a94b4ba96e09a4539) * hadoop-yarn-project/CHANGES.txt Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.7.2 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3535: -- Fix Version/s: (was: 2.8.0) 2.7.2 Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.7.2 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3535: -- Target Version/s: 2.7.2 (was: 2.8.0) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.7.2 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2513: -- Labels: 2.6.1-candidate (was: ) Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Labels: 2.6.1-candidate Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, YARN-2513.v3.patch Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632518#comment-14632518 ] Arun Suresh commented on YARN-3535: --- [~jlowe], yup, I'll check it into the 2.7 branch as well. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632535#comment-14632535 ] Dongwook Kwon commented on YARN-3929: - Thanks Xuan for the information. I took a quick look at yarn.nodemanager.delete.debug-delay-sec and tested it; it appears the setting affects the DeletionService, which means it delays (or skips) the deletion of all local files that are supposed to be deleted by the DeletionService. I do want to keep the application logs for my own backup/troubleshooting, but not the other files such as the application's localization files, usercache, filecache, nmPrivate, spilled files, etc.; I would like those deleted on as quick a cycle as possible. Please correct me if I have misunderstood yarn.nodemanager.delete.debug-delay-sec. I couldn't find exactly what I want; if there is any option that keeps only the application logs locally with the log-aggregation feature, I would just use it and close this case. Uncleaning option for local app log files with log-aggregation feature -- Key: YARN-3929 URL: https://issues.apache.org/jira/browse/YARN-3929 Project: Hadoop YARN Issue Type: New Feature Components: log-aggregation Affects Versions: 2.4.0, 2.6.0 Reporter: Dongwook Kwon Priority: Minor Attachments: YARN-3929.01.patch Although it makes sense to delete local app log files once the AppLogAggregator has copied all files to the remote location (HDFS), I have some use cases that need the local app log files left in place after they are copied to HDFS, mostly for backup purposes. I would like to use the log-aggregation feature of YARN and back up the app log files too. Without this option, the files have to be copied from HDFS back to local again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
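For reference, the property under discussion lives in yarn-site.xml and is set as below; the 600-second value is only an example, and, as observed above, the delay applies to every deletion the DeletionService performs (localized files, usercache, nmPrivate, and so on), not just the aggregated application logs:
{code}
<property>
  <!-- Extra delay, in seconds, before the NM DeletionService removes any
       local resource. The default of 0 means no extra delay; it is a
       debug knob, not a per-directory retention setting. -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
{code}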
[jira] [Updated] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Tianyi updated YARN-3903: Affects Version/s: 2.4.0 2.5.0 2.6.0 2.7.0 Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Assignee: Karthik Kambatla Priority: Trivial Original Estimate: 72h Remaining Estimate: 72h YARN-2056 supports disabling preemption at queue level for CapacityScheduler. As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)