[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641459#comment-14641459 ] Hadoop QA commented on YARN-3948: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 19s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 21s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 30s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 52m 20s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 111m 30s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747165/0002-YARN-3948.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / adcf5dd | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8668/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8668/console | This message was automatically generated. Display Application Priority in RM Web UI - Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, ApplicationPage.png, ClusterPage.png Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations
[ https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3656: --- Attachment: YARN-3656-v1.5.patch Fixing javadoc and one more checkstyle. LowCost: A Cost-Based Placement Agent for YARN Reservations --- Key: YARN-3656 URL: https://issues.apache.org/jira/browse/YARN-3656 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Ishai Menache Assignee: Jonathan Yaniv Labels: capacity-scheduler, resourcemanager Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf YARN-1051 enables SLA support by allowing users to reserve cluster capacity ahead of time. YARN-1710 introduced a greedy agent for placing user reservations. The greedy agent makes fast placement decisions but at the cost of ignoring the cluster committed resources, which might result in blocking the cluster resources for certain periods of time, and in turn rejecting some arriving jobs. We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” the demand of the job throughout the allowed time-window according to a global, load-based cost function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations
[ https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641469#comment-14641469 ] Hadoop QA commented on YARN-3656: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 14 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 41s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 11 new checkstyle issues (total was 115, now 114). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 25s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 91m 3s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747170/YARN-3656-v1.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / adcf5dd | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8669/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8669/console | This message was automatically generated. LowCost: A Cost-Based Placement Agent for YARN Reservations --- Key: YARN-3656 URL: https://issues.apache.org/jira/browse/YARN-3656 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Ishai Menache Assignee: Jonathan Yaniv Labels: capacity-scheduler, resourcemanager Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf YARN-1051 enables SLA support by allowing users to reserve cluster capacity ahead of time. YARN-1710 introduced a greedy agent for placing user reservations. The greedy agent makes fast placement decisions but at the cost of ignoring the cluster committed resources, which might result in blocking the cluster resources for certain periods of time, and in turn rejecting some arriving jobs. 
We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” the demand of the job throughout the allowed time-window according to a global, load-based cost function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641473#comment-14641473 ] Hong Zhiguo commented on YARN-3965: --- Hi, [~zxu], thanks for your comments. Here is my reconsideration. 1. The nmStartupTime could be a non-static field of NodeManager, but that makes it harder to access, since the accessor must have a reference to the NodeManager instance. For example, there is no such reference in the current implementation of the NodeInfo constructor. One option is to make nmStartupTime a non-static field of NMContext, but I doubt it is worth complicating a simple thing. BTW, the startup timestamp of ResourceManager is also static. 2. It's final, so we don't need to worry about that. A private field with a getter is also OK if you think it's better. Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965.patch We already have a startup timestamp for the RM, but not for the NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and then finds it hard to check whether all NMs actually restarted. In practice there are always some NMs that did not restart as expected, which leads to errors later due to inconsistent configuration. With a startup timestamp for the NM, the operator could easily fetch it via the NM web service, find out which NMs did not restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
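To make the trade-off discussed above concrete, here is a minimal sketch of the static-field approach, assuming stripped-down NodeManager and NodeInfo classes; it illustrates the idea only and is not the contents of the attached YARN-3965 patches.
{code}
// Sketch: expose the NM startup time through a static final field plus a getter.
public class NodeManager {
  // Captured once when the class is loaded, i.e. at process startup.
  private static final long NM_STARTUP_TIME = System.currentTimeMillis();

  public static long getNMStartupTime() {
    return NM_STARTUP_TIME;
  }
}

// A web-service DAO (e.g. a NodeInfo-like class) can then read the timestamp
// without holding a reference to the NodeManager instance.
class NodeInfo {
  private final long nmStartupTime = NodeManager.getNMStartupTime();

  public long getNMStartupTime() {
    return nmStartupTime;
  }
}
{code}
The alternative discussed in the comment would move the field into NMContext and thread a reference through, at the cost of extra plumbing in callers such as the NodeInfo constructor.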
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641474#comment-14641474 ] Brahma Reddy Battula commented on YARN-3528: Will look into the test case failures, but all of them passed locally. Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test Attachments: YARN-3528-002.patch, YARN-3528-003.patch, YARN-3528-004.patch, YARN-3528.patch A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible to have scheduled or precommit tests run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep of 12345 shows up many places in the test suite where this practise has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641478#comment-14641478 ] Akira AJISAKA commented on YARN-3958: - Rethinking this issue, can we move YarnConfiguration.java to hadoop-yarn-common to fix the problem? If the patch is committed, Jenkins cannot run the test when yarn-default.xml is changed. TestYarnConfigurationFields should be moved to hadoop-yarn-api -- Key: YARN-3958 URL: https://issues.apache.org/jira/browse/YARN-3958 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, if somebody changes this file, it is not guaranteed that this test will be run. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641606#comment-14641606 ] Hudson commented on YARN-3967: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and the RM has forgotten about an application, try to fetch the app report from the AHS. On larger clusters, the RM can forget about applications in about 30 minutes. The proxy URL generated during job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641614#comment-14641614 ] Hudson commented on YARN-3026: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3026. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 83fe34ac0896cee0918bbfad7bd51231e4aec39b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler 
Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch Had a discussion with [~vinodkv] and [~jianhe]: in the existing Capacity Scheduler, all allocation logic of and under LeafQueue is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better move some of it to FiCaSchedulerApp. The ideal scope of LeafQueue should be: a LeafQueue receives some resources from its ParentQueue (like 15% of cluster resources) and distributes them to its child apps, while staying agnostic to the internal logic of those apps (like delay scheduling, etc.). In other words, LeafQueue shouldn't decide how an application allocates containers from the given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
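As a rough illustration of the intended split, the sketch below keeps queue-level iteration in the queue and leaves per-application placement to the application object; the types are stand-ins, not the real YARN-3026 classes or signatures.
{code}
import java.util.ArrayList;
import java.util.List;

// Stand-in types; not the real YARN scheduler classes.
interface SchedulerApp {
  Assignment assignContainers(SchedulerNode node);
}

final class Assignment {}

final class SchedulerNode {}

// Queue-level concern only: offer the node to each child app in order and let
// the application decide how (and whether) to place a container on it.
class LeafQueueSketch {
  private final List<SchedulerApp> apps = new ArrayList<>();

  Assignment assignContainers(SchedulerNode node) {
    for (SchedulerApp app : apps) {
      Assignment assignment = app.assignContainers(node); // app-level placement logic
      if (assignment != null) {
        return assignment;
      }
    }
    return null; // nothing assigned on this node in this heartbeat
  }
}
{code}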
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641615#comment-14641615 ] Hudson commented on YARN-3973: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce NPE for reservation system when application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641609#comment-14641609 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641603#comment-14641603 ] Hudson commented on YARN-3969: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to its deadline, but in the interim there can be periods when it does not have any resources allocated. We currently reject jobs that are submitted while the reservation allocation is zero. Instead we should accept and queue the jobs until the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations
[ https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3656: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-2572 LowCost: A Cost-Based Placement Agent for YARN Reservations --- Key: YARN-3656 URL: https://issues.apache.org/jira/browse/YARN-3656 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Ishai Menache Assignee: Jonathan Yaniv Labels: capacity-scheduler, resourcemanager Fix For: 2.8.0 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf YARN-1051 enables SLA support by allowing users to reserve cluster capacity ahead of time. YARN-1710 introduced a greedy agent for placing user reservations. The greedy agent makes fast placement decisions but at the cost of ignoring the cluster committed resources, which might result in blocking the cluster resources for certain periods of time, and in turn rejecting some arriving jobs. We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” the demand of the job throughout the allowed time-window according to a global, load-based cost function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641628#comment-14641628 ] Hudson commented on YARN-3957: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * hadoop-yarn-project/CHANGES.txt FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is a NPE causing the webpage of http://localhost:23188/cluster/scheduler to return a 500. This seems to be because of YARN-2336 setting null for childQueues and then getChildQueues hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641624#comment-14641624 ] Hudson commented on YARN-3967: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and the RM has forgotten about an application, try to fetch the app report from the AHS. On larger clusters, the RM can forget about applications in about 30 minutes. The proxy URL generated during job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641626#comment-14641626 ] Hudson commented on YARN-3925: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); List<String> localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); List<String> logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
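To illustrate the failure mode described above, the hedged sketch below contrasts a read path that only consults the trimmed-down log-dir list with one that also consults directories marked full; the class and method names are hypothetical and are not taken from the attached patches.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the read path described above; names are illustrative.
class LogDirsSketch {
  private final List<String> goodLogDirs = new ArrayList<>(); // what NM_LOG_DIRS is trimmed to
  private final List<String> fullLogDirs = new ArrayList<>(); // dirs dropped by checkDirs()

  // Broken behaviour: only searches the "good" dirs, so a log written before
  // the disk filled up can no longer be found.
  File getLogPathToReadBroken(String relativePath) {
    return findExisting(goodLogDirs, relativePath);
  }

  // Read-only access can safely include full disks as well.
  File getLogPathToReadFixed(String relativePath) {
    List<String> all = new ArrayList<>(goodLogDirs);
    all.addAll(fullLogDirs);
    return findExisting(all, relativePath);
  }

  private File findExisting(List<String> dirs, String relativePath) {
    for (String dir : dirs) {
      File candidate = new File(dir, relativePath);
      if (candidate.exists()) {
        return candidate;
      }
    }
    return null;
  }
}
{code}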
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641665#comment-14641665 ] Hudson commented on YARN-3969: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to its deadline, but in the interim there can be periods when it does not have any resources allocated. We currently reject jobs that are submitted while the reservation allocation is zero. Instead we should accept and queue the jobs until the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641676#comment-14641676 ] Hudson commented on YARN-3026: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3026. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 83fe34ac0896cee0918bbfad7bd51231e4aec39b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task 
Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch Had a discussion with [~vinodkv] and [~jianhe]: in the existing Capacity Scheduler, all allocation logic of and under LeafQueue is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better move some of it to FiCaSchedulerApp. The ideal scope of LeafQueue should be: a LeafQueue receives some resources from its ParentQueue (like 15% of cluster resources) and distributes them to its child apps, while staying agnostic to the internal logic of those apps (like delay scheduling, etc.). In other words, LeafQueue shouldn't decide how an application allocates containers from the given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641671#comment-14641671 ] Hudson commented on YARN-1051: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3905: --- Release Note: (was: Resubmitting patch after fixing checkstyle warnings.) Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
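The trace above shows convertToContainerReport dereferencing history data that was never written for apps run after the restart. A minimal, hypothetical illustration of that null path, and the kind of guard that would avoid the 500, is sketched below; it is not the actual YARN-3905 fix, and the names beyond those in the stack trace are assumptions.
{code}
// Hypothetical illustration of the NPE path described above; not the real fix.
class ContainerHistoryData {}

class ContainerReport {}

class AppHistorySketch {
  ContainerReport getContainer(String containerId) {
    ContainerHistoryData data = readHistoryData(containerId); // may be null after RM restart
    return convertToContainerReport(data);
  }

  ContainerReport convertToContainerReport(ContainerHistoryData data) {
    if (data == null) {
      // Without this guard, dereferencing data throws the NullPointerException
      // seen in AppBlock when the AM container was never recorded.
      return null;
    }
    return new ContainerReport();
  }

  private ContainerHistoryData readHistoryData(String containerId) {
    return null; // stands in for a history store with no record of the container
  }
}
{code}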
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641633#comment-14641633 ] Hudson commented on YARN-3973: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce NPE for reservation system when application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641627#comment-14641627 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641621#comment-14641621 ] Hudson commented on YARN-3969: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to its deadline, but in the interim there can be periods when it does not have any resources allocated. We currently reject jobs that are submitted while the reservation allocation is zero. Instead we should accept and queue the jobs until the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3971: --- Attachment: 0002-YARN-3971.patch Attaching an updated patch with a test case. [~leftnoteasy] Please review the attached patch. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch Steps to reproduce # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml for labels x and y too # Restart RM Both RMs will become Standby, since the exception below is thrown in {{FileSystemNodeLabelsStore#recover}}: {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
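The general shape of "skip the queue-usage check while recovering the label store" could look like the sketch below; the flag and method names are assumptions for illustration and do not reflect the attached 0002-YARN-3971.patch.
{code}
import java.io.IOException;
import java.util.Set;

// Hedged sketch of "skip the queue check during recovery"; names are hypothetical.
class NodeLabelsManagerSketch {
  private volatile boolean inRecovery = false;

  void recover(Set<String> labelsToRemove) throws IOException {
    inRecovery = true;
    try {
      removeFromClusterNodeLabels(labelsToRemove);
    } finally {
      inRecovery = false;
    }
  }

  void removeFromClusterNodeLabels(Set<String> labels) throws IOException {
    // During recovery the mirror/edit log is replayed verbatim, so the
    // queue-usage validation that protects live removals is skipped.
    if (!inRecovery) {
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    // ... actually remove the labels ...
  }

  void checkRemoveFromClusterNodeLabelsOfQueue(Set<String> labels) throws IOException {
    // Would throw IOException if any queue still uses one of the labels.
  }
}
{code}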
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641736#comment-14641736 ] Varun Saxena commented on YARN-3528: [~brahmareddy], a few comments: # Among the test failures in the QA report above, all are related to your code change except one test each in TestDeletionService and TestResourceLocalizationService. # You need to call {{ServerSocketUtil#getPort}} only in places where the port is used for binding to a socket. You have used it elsewhere as well. # In {{TestNodeManagerReboot#createNMConfig}}, the changes below would lead to a bind exception, because at the time the config is set, the call to ServerSocketUtil#getPort will return 49152 (if free) and both NM_ADDRESS and NM_LOCALIZER_ADDRESS will be set to the same port. The start port should be different for the second call to getPort. {code} - conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:12345"); - conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, "127.0.0.1:12346"); + conf.set(YarnConfiguration.NM_ADDRESS, + "127.0.0.1:" + ServerSocketUtil.getPort(49152, 10)); + conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, "127.0.0.1:" + + ServerSocketUtil.getPort(49152, 10)); {code} # The same problem exists in {{TestNodeManagerResync#createNMConfig}}, which will lead to test failures due to BindException. # In {{TestNMContainerTokenSecretManager}}, {{TestRMAppLogAggregationStatus}} and {{TestNMTokenSecretManagerInNM}}, you do not need to get a port from ServerSocketUtil to set the NodeID. # In {{TestNMWebServer#testNMWebApp}}, you do not need to call ServerSocketUtil#getPort to create the token. The token is not used for socket binding. # In {{TestRMApplicationHistoryWriter}} and {{TestAMRMRPCResponseId}}, the call to MockRM#registerNode does not need a unique port to bind to, so again we do not need to get a port from ServerSocketUtil. # The same applies for {{TestAMRestart}} wherever the MockNM constructor is invoked. # In {{TestAMRestart}}, you need different ports wherever more than one MockNM object is created. This is causing the test failure in the QA report above. # Nit: the formatting of the piece of code below in {{TestNMWebServer}} is not correct. {code} Token containerToken = - BuilderUtils.newContainerToken(containerId, "127.0.0.1", 1234, user, + BuilderUtils.newContainerToken(containerId, + "127.0.0.1", ServerSocketUtil.getPort(49152, 10), user, {code} Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test Attachments: YARN-3528-002.patch, YARN-3528-003.patch, YARN-3528-004.patch, YARN-3528.patch A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible to have scheduled or precommit tests run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep of 12345 shows up many places in the test suite where this practise has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
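A hedged sketch of point 3 above is shown next: using ServerSocketUtil (the Hadoop test utility referenced in the review) with different start ports so NM_ADDRESS and NM_LOCALIZER_ADDRESS cannot resolve to the same free port. The surrounding method is illustrative, not the exact TestNodeManagerReboot code, and assumes the hadoop-common test artifact is on the classpath.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.ServerSocketUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the reviewer's suggestion: start the port searches from different
// base ports so the two addresses never end up bound to the same port.
class NMConfigSketch {
  static Configuration createNMConfig() throws IOException {
    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_ADDRESS,
        "127.0.0.1:" + ServerSocketUtil.getPort(49152, 10));
    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
        "127.0.0.1:" + ServerSocketUtil.getPort(49162, 10));
    return conf;
  }
}
{code}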
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641608#comment-14641608 ] Hudson commented on YARN-3925: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); List<String> localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); List<String> logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641610#comment-14641610 ] Hudson commented on YARN-3957: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is a NPE causing the webpage of http://localhost:23188/cluster/scheduler to return a 500. This seems to be because of YARN-2336 setting null for childQueues and then getChildQueues hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
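A minimal sketch of the defensive pattern behind this fix; the class and field names below are illustrative, not the actual FairSchedulerQueueInfo code. The point is simply that a queue with no child-queue information should expose an empty collection rather than a null that later triggers the NPE while the scheduler page renders.
{code}
import java.util.Collection;
import java.util.Collections;

/** Illustrative only: null-safe accessor for possibly-absent child queues. */
public class QueueInfoSketch {

  // Leaf queues may legitimately carry no child-queue info at all.
  private Collection<QueueInfoSketch> childQueues;

  public Collection<QueueInfoSketch> getChildQueues() {
    return childQueues == null
        ? Collections.<QueueInfoSketch>emptyList()
        : childQueues;
  }
}
{code}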
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641668#comment-14641668 ] Hudson commented on YARN-3967: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java * hadoop-yarn-project/CHANGES.txt Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and the RM has forgotten about an application, try to fetch the app report from the AHS. On larger clusters, the RM can forget about applications in about 30 minutes. The proxy URL generated during job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we will need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
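A simplified sketch of the RM-then-AHS fallback described above. The two client interfaces and the exception type are stand-ins defined here for illustration; the real logic lives in AppReportFetcher and talks to the RM and the history server over the YARN client protocols.
{code}
/** Illustrative fallback: ask the RM first, then the history server. */
public class AppReportFallbackSketch {

  interface ReportSource {
    String getApplicationReport(String appId) throws AppNotFoundException;
  }

  static class AppNotFoundException extends Exception {
    AppNotFoundException(String msg) { super(msg); }
  }

  private final ReportSource rmClient;
  private final ReportSource historyClient; // null when the AHS is disabled

  AppReportFallbackSketch(ReportSource rmClient, ReportSource historyClient) {
    this.rmClient = rmClient;
    this.historyClient = historyClient;
  }

  String fetchReport(String appId) throws AppNotFoundException {
    try {
      return rmClient.getApplicationReport(appId);
    } catch (AppNotFoundException rmHasForgotten) {
      if (historyClient == null) {
        throw rmHasForgotten; // no AHS configured, surface the original error
      }
      // The RM has dropped the app from its store; the AHS may still know it.
      return historyClient.getApplicationReport(appId);
    }
  }
}
{code}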
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641672#comment-14641672 ] Hudson commented on YARN-3957: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is a NPE causing the webpage of http://localhost:23188/cluster/scheduler to return a 500. This seems to be because of YARN-2336 setting null for childQueues and then getChildQueues hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641670#comment-14641670 ] Hudson commented on YARN-3925: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); ListString localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); ListString logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641677#comment-14641677 ] Hudson commented on YARN-3973: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce NPE for reservation system when application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3971: --- Attachment: 0003-YARN-3971.patch Testcase failure is unrelated. Verified locally testcase is passing. Fixed checkstyle Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch Steps to reproduce # Create label x,y # Delete label x,y # Create label x,y add capacity scheduler xml for labels x and y too # Restart RM Both RM will become Standby. Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
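A schematic sketch of the recovery-time behaviour being proposed above; the names are illustrative, not the actual RMNodeLabelsManager code. The queue-usage check is what protects a live rmadmin operation, but during store recovery the remove events are only being replayed, so skipping the check avoids failing the transition to active.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: validation is applied to live calls, not to replay. */
public class LabelRecoverySketch {

  private final Set<String> clusterLabels = new HashSet<>();
  private final Set<String> labelsUsedByQueues = new HashSet<>();

  public void addToClusterNodeLabels(Set<String> labels) {
    clusterLabels.addAll(labels);
  }

  public void removeFromClusterNodeLabels(Set<String> labels, boolean duringRecovery)
      throws IOException {
    if (!duringRecovery) {
      for (String label : labels) {
        if (labelsUsedByQueues.contains(label)) {
          throw new IOException("Cannot remove label=" + label
              + ", because a queue is still using this label");
        }
      }
    }
    clusterLabels.removeAll(labels);
  }
}
{code}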
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3948: -- Attachment: (was: ApplicationPage.png) Display Application Priority in RM Web UI - Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3948.patch, ApplicationPage.png, ClusterPage.png Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3948: -- Attachment: ClusterPage.png ApplicationPage.png Display Application Priority in RM Web UI - Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3948.patch, ApplicationPage.png, ClusterPage.png Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3948: -- Attachment: (was: ClusterPage.png) Display Application Priority in RM Web UI - Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3948.patch, ApplicationPage.png, ClusterPage.png Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3948: -- Attachment: 0002-YARN-3948.patch Thank you [~jianhe] and [~rohithsharma] Uploading a new patch after addressing the review comments. Also attached new screen shots. Display Application Priority in RM Web UI - Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, ApplicationPage.png, ClusterPage.png Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations
[ https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641645#comment-14641645 ] Carlo Curino commented on YARN-3656: +1 on this patch. I committed this to trunk/branch-2, after manually inspecting the checkstyle and javadoc warnings (I fixed a couple more, but the rest will not go away, as checkstyle misses a few uses of the link tag in javadoc). Thanks [~jyaniv] and [~imenache] for this important contribution (and for having tested it and polished those algos endlessly). Thanks to [~subru] for helping shepherd this and [~asuresh] for reviewing. LowCost: A Cost-Based Placement Agent for YARN Reservations --- Key: YARN-3656 URL: https://issues.apache.org/jira/browse/YARN-3656 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Ishai Menache Assignee: Jonathan Yaniv Labels: capacity-scheduler, resourcemanager Fix For: 2.8.0 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf YARN-1051 enables SLA support by allowing users to reserve cluster capacity ahead of time. YARN-1710 introduced a greedy agent for placing user reservations. The greedy agent makes fast placement decisions but at the cost of ignoring the cluster's committed resources, which might result in blocking the cluster resources for certain periods of time and, in turn, rejecting some arriving jobs. We propose LowCost, a new cost-based planning algorithm. LowCost “spreads” the demand of the job throughout the allowed time-window according to a global, load-based cost function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
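A toy illustration of the "spreads the demand according to a load-based cost" idea in the summary above. This is not the LowCost algorithm from the patch, only a sketch of the intuition under a simplifying assumption: each unit of the job's demand is placed in the currently least-loaded time step of the allowed window, so the plan stays flat instead of piling the whole reservation at its earliest start time.
{code}
import java.util.Arrays;

/** Toy sketch: greedily place demand into the least-loaded time steps. */
public class SpreadDemandSketch {

  static int[] spread(int[] existingLoad, int totalDemand) {
    int[] allocation = new int[existingLoad.length];
    int[] load = Arrays.copyOf(existingLoad, existingLoad.length);
    for (int unit = 0; unit < totalDemand; unit++) {
      int best = 0;
      for (int t = 1; t < load.length; t++) {
        if (load[t] < load[best]) {
          best = t; // cheaper (less loaded) time step found
        }
      }
      load[best]++;
      allocation[best]++;
    }
    return allocation;
  }

  public static void main(String[] args) {
    int[] committedLoad = {8, 3, 5, 2, 9}; // existing plan load per time step
    // Prints [0, 4, 2, 4, 0]: the new demand avoids the already-busy steps.
    System.out.println(Arrays.toString(spread(committedLoad, 10)));
  }
}
{code}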
[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations
[ https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641644#comment-14641644 ] Hudson commented on YARN-3656: -- FAILURE: Integrated in Hadoop-trunk-Commit #8222 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8222/]) YARN-3656. LowCost: A Cost-Based Placement Agent for YARN Reservations. (Jonathan Yaniv and Ishai Menache via curino) (ccurino: rev 156f24ead00436faad5d4aeef327a546392cd265) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStartByDemand.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/SimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStartByJobArrival.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAgent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageAllocatorGreedy.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TryManyReservationAgents.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/ReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java *
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641690#comment-14641690 ] Hudson commented on YARN-3967: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and RM has forgotten anout an application, try and fetch the app report form the AHS. On larger clusters, the RM can forget about the applications in about 30 minutes. The proxy url generated during the job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we will need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641692#comment-14641692 ] Hudson commented on YARN-3925: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); ListString localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); ListString logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641698#comment-14641698 ] Hudson commented on YARN-3026: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3026. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 83fe34ac0896cee0918bbfad7bd51231e4aec39b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: 
capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641687#comment-14641687 ] Hudson commented on YARN-3969: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java * hadoop-yarn-project/CHANGES.txt YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to deadline but in the interim there can be instances of time when it does not have any resources allocated. We reject jobs that are submitted when the reservation allocation is zero. Instead we should accept queue the jobs till the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641693#comment-14641693 ] Hudson commented on YARN-1051: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps for gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641694#comment-14641694 ] Hudson commented on YARN-3957: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is a NPE causing the webpage of http://localhost:23188/cluster/scheduler to return a 500. This seems to be because of YARN-2336 setting null for childQueues and then getChildQueues hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641699#comment-14641699 ] Hudson commented on YARN-3973: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce NPE for reservation system when application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641732#comment-14641732 ] Hadoop QA commented on YARN-3971: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 1 new checkstyle issues (total was 31, now 32). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 52m 58s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 96m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747193/0002-YARN-3971.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 156f24e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8670/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8670/console | This message was automatically generated. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch Steps to reproduce # Create label x,y # Delete label x,y # Create label x,y add capacity scheduler xml for labels x and y too # Restart RM Both RM will become Standby. 
Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at
[jira] [Updated] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated
[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3884: --- Component/s: timelineserver RMContainerImpl transition from RESERVED to KILL apphistory status not updated -- Key: YARN-3884 URL: https://issues.apache.org/jira/browse/YARN-3884 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Environment: Suse11 Sp3 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, Elapsed Time.jpg, Test Result-Container status.jpg Setup === 1 NM 3072 16 cores each Steps to reproduce === 1.Submit apps to Queue 1 with 512 mb 1 core 2.Submit apps to Queue 2 with 512 mb and 5 core lots of containers get reserved and unreserved in this case {code} 2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to RESERVED 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=memory:512, vCores:5 queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:2560, vCores:21, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 used=memory:2560, vCores:21 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:3072, vCores:26, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=memory:5632, vCores:31 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to ALLOCATED 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=AM Allocated ContainerTARGET=SchedulerApp RESULT=SUCCESS APPID=application_1435849994778_0001 CONTAINERID=container_e24_1435849994778_0001_01_14 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e24_1435849994778_0001_01_14 of capacity memory:512, vCores:1 on host host-10-19-92-117:64318, which has 6 containers, memory:3072, vCores:14 used and memory:0, vCores:2 available after allocation 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1435849994778_0001_01 container=Container: [ContainerId: container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, NodeHttpAddress: host-10-19-92-117:65321, Resource: memory:512, vCores:1, Priority: 20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:2560, vCores:5, usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, numContainers=5 clusterResource=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.default 
stats: default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:3072, vCores:6, usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=memory:6144, vCores:32 cluster=memory:6144, vCores:32 2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from ALLOCATED to ACQUIRED 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002
[jira] [Created] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
Eric Payne created YARN-3978: Summary: Configurably turn off the saving of container info in Generic AHS Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Reporter: Eric Payne Assignee: Eric Payne Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641766#comment-14641766 ] Hadoop QA commented on YARN-3971: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 1s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 52m 20s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747203/0003-YARN-3971.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 156f24e | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8671/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8671/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8671/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8671/console | This message was automatically generated. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch Steps to reproduce # Create label x,y # Delete label x,y # Create label x,y add capacity scheduler xml for labels x and y too # Restart RM Both RM will become Standby. Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. 
Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641743#comment-14641743 ] Bibin A Chundatt commented on YARN-3893: {quote} Instead of checking for exception message in test, can you check for ServiceFailedException {quote} Already the same is verified in may testcases using messages. {quote} Can you add a verification in the test to check whether active services were stopped ? {quote} IMO its not required. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
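A schematic sketch of the rollback behaviour under discussion; the interface and exception names below are simplified stand-ins, not the actual AdminService/ResourceManager code. The point being tested is that a refresh failure during the transition must leave the RM in standby rather than half-active.
{code}
/** Illustrative only: roll back to standby if refreshAll fails mid-transition. */
public class TransitionToActiveSketch {

  interface ActiveServices {
    void start() throws Exception;
    void refreshAll() throws Exception;
    void stop();
  }

  static class ServiceFailedException extends Exception {
    ServiceFailedException(String msg, Throwable cause) { super(msg, cause); }
  }

  void transitionToActive(ActiveServices services) throws ServiceFailedException {
    try {
      services.start();
      services.refreshAll();
    } catch (Exception cause) {
      // Without this rollback the RM keeps its active state even though the
      // transition failed, and the peer RM may also become active.
      services.stop();
      throw new ServiceFailedException("Error on transition to Active", cause);
    }
  }
}
{code}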
[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission
[ https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641769#comment-14641769 ] Hadoop QA commented on YARN-3940: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 7s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 16s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 88m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746872/0001-YARN-3940.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 156f24e | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8672/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8672/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8672/console | This message was automatically generated. 
Application moveToQueue should check NodeLabel permission -- Key: YARN-3940 URL: https://issues.apache.org/jira/browse/YARN-3940 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3940.patch Configure capacity scheduler Configure node label an submit application {{queue=A Label=X}} Move application to queue {{B}} and x is not having access {code} 2015-07-20 19:46:19,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1437385548409_0005_01 released container container_e08_1437385548409_0005_01_02 on node: host: host-10-19-92-117:64318 #containers=1 available=memory:2560, vCores:15 used=memory:512, vCores:1 with event: KILL 2015-07-20 19:46:20,970 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1437385548409_0005_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=x. Queue labels=y at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
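A minimal sketch of the missing validation described in the YARN-3940 report above; the method and parameter names are illustrative, not the actual CapacityScheduler moveApplication code. The idea is to fail the move itself when the target queue cannot access a label the application requests, instead of letting every later resource request be rejected with InvalidResourceRequestException.
{code}
import java.util.Set;

/** Illustrative only: validate label access before moving an application. */
public class MoveToQueueLabelCheckSketch {

  static void checkLabelAccess(Set<String> appRequestedLabels,
      Set<String> targetQueueAccessibleLabels, String targetQueue) {
    for (String label : appRequestedLabels) {
      boolean accessible = label.isEmpty()                  // no-label requests
          || targetQueueAccessibleLabels.contains(label)
          || targetQueueAccessibleLabels.contains("*");     // "any label" queues
      if (!accessible) {
        throw new IllegalArgumentException("Cannot move application: queue="
            + targetQueue + " does not have permission to access label=" + label);
      }
    }
  }
}
{code}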
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641768#comment-14641768 ] Eric Payne commented on YARN-3978: -- Use Case: A user launches an application on a secured cluster that runs for some time and then fails within the AM (perhaps due to OOM in the AM), leaving no history in the job history server. The user doesn't notice that the job has failed until after the application has dropped off the RM's application store. At this point, if no information was stored in the Generic Application History Service, a user must rely on a privileged system administrator to access the AM logs for them. It is desirable to activate the Generic Application History service within the timeline server so that users can access their application's information even after the RM has forgotten about their application. This app information should be kept in the GAHS for 1 week, as is done, for example, for logs in the job history server. One way that the Generic AHS stores metadata about an application is in an Entity levelDB. This includes information about each container for each application. Based on my analysis, the levelDB size grows by at least 2500 bytes per container (uncompressed). This is a conservative estimate, as the size could be much bigger based on the amount of diagnostic information associated with failed containers. On very large and busy clusters, the amount needed on the timeline server's local disk would be between 0.6 TB and 1.0 TB (uncompressed). Even if we assume 90% compression, that's still between 60 GB and 100 GB that will be needed on the local disk. In addition to this, between 80 GB and 143 GB of metadata (uncompressed) will need to be cleaned up every day from the levelDB, which will delay other processing in the timeline server. The proposal of this JIRA is to add a configuration property that controls whether the GAHS stores container information in the levelDB. With this change, I estimate that the local disk usage would be about 5700 bytes per job, or about 10 GB (uncompressed) per week. Additionally, the daily cleanup load would only be about 1.5 GB. Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Reporter: Eric Payne Assignee: Eric Payne Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
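A rough sketch of how such a switch might be consulted by the history writer; the property name and class below are assumptions made for illustration, not necessarily what the eventual patch introduces:
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical filter applied before persisting container metadata to the
// Generic AHS store; AM container info is always kept so the application
// stays debuggable, while other container records can be skipped.
public final class ContainerHistoryFilter {
  // Assumed property name, for illustration only.
  static final String SAVE_NON_AM_CONTAINER_META_INFO =
      "yarn.timeline-service.generic-application-history."
          + "save-non-am-container-meta-info";

  private final boolean saveNonAMContainers;

  ContainerHistoryFilter(Configuration conf) {
    // Default to the current behaviour (store everything) for compatibility.
    saveNonAMContainers =
        conf.getBoolean(SAVE_NON_AM_CONTAINER_META_INFO, true);
  }

  boolean shouldStore(boolean isAMContainer) {
    return isAMContainer || saveNonAMContainers;
  }
}
{code}
With the flag off, per-application growth in the levelDB is bounded by the application and AM-container records rather than by the total number of containers the job ran.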
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641770#comment-14641770 ] zhihai xu commented on YARN-3925: - thanks [~jlowe] for reviewing and committing the patch! ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); List<String> localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); List<String> logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
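A simplified sketch of the idea behind the fix (illustrative only, with made-up parameters): when resolving a log path for reading, also search log directories that were dropped from {{NM_LOG_DIRS}} because the disk crossed the utilization threshold, since reading from a full disk is still safe:
{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.util.List;

// Illustrative helper: prefer healthy log dirs, but fall back to "full" ones,
// because a full disk only blocks writes, not reads of existing log files.
public final class LogPathForRead {
  static File getLogPathToRead(String relativePath, List<String> goodLogDirs,
      List<String> fullLogDirs) throws FileNotFoundException {
    for (String dir : goodLogDirs) {
      File candidate = new File(dir, relativePath);
      if (candidate.exists()) {
        return candidate;
      }
    }
    for (String dir : fullLogDirs) {
      File candidate = new File(dir, relativePath);
      if (candidate.exists()) {
        return candidate;
      }
    }
    throw new FileNotFoundException(relativePath + " not found in any log dir");
  }
}
{code}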
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641516#comment-14641516 ] Hudson commented on YARN-3026: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3026. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 83fe34ac0896cee0918bbfad7bd51231e4aec39b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: 
capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch Had a discussion with [~vinodkv] and [~jianhe]: In the existing Capacity Scheduler, all allocation logic of and under LeafQueue is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better move some of it to FiCaSchedulerApp. The ideal scope of LeafQueue is: it receives some resources from its ParentQueue (say 15% of cluster resources) and distributes them to its child apps, while staying agnostic to the internal logic of those apps (like delayed scheduling, etc.). In other words, LeafQueue shouldn't decide how an application allocates containers from the resources it is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
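A toy illustration of that split, using made-up types and a single memory dimension rather than the real LeafQueue/FiCaSchedulerApp signatures: the queue only offers resources to its children and records what they allocated; each child app owns the per-application allocation decisions.
{code}
import java.util.List;

// Simplified model of the proposed responsibility split (assumed names).
public final class LeafQueueSplitSketch {
  interface ChildApp {
    /** Returns how much memory (MB) the app actually allocated from the offer. */
    int assignContainers(int availableMb);
  }

  // The "LeafQueue" side: offer resources to children in order and stop at the
  // first successful allocation; delay scheduling, reservations, etc. stay
  // inside the app implementation.
  static int assignContainers(int availableMb, List<ChildApp> orderedApps) {
    for (ChildApp app : orderedApps) {
      int allocated = app.assignContainers(availableMb);
      if (allocated > 0) {
        return allocated;
      }
    }
    return 0;
  }
}
{code}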
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641508#comment-14641508 ] Hudson commented on YARN-3967: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and the RM has forgotten about an application, try to fetch the app report from the AHS. On larger clusters, the RM can forget about the applications in about 30 minutes. The proxy URL generated during job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we will need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
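The fallback itself is simple; the sketch below is a simplified stand-in for what a fetcher such as AppReportFetcher can do (the real class also deals with retries, proxying, and security), not the committed code:
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.ApplicationHistoryProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportResponse;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Ask the RM first; only when the RM no longer knows the application, and the
// history service is configured, fall back to the AHS for the report.
public final class AppReportFallbackSketch {
  static GetApplicationReportResponse fetch(ApplicationClientProtocol rm,
      ApplicationHistoryProtocol history, GetApplicationReportRequest request)
      throws YarnException, IOException {
    try {
      return rm.getApplicationReport(request);
    } catch (ApplicationNotFoundException e) {
      if (history == null) {
        throw e;  // AHS not enabled: surface the original "not found" error
      }
      return history.getApplicationReport(request);
    }
  }
}
{code}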
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641512#comment-14641512 ] Hudson commented on YARN-3957: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is an NPE causing the web page at http://localhost:23188/cluster/scheduler to return a 500. This seems to be because YARN-2336 sets childQueues to null, and getChildQueues then hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
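A minimal illustration of the defensive pattern involved (a made-up class, not the actual FairSchedulerQueueInfo change): never expose a null child-queue collection to the web layer.
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

// Guarding the DAO so a null left behind during queue initialization cannot
// propagate an NPE into the /cluster/scheduler page rendering.
public final class QueueInfoSketch {
  private final Collection<QueueInfoSketch> childQueues;

  public QueueInfoSketch(Collection<QueueInfoSketch> children) {
    this.childQueues =
        (children == null) ? new ArrayList<QueueInfoSketch>() : children;
  }

  public Collection<QueueInfoSketch> getChildQueues() {
    return Collections.unmodifiableCollection(childQueues);
  }
}
{code}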
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641510#comment-14641510 ] Hudson commented on YARN-3925: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); List<String> localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); List<String> logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641524#comment-14641524 ] Hudson commented on YARN-3925: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. Contributed by zhihai xu (jlowe: rev ff9c13e0a739bb13115167dc661b6a16b2ed2c04) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks. - Key: YARN-3925 URL: https://issues.apache.org/jira/browse/YARN-3925 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.7.2 Attachments: YARN-3925.000.patch, YARN-3925.001.patch ContainerLogsUtils#getContainerLogFile fails to read files from full disks. {{getContainerLogFile}} depends on {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but {{LocalDirsHandlerService#getLogPathToRead}} calls {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not include full disks in {{LocalDirsHandlerService#checkDirs}}: {code} Configuration conf = getConfig(); List<String> localDirs = getLocalDirs(); conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, localDirs.toArray(new String[localDirs.size()])); List<String> logDirs = getLogDirs(); conf.setStrings(YarnConfiguration.NM_LOG_DIRS, logDirs.toArray(new String[logDirs.size()])); {code} ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641531#comment-14641531 ] Hudson commented on YARN-3973: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation system when an application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641525#comment-14641525 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps with gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it
[ https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641522#comment-14641522 ] Hudson commented on YARN-3967: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: rev fbd6063269221ec25834684477f434e19f0b66af) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java * hadoop-yarn-project/CHANGES.txt Fetch the application report from the AHS if the RM does not know about it -- Key: YARN-3967 URL: https://issues.apache.org/jira/browse/YARN-3967 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.2 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch If the application history service has been enabled and the RM has forgotten about an application, try to fetch the app report from the AHS. On larger clusters, the RM can forget about the applications in about 30 minutes. The proxy URL generated during job submission will try to fetch the app report from the RM and will fail to get anything from there. If the app is not found in the RM, we will need to get the application report from the Application History Server (if it is enabled) to see if we can get any information on that application before throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641526#comment-14641526 ] Hudson commented on YARN-3957: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500. (Anubhav Dhoot via kasha) (kasha: rev d19d18775368f5aaa254881165acc1299837072b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * hadoop-yarn-project/CHANGES.txt FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is an NPE causing the web page at http://localhost:23188/cluster/scheduler to return a 500. This seems to be because YARN-2336 sets childQueues to null, and getChildQueues then hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641530#comment-14641530 ] Hudson commented on YARN-3026: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3026. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 83fe34ac0896cee0918bbfad7bd51231e4aec39b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler 
Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch Had a discussion with [~vinodkv] and [~jianhe]: In the existing Capacity Scheduler, all allocation logic of and under LeafQueue is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better move some of it to FiCaSchedulerApp. The ideal scope of LeafQueue is: it receives some resources from its ParentQueue (say 15% of cluster resources) and distributes them to its child apps, while staying agnostic to the internal logic of those apps (like delayed scheduling, etc.). In other words, LeafQueue shouldn't decide how an application allocates containers from the resources it is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641519#comment-14641519 ] Hudson commented on YARN-3969: -- FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/997/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to its deadline, but in the interim there can be periods when it does not have any resources allocated. We reject jobs that are submitted when the reservation allocation is zero. Instead we should accept and queue the jobs till the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641511#comment-14641511 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.6.0 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps with gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations
[ https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641505#comment-14641505 ] Hudson commented on YARN-3969: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3969. Allow jobs to be submitted to reservation that is active but does not have any allocations. (subru via curino) (Carlo Curino: rev 0fcb4a8cf2add3f112907ff4e833e2f04947b53e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff) * hadoop-yarn-project/CHANGES.txt Allow jobs to be submitted to reservation that is active but does not have any allocations -- Key: YARN-3969 URL: https://issues.apache.org/jira/browse/YARN-3969 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Fix For: 2.8.0, 2.7.2 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch YARN-1051 introduces the notion of reserving resources prior to job submission. A reservation is active from its arrival time to its deadline, but in the interim there can be periods when it does not have any resources allocated. We reject jobs that are submitted when the reservation allocation is zero. Instead we should accept and queue the jobs till the resources become available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051
[ https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641517#comment-14641517 ] Hudson commented on YARN-3973: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/]) YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Recent changes to application priority management break reservation system from YARN-1051 - Key: YARN-3973 URL: https://issues.apache.org/jira/browse/YARN-3973 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.8.0 Attachments: YARN-3973.1.patch, YARN-3973.patch Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation system when an application is submitted to a ReservationQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641541#comment-14641541 ] Varun Saxena commented on YARN-3528: [~brahmareddy], you have missed some test classes in your latest patch. For instance, TestNodeManagerShutdown. Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test Attachments: YARN-3528-002.patch, YARN-3528-003.patch, YARN-3528-004.patch, YARN-3528.patch A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible for scheduled or precommit tests to run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep of 12345 shows up many places in the test suite where this practice has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
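One common way to do the dynamic part (a sketch, not prescriptive for any particular test): bind a probe socket to port 0 so the OS picks a free ephemeral port, then hand that port to the service under test. Where possible it is even better to let the service itself bind to port 0 and ask it afterwards which port it got, since the probe approach leaves a small race window between closing the probe and the service binding.
{code}
import java.io.IOException;
import java.net.ServerSocket;

// Pick a currently free port instead of hard-coding 12345.
public final class FreePortFinder {
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{code}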
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641558#comment-14641558 ] Varun Saxena commented on YARN-3958: [~ajisakaa], I checked the test. You are correct, but I think this would be a major change. My concern is that some projects may have taken a dependency on hadoop-yarn-api in their pom because they want to use the YarnConfiguration class. Please note that hadoop-yarn-api does not have a dependency on hadoop-yarn-common in its pom.xml. The change can be made, but should it go into branch-2 then? Moreover, realistically, will somebody add a YARN-related config in yarn-default.xml but not add it to the YarnConfiguration class? I think that's unlikely. The reverse happens far more frequently. So in branch-2 we can just move this test to hadoop-yarn-api, and in trunk, move YarnConfiguration to hadoop-yarn-common. Thoughts? TestYarnConfigurationFields should be moved to hadoop-yarn-api -- Key: YARN-3958 URL: https://issues.apache.org/jira/browse/YARN-3958 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, if somebody changes this file, this test will not necessarily be run. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
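For context, the core of such a cross-check can be written as a small reflection pass. The snippet below is a stripped-down illustration (the real TestYarnConfigurationFields builds on a shared base class and carries exclusion lists for keys that are intentionally undocumented), so treat the class name and the "yarn." prefix filter as assumptions:
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Collect property names from yarn-default.xml, then flag YarnConfiguration
// String constants that look like property keys but are not documented there.
public final class ConfigFieldsCrossCheck {
  static Set<String> findUndocumentedKeys() throws IllegalAccessException {
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");
    Set<String> documented = new HashSet<String>();
    for (Map.Entry<String, String> entry : defaults) {
      documented.add(entry.getKey());
    }

    Set<String> missing = new HashSet<String>();
    for (Field field : YarnConfiguration.class.getDeclaredFields()) {
      if (Modifier.isPublic(field.getModifiers())
          && Modifier.isStatic(field.getModifiers())
          && field.getType() == String.class) {
        String value = (String) field.get(null);
        if (value != null && value.startsWith("yarn.")
            && !documented.contains(value)) {
          missing.add(value);
        }
      }
    }
    return missing;
  }
}
{code}
Whichever module the test lives in has to see both YarnConfiguration and yarn-default.xml on its classpath, which is exactly the dependency question raised above.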