[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251378#comment-14251378 ] Hadoop QA commented on YARN-2933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687982/YARN-2933-1.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6143//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6143//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6143//console This message is automatically generated. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch Currently we have capacity enforcement on each queue for each label in the CapacityScheduler, but we do not have a preemption policy that supports it. YARN-2498 targets preemption that respects node labels, but there are some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc., by label. These items potentially require refactoring the CS, which we need to think about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression such as: a cluster has some nodes with labels and some without, and queueA is not satisfied for unlabeled resources; today the preemption policy may preempt resources from labeled nodes for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will add preemption that respects node labels to the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
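The short-term plan described above amounts to computing ideal_allocation over the unlabeled portion of the cluster only, so containers on labeled nodes are never chosen for preemption. Below is a minimal, self-contained sketch of that idea; Node, Resource and totalUnlabeledResource are invented stand-ins, not the actual CapacityScheduler or ProportionalCapacityPreemptionPolicy classes.
{code:java}
import java.util.List;
import java.util.Set;

public class UnlabeledCapacity {
    static class Resource { int memory; int vcores; }
    static class Node { Resource total; Set<String> labels; }

    static Resource totalUnlabeledResource(List<Node> nodes) {
        Resource sum = new Resource();
        for (Node n : nodes) {
            // Only nodes with no label contribute to the total that ideal_allocation
            // is computed against, so labeled-node capacity is never preempted.
            if (n.labels == null || n.labels.isEmpty()) {
                sum.memory += n.total.memory;
                sum.vcores += n.total.vcores;
            }
        }
        return sum;
    }
}
{code}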
[jira] [Updated] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2340: - Attachment: 0001-YARN-2340.patch NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby -- Key: YARN-2340 URL: https://issues.apache.org/jira/browse/YARN-2340 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.1 Environment: Capacityscheduler with Queue a, b Reporter: Nishan Shetty Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2340.patch While a job is in progress, set the queue state to STOPPED and then restart the RM. Observe that the standby RM fails to come up as active, throwing the NPE below: 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) at java.lang.Thread.run(Thread.java:662) 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
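The stack trace above shows addApplicationAttempt dereferencing the application's queue, which no longer resolves once the queue was stopped and the RM restarted. The sketch below only illustrates the kind of guard that avoids the NPE; it is not the attached 0001-YARN-2340.patch, and Queue, queues and addApplicationAttempt are simplified stand-ins rather than the CapacityScheduler code.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class AttemptGuard {
    static class Queue {
        final boolean stopped;
        Queue(boolean stopped) { this.stopped = stopped; }
    }

    private final Map<String, Queue> queues = new HashMap<>();

    boolean addApplicationAttempt(String appId, String queueName) {
        Queue queue = queues.get(queueName);
        if (queue == null || queue.stopped) {
            // Reject the attempt explicitly instead of dereferencing a null/stopped
            // queue, which is what produces the NullPointerException during recovery.
            System.err.println("Queue " + queueName + " unavailable; rejecting " + appId);
            return false;
        }
        // ... proceed with normal attempt registration ...
        return true;
    }
}
{code}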
[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251451#comment-14251451 ] Hadoop QA commented on YARN-2340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console This message is automatically generated. NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby -- Key: YARN-2340 URL: https://issues.apache.org/jira/browse/YARN-2340 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.1 Environment: Capacityscheduler with Queue a, b Reporter: Nishan Shetty Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2340.patch While a job is in progress, set the queue state to STOPPED and then restart the RM. Observe that the standby RM fails to come up as active, throwing the NPE below: 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) at java.lang.Thread.run(Thread.java:662) 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251455#comment-14251455 ] Rohith commented on YARN-2340: -- It looks like the failed test is random. In my env, it runs successfully. NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby -- Key: YARN-2340 URL: https://issues.apache.org/jira/browse/YARN-2340 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.1 Environment: Capacityscheduler with Queue a, b Reporter: Nishan Shetty Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2340.patch While a job is in progress, set the queue state to STOPPED and then restart the RM. Observe that the standby RM fails to come up as active, throwing the NPE below: 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) at java.lang.Thread.run(Thread.java:662) 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251462#comment-14251462 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251466#comment-14251466 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
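The pre-YARN-2704 behaviour described above can be pictured as the token being tracked globally, with the first submitting application owning its cancellation, so sub-jobs that finish early cannot cancel a token still in use by the main job. The sketch below is a hypothetical illustration of that bookkeeping, not the RM's DelegationTokenRenewer API; every class and method name in it is invented.
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TokenOwnership {
    // token -> first application that submitted it (the "owner")
    private final Map<String, String> firstAppForToken = new HashMap<>();
    // token -> all applications currently sharing it
    private final Map<String, Set<String>> appsForToken = new HashMap<>();

    public synchronized void register(String token, String appId) {
        firstAppForToken.putIfAbsent(token, appId);
        appsForToken.computeIfAbsent(token, t -> new HashSet<>()).add(appId);
    }

    /** Cancel only when the owning (first) application finishes, not when a sub-job does. */
    public synchronized boolean shouldCancelOnAppFinish(String token, String appId) {
        appsForToken.getOrDefault(token, new HashSet<>()).remove(appId);
        return appId.equals(firstAppForToken.get(token));
    }
}
{code}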
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251467#comment-14251467 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
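The queue/pool interaction described in the report is standard java.util.concurrent behaviour: ThreadPoolExecutor only creates threads beyond the core size when the work queue rejects an offer, and an unbounded LinkedBlockingQueue never rejects. A small self-contained demo of that effect follows (it is not the DelegationTokenRenewer code itself); it shows the pool staying at 5 threads and the usual remedy of raising the core size.
{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws InterruptedException {
        // Core size 5, max size 50 -- but the unbounded queue always accepts work,
        // so the executor never creates threads beyond the core 5.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            5, 50, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(500);
        // Prints 5, not 50: the extra work sits in the queue instead of growing the pool.
        System.out.println("pool size = " + pool.getPoolSize());

        // One common remedy: make the core size equal to the desired maximum.
        pool.setCorePoolSize(50);
        pool.shutdown();
    }
}
{code}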
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251463#comment-14251463 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251471#comment-14251471 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
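The NoSuchMethodException above arises because reflective factories of this kind look up a zero-argument constructor before instantiating the class. The self-contained illustration below shows that failure mode and the usual fix (a no-arg constructor plus a separate init step); it uses plain java.lang.reflect rather than Hadoop's ReflectionUtils, and the store classes are invented for the example.
{code:java}
import java.lang.reflect.Constructor;

class AppChecker { }

class BrokenStore {
    BrokenStore(AppChecker checker) { }   // only constructor takes an argument
}

class FixedStore {
    private AppChecker checker;
    FixedStore() { }                      // reflection-friendly no-arg constructor
    void init(AppChecker checker) {       // wiring moved to an explicit init step
        this.checker = checker;
    }
}

public class ReflectionDemo {
    static <T> T newInstance(Class<T> clazz) throws Exception {
        Constructor<T> ctor = clazz.getDeclaredConstructor();  // requires 0 args
        ctor.setAccessible(true);
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(newInstance(FixedStore.class));     // works
        try {
            newInstance(BrokenStore.class);                     // no 0-arg constructor
        } catch (NoSuchMethodException e) {
            System.out.println("fails as in the log above: " + e);
        }
    }
}
{code}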
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251476#comment-14251476 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251482#comment-14251482 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251478#comment-14251478 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251490#comment-14251490 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251486#comment-14251486 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251477#comment-14251477 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251481#comment-14251481 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/CHANGES.txt RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251660#comment-14251660 ] Hadoop QA commented on YARN-2949: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6145//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6145//console This message is automatically generated. Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2949: Attachment: (was: NodeManagerCgroups.html) Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2949: Attachment: NodeManagerCgroups.html Uploaded the html generated by the latest patch. Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251670#comment-14251670 ] Junping Du commented on YARN-2949: -- +1. v2 patch LGTM. Will commit it shortly. Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251682#comment-14251682 ] Hudson commented on YARN-2949: -- FAILURE: Integrated in Hadoop-trunk-Commit #6746 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6746/]) YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) (junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm * hadoop-project/src/site/site.xml * hadoop-yarn-project/CHANGES.txt Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2949) Add documentation for CGroups
[ https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251697#comment-14251697 ] Hadoop QA commented on YARN-2949: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6146//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6146//console This message is automatically generated. Add documentation for CGroups - Key: YARN-2949 URL: https://issues.apache.org/jira/browse/YARN-2949 Project: Hadoop YARN Issue Type: Task Components: documentation, nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, apache-yarn-2949.1.patch A bunch of changes have gone into the NodeManager to allow greater use of CGroups. It would be good to have a single page that documents how to setup CGroups and the controls available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251704#comment-14251704 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251705#comment-14251705 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251708#comment-14251708 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251717#comment-14251717 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251723#comment-14251723 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251719#comment-14251719 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251709#comment-14251709 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251722#comment-14251722 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/CHANGES.txt RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251727#comment-14251727 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
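A minimal sketch of the failure mode in the (truncated) stack trace above. ReflectionUtils#newInstance reflectively looks up a zero-argument constructor, which the snippet below reproduces with plain java.lang.reflect; the two store classes are stand-ins for illustration, not the real SCMStore hierarchy.
{code}
import java.lang.reflect.Constructor;

public class ZeroArgConstructorDemo {

  static class StoreWithoutDefaultCtor {
    StoreWithoutDefaultCtor(String name) { }   // only constructor takes an argument
  }

  static class StoreWithDefaultCtor {
    StoreWithDefaultCtor() { }                 // zero-argument constructor exists
  }

  // Mirrors the reflective lookup that fails in the stack trace above.
  static <T> T newInstance(Class<T> clazz) throws Exception {
    Constructor<T> ctor = clazz.getDeclaredConstructor(); // NoSuchMethodException if absent
    ctor.setAccessible(true);
    return ctor.newInstance();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(newInstance(StoreWithDefaultCtor.class));      // succeeds
    try {
      newInstance(StoreWithoutDefaultCtor.class);
    } catch (NoSuchMethodException e) {
      System.out.println("Fails as in the stack trace above: " + e);  // no 0-arg constructor
    }
  }
}
{code}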
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251713#comment-14251713 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java * hadoop-yarn-project/CHANGES.txt InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251718#comment-14251718 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251731#comment-14251731 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251763#comment-14251763 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251771#comment-14251771 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251767#comment-14251767 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251766#comment-14251766 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/CHANGES.txt RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251762#comment-14251762 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251775#comment-14251775 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251801#comment-14251801 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251813#comment-14251813 ] Hudson commented on YARN-2203: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) (kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands
[ https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251805#comment-14251805 ] Hudson commented on YARN-2972: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt DelegationTokenRenewer thread pool never expands Key: YARN-2972 URL: https://issues.apache.org/jira/browse/YARN-2972 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-2972.001.patch DelegationTokenRenewer uses a thread pool to manage token renewals. The number of threads is configurable, but unfortunately the pool never expands beyond the hardcoded initial 5 threads because we are using an unbounded LinkedBlockingQueue. ThreadPoolExecutor only grows the thread pool beyond the core size when the specified queue is full. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251809#comment-14251809 ] Hudson commented on YARN-2944: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2944. InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev a1bd1409649da96c9fde4a9f9398d7711bc6c281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance - Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 2.7.0 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, YARN-2944-trunk-v3.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. 
On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException:
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251800#comment-14251800 ] Hudson commented on YARN-2189: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: rev 9b4ba409c6683c52c8e931809fc47b593bb90b48) * hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-1492-trunk-addendum.patch, YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251804#comment-14251804 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) YARN-2964. FSLeafQueue#assignContainer - document the reason for using both write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev f2d150ea1205b77a75c347ace667b4cd060aaf40) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251818#comment-14251818 ] Jason Lowe commented on YARN-2964: --
Thanks for the patch, Jian! Findbugs warnings appear to be unrelated.

I'm wondering about the change in the removeApplicationFromRenewal method (or remove). If a sub-job completes, won't we remove the token from the allTokens map before the launcher job has completed? Then a subsequent sub-job that requests token cancellation can put the token back in the map and cause the token to be canceled when it leaves. I think we need to repeat the logic from the original code before YARN-2704 here, i.e. only remove the token if the application ID matches. That way the launcher job's token will remain _the_ token in that collection until the launcher job completes.

This comment doesn't match the code, since the code looks like if any token wants to cancel at the end then we will cancel at the end:
{code}
// If any of the jobs sharing the same token set shouldCancelAtEnd
// to true, we should not cancel the token.
if (evt.shouldCancelAtEnd) {
  dttr.shouldCancelAtEnd = evt.shouldCancelAtEnd;
}
{code}
I think the logic and comment should be: if any job doesn't want to cancel, then we won't cancel. The code seems to be trying to do the opposite, so I'm not sure how the unit test is passing. Maybe I'm missing something.

The info log message added in handleAppSubmitEvent is also misleading, as it says we are setting shouldCancelAtEnd to whatever the event said, when in reality we only set it sometimes. It probably needs to be inside the conditional.

I wonder if we should be using a Set instead of a Map to track these tokens. Adding an already existing DelegationTokenToRenew to a set will not change the one already there, but with the map a sub-job can clobber the DelegationTokenToRenew that's already there with its own when it does the allTokens.put(dtr.token, dtr).

RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveliness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched more than 10 min after another sub-job completes. If all sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
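A minimal, self-contained sketch of the "keep the token if any job still wants it" semantics proposed in the comment above. The class and field names (DelegationTokenToRenewStub, AppSubmitEventStub, shouldCancelAtEnd) mirror the quoted snippet but are stand-ins; this is not the actual YARN-2964 patch.
{code}
public class CancelAtEndSemanticsDemo {
  static class DelegationTokenToRenewStub {
    boolean shouldCancelAtEnd = true;   // default: cancel when the app finishes
  }

  static class AppSubmitEventStub {
    final boolean shouldCancelAtEnd;
    AppSubmitEventStub(boolean shouldCancelAtEnd) { this.shouldCancelAtEnd = shouldCancelAtEnd; }
  }

  // If any job does not want the token cancelled, keep it; this is the inverse of the
  // condition in the snippet quoted in the comment above.
  static void onAppSubmit(DelegationTokenToRenewStub dttr, AppSubmitEventStub evt) {
    if (!evt.shouldCancelAtEnd) {
      dttr.shouldCancelAtEnd = false;
    }
  }

  public static void main(String[] args) {
    DelegationTokenToRenewStub dttr = new DelegationTokenToRenewStub();
    onAppSubmit(dttr, new AppSubmitEventStub(true));   // sub-job that would cancel
    onAppSubmit(dttr, new AppSubmitEventStub(false));  // launcher job that must keep the token
    onAppSubmit(dttr, new AppSubmitEventStub(true));   // a later sub-job cannot flip it back
    System.out.println(dttr.shouldCancelAtEnd);        // false: the token is kept
  }
}
{code}
With this ordering, the launcher job's preference to keep the token survives any number of later sub-job submissions.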
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251854#comment-14251854 ] Allen Wittenauer commented on YARN-2203: Should this have been .http.address instead of webapp.address to be consistent with the rest of Hadoop? Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2936: --- Attachment: YARN-2936.001.patch YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call its getProto(), we will just get an empty proto object. It seems to do no harm to the production code path, as we always call getBytes() before using proto to persist the DT in the state store, when generating the password. I think the setters were removed to avoid setting the fields twice when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in the secretManager, and is vulnerable if something changes in the secretManager. For example, in the test case of YARN-2837, I spent time figuring out that we need to call getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
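A self-contained toy model of the lazy proto-builder behavior described above. This is not the real YARNDelegationTokenIdentifier; the ToyIdentifier class and its StringBuilder "builder" are stand-ins that only illustrate why getProto() returns an empty object unless getBytes() has run first.
{code}
public class LazyBuilderPitfallDemo {
  static class ToyIdentifier {
    private final String owner;
    private final StringBuilder builder = new StringBuilder(); // stand-in for proto.builder

    ToyIdentifier(String owner) { this.owner = owner; }

    // Fields are copied into the builder only here, as a side effect of serialization.
    byte[] getBytes() {
      builder.setLength(0);
      builder.append("owner=").append(owner);
      return builder.toString().getBytes();
    }

    // Returns whatever the builder currently holds; empty if getBytes() was never called.
    String getProto() {
      return builder.toString();
    }
  }

  public static void main(String[] args) {
    ToyIdentifier ident = new ToyIdentifier("alice");
    System.out.println("before getBytes(): '" + ident.getProto() + "'"); // empty
    ident.getBytes();
    System.out.println("after  getBytes(): '" + ident.getProto() + "'"); // owner=alice
  }
}
{code}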
[jira] [Updated] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Attachment: YARN-2868.005.patch - Cleaned up unused imports - Did not move @Metric to a single line; that would violate the 80-column width limit Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch Add a metric to measure the latency between starting container allocation and the first container actually being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
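A hedged sketch of the kind of @Metric declaration being discussed, in the multi-line form that respects the 80-column limit. The class, field, and method names below are illustrative stand-ins, not the actual YARN-2868 patch; only the metrics2 annotations and DefaultMetricsSystem calls are standard Hadoop APIs, and the framework instantiates @Metric-annotated mutable fields when the source is registered.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

@Metrics(context = "yarn")
public class FirstContainerLatencySketch {

  // Instantiated by the metrics2 framework on registration; no explicit "new" needed.
  @Metric("Delay between first allocation request and first allocated container (ms)")
  MutableGaugeLong firstContainerAllocationDelay;

  void recordDelay(long requestTimeMs, long allocationTimeMs) {
    firstContainerAllocationDelay.set(allocationTimeMs - requestTimeMs);
  }

  public static void main(String[] args) {
    DefaultMetricsSystem.initialize("sketch");
    FirstContainerLatencySketch source = new FirstContainerLatencySketch();
    DefaultMetricsSystem.instance().register("FirstContainerLatencySketch",
        "Illustrative metrics source", source);
    source.recordDelay(1_000L, 1_750L);   // records a 750 ms first-allocation delay
  }
}
{code}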
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252003#comment-14252003 ] Chris Trezzo commented on YARN-2203: [~aw] Thanks for the comment! I didn't realize that was a convention considering there are other similar parameters that do not seem to follow that form. For some examples: yarn.resourcemanager.webapp.address, yarn.resourcemanager.admin.address, yarn.nodemanager.webapp.address. I can make note of your comment in YARN-2654 when we do a final pass on the config parameters and ensure that they have quality names. Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252007#comment-14252007 ] Karthik Kambatla commented on YARN-2654: From [~aw] on YARN-2203: Should this have been .http.address instead of webapp.address to be consistent with the rest of Hadoop? Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252008#comment-14252008 ] Karthik Kambatla commented on YARN-2203: Added a comment there. Thanks Allen. Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252031#comment-14252031 ] Allen Wittenauer commented on YARN-2203: I wouldn't trust the YARN project to be any sort of guide to do anything with consistency. It has proven over and over again that it wants everything to be hard to administer. Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252045#comment-14252045 ] Jian He commented on YARN-2964: --- Thanks for your comments, Jason! bq. I'm wondering about the change in the removeApplicationFromRenewal method or remove. If the launcher job gets added to the appTokens map first, DelegationTokenRenewer will not add a DelegationTokenToRenew instance for the sub-job. So the tokens in removeApplicationFromRenewal will be empty for the sub-job when the sub-job completes, and the token won't be removed from allTokens. My only concern with a global set is that each time an application completes, we end up looping over all the applications, or worse (each app may have at least one token). bq. This comment doesn't match the code Good catch, what a mistake. I might have been under the impression that the semantics were "shouldKeepAtEnd". I added one line in the test case to guard against this. bq. Wonder if we should be using a Set instead of a Map to track these tokens Thought about that too; the reason I switched to a map is to get the DelegationTokenToRenew instance based on the token the app provided and change its shouldCancelAtEnd field on submission. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
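To picture the Map-based tracking Jian describes (a Map rather than a Set, so a later submission can find the existing DelegationTokenToRenew and flip its shouldCancelAtEnd field), here is a small, hypothetical sketch. The class and field names only loosely mirror the discussion and are not the code in the attached patch.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember each token once, keyed by the token itself, along
// with the first submitting app; later submissions only adjust the cancel flag.
class TokenRenewalTrackingSketch<T, A> {

  static final class Entry<A> {
    final A firstAppId;                 // the job that controls cancellation
    volatile boolean shouldCancelAtEnd; // updated by later submissions if needed
    Entry(A firstAppId, boolean shouldCancelAtEnd) {
      this.firstAppId = firstAppId;
      this.shouldCancelAtEnd = shouldCancelAtEnd;
    }
  }

  // A Map (not a Set) so a re-submission can look up the existing entry by token
  // and change its shouldCancelAtEnd field, as described in the comment above.
  private final Map<T, Entry<A>> allTokens = new ConcurrentHashMap<>();

  void applicationSubmitted(T token, A appId, boolean cancelAtEnd) {
    Entry<A> existing = allTokens.putIfAbsent(token, new Entry<>(appId, cancelAtEnd));
    if (existing != null) {
      existing.shouldCancelAtEnd = cancelAtEnd; // sub-job: update the flag only
    }
  }

  // Only the first/main job removes the entry when it finishes; completing
  // sub-jobs leave it alone so the token keeps being renewed for them.
  boolean applicationFinished(T token, A appId) {
    Entry<A> entry = allTokens.get(token);
    if (entry != null && entry.firstAppId.equals(appId)) {
      allTokens.remove(token);
      return entry.shouldCancelAtEnd; // caller would cancel the token if true
    }
    return false;
  }
}
{code}

The trade-off this sketch makes explicit is the one raised above: a single global map avoids per-app duplication, but any bookkeeping that has to scan it on application completion grows with the number of tracked tokens.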
[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2964: -- Attachment: YARN-2964.2.patch updated the patch based on some comments from Jason RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252078#comment-14252078 ] Karthik Kambatla commented on YARN-2203: bq. I wouldn't trust the YARN project to be any sort of guide to do anything with consistency. It has proven over and over again that it wants everything to be hard to administer. If this comment was to provide direction, I am clearly missing it. Is the suggestion to drop whatever consistency the *sub*-project is *trying* to achieve and run in different directions? Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252103#comment-14252103 ] Allen Wittenauer commented on YARN-2203: Consistency within the larger project is more important than consistency within the sub-project. Operationally, it sucks having to switch gears depending upon which sub-project one is working on. Oh, YARN calls it feature X, even though the rest of Hadoop has called it Y since pre-YARN. Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252116#comment-14252116 ] Karthik Kambatla commented on YARN-2203: I see the value in the entire project being consistent, but not at the expense of inconsistency within the sub-project. I would like to stay out of how we should have done things. If it is true that there are inconsistencies with no good reason, it would be nice to address those inconsistencies in Hadoop-3, preferably in a backwards-compatible way. Any volunteers? Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252139#comment-14252139 ] Hadoop QA commented on YARN-2936: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688081/YARN-2936.001.patch against trunk revision 389f881. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 39 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6147//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6147//console This message is automatically generated. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() on it, we will just get an empty proto object. It seems to do no harm to the production code path, as we always call getBytes() before using proto to persist the DT in the state store, when generating the password. I think the setters were removed to avoid duplicating the setting of the fields when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly on its own: it is tightly coupled with the logic in secretManager, and it is vulnerable if something is changed in secretManager. For example, in the test case of YARN-2837, I spent time figuring out that we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252164#comment-14252164 ] Hadoop QA commented on YARN-2868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch against trunk revision 389f881. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6148//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6148//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6148//console This message is automatically generated. Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252195#comment-14252195 ] Karthik Kambatla commented on YARN-2975: bq. The removeApp from one operation is now 2 steps. removeApp did not expose the two operations outside, but had them. IIUC, an app can only go from non-runnable to runnable, but not vice-versa. For instance, see the following snippet in FairScheduler#executeMove. So, I don't think we need to worry about consistency. {code} if (wasRunnable && !nowRunnable) { throw new IllegalStateException("Should have already verified that app " + attempt.getApplicationId() + " would be runnable in new queue"); } {code} {quote} Nit: suggest resetPreemptedResources -> resetPreemptedResourcesRunnableApps, clearPreemptedResources -> clearPreemptedResourcesRunnableApps {quote} I thought of this, but decided against it. The additional RunnableApps doesn't add anything but extra characters. FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-2975-1.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252205#comment-14252205 ] Karthik Kambatla commented on YARN-2975: [~adhoot] - I see your point about consistency. If the app was not runnable at the time of the check but becomes runnable by the time we remove it, we can miss untracking it. Maybe we should clean up how we store the runnability of an app in FSAppAttempt, and add transitions like the rest of the RM code to handle addition, move, and removal of apps. The scope might be too big to do in this JIRA. Okay with doing it in a follow-up? By the way, thanks a bunch for taking the time to review. FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-2975-1.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252213#comment-14252213 ] Varun Saxena commented on YARN-2936: [~zjshen], [~jianhe], kindly review. The Findbugs warnings are to be handled by YARN-2937 to YARN-2940. The test failures are unrelated; all these tests pass in my local setup. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() on it, we will just get an empty proto object. It seems to do no harm to the production code path, as we always call getBytes() before using proto to persist the DT in the state store, when generating the password. I think the setters were removed to avoid duplicating the setting of the fields when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly on its own: it is tightly coupled with the logic in secretManager, and it is vulnerable if something is changed in secretManager. For example, in the test case of YARN-2837, I spent time figuring out that we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252216#comment-14252216 ] Hadoop QA commented on YARN-2964: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688092/YARN-2964.2.patch against trunk revision 07619aa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6149//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6149//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6149//console This message is automatically generated. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252218#comment-14252218 ] Jason Lowe commented on YARN-2964: -- bq. If launcher job first gets added to the appTokens map, DelegationTokenRenewer will not add DelegationTokenToRenew instance for the sub-job. Ah, sorry, I missed this critical change from the original patch. However if we don't add the delegation token for each sub-job then I think we have a problem with the following use-case: # Oozie launcher submits a MapReduce sub-job # MapReduce job starts # Oozie launcher job leaves # MapReduce job now running with a token that the RM has forgotten and won't be automatically renewed We might have had the same issue in this case prior to YARN-2704, since the token would be pulled from the set when the launcher completed. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252243#comment-14252243 ] Jian He commented on YARN-2964: --- bq. We might have had the same issue in this case prior to YARN-2704. Yes, this is an existing issue. As Robert pointed out in the previous comment, oozie MapReduce sub-job now cannot run beyond 24 hrs. IMO, we can fix this separately ? RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252259#comment-14252259 ] Jason Lowe commented on YARN-2964: -- Sure, we can fix that as a followup issue since it's no worse than what we had before. +1 lgtm, only nit is the new getAllTokens method should be package-private instead of public but not a big deal either way. I assume the test failures are unrelated? RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252286#comment-14252286 ] Jian He commented on YARN-2964: --- I believe the failures are not related. I just changed the visibility and uploaded a new patch to re-kick jenkins. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2964: -- Attachment: YARN-2964.3.patch RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252332#comment-14252332 ] Jian He commented on YARN-2939: --- +1 Fix new findbugs warnings in hadoop-yarn-common --- Key: YARN-2939 URL: https://issues.apache.org/jira/browse/YARN-2939 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Li Lu Labels: findbugs Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252343#comment-14252343 ] Anubhav Dhoot commented on YARN-2975: - Yes, I am worried about getting it wrong for maxRunningEnforcer. Before the change, we would remove the app inside a single lock, whether it was runnable or not, and be reasonably sure of the result. Now, by splitting it into 2 non-atomic steps outside, as I listed above, and also 2 steps inside {noformat} return removeRunnableApp(app) || removeNonRunnableApp(app) {noformat}, we might make it worse, as each one releases the lock before the other acquires it. The application could be completely missed if it moves from non-runnable to runnable in between. How about making removeApp try to remove from both the runnable and non-runnable lists inside a single writeLock? We can remove the redundancy between removeRunnableApp and removeNonRunnableApp by having a fourth internal method that all 3 delegate to, with flags to limit where to look for the app. FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-2975-1.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
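A minimal sketch of the single-writeLock shape suggested above, with the three removal variants delegating to one internal method whose flags limit where to look. The class below is a simplified stand-in, not FSLeafQueue or the code in the attached patch.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: remove an app from either list inside one write-lock
// critical section, so a concurrent non-runnable -> runnable promotion cannot
// slip between two separate checks.
class LeafQueueRemovalSketch<A> {
  private final List<A> runnableApps = new ArrayList<>();
  private final List<A> nonRunnableApps = new ArrayList<>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  boolean removeApp(A app)            { return remove(app, true, true); }
  boolean removeRunnableApp(A app)    { return remove(app, true, false); }
  boolean removeNonRunnableApp(A app) { return remove(app, false, true); }

  // The shared internal method the three public variants delegate to; the flags
  // limit where to look for the app, as proposed in the comment above.
  private boolean remove(A app, boolean checkRunnable, boolean checkNonRunnable) {
    rwLock.writeLock().lock();
    try {
      return (checkRunnable && runnableApps.remove(app))
          || (checkNonRunnable && nonRunnableApps.remove(app));
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
{code}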
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252467#comment-14252467 ] Karthik Kambatla commented on YARN-2975: I understand the concern. Pre-YARN-2910, we were in the exact same place and haven't seen any issues in practice. Let me take a look again and see if we can do what you are suggesting - do all of removeApp while holding the writeLock. Initially, I had a single method, but had to split based on prior accesses. FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-2975-1.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2978) Null pointer in YarnProtos
Jason Tufo created YARN-2978: Summary: Null pointer in YarnProtos Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252473#comment-14252473 ] Hadoop QA commented on YARN-2964: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688133/YARN-2964.3.patch against trunk revision b9d4976. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6150//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6150//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6150//console This message is automatically generated. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252481#comment-14252481 ] Ray Chiang commented on YARN-2675: -- Started reviewing, but it looks like this patch needs to be updated due to YARN-1156. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, YARN-2675.005.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2979) Unsupported operation exception in message building (YarnProtos)
Jason Tufo created YARN-2979: Summary: Unsupported operation exception in message building (YarnProtos) Key: YARN-2979 URL: https://issues.apache.org/jira/browse/YARN-2979 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.1 Reporter: Jason Tufo java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2203) Web UI for cache manager
[ https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252507#comment-14252507 ] Allen Wittenauer commented on YARN-2203: I'm willing to do it, but it won't be backward compatible. Web UI for cache manager Key: YARN-2203 URL: https://issues.apache.org/jira/browse/YARN-2203 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, YARN-2203-trunk-v5.patch Implement the web server and web ui for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252509#comment-14252509 ] Hadoop QA commented on YARN-2939: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch against trunk revision c4d9713. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6151//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6151//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6151//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-common --- Key: YARN-2939 URL: https://issues.apache.org/jira/browse/YARN-2939 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Li Lu Labels: findbugs Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252510#comment-14252510 ] Jason Lowe commented on YARN-2964: -- +1 lgtm. I don't believe the test failures are related since they pass for me locally. Committing this. RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2230: - Component/s: documentation Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Priority: Minor Attachments: YARN-2230.001.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java {code} if (resReq.getCapability().getVirtualCores() < 0 || resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) { throw new InvalidResourceRequestException("Invalid resource request" + ", requested virtual cores < 0" + ", or requested virtual cores > max configured" + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores() + ", maxVirtualCores=" + maximumResource.getVirtualCores()); } {code} According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit. {code} <property> <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>32</value> </property> {code} This means that: * Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g. {code} 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) ... at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} * IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress. The same appears to apply to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
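For contrast with the rejection path quoted above, here is a hypothetical sketch of the behavior the yarn-default.xml description promises, i.e. capping an over-sized request instead of throwing. The method is invented for illustration and is not taken from SchedulerUtils or the attached patch.

{code}
// Illustrative only: cap an over-sized vcore request to the configured maximum,
// which is what the yarn-default.xml description says should happen.
final class VcoreRequestCapSketch {

  static int normalizeRequestedVcores(int requestedVcores, int maxVcores) {
    if (requestedVcores < 0) {
      // A negative ask is still invalid and should surface as an error.
      throw new IllegalArgumentException(
          "Invalid resource request, requested virtual cores < 0: " + requestedVcores);
    }
    return Math.min(requestedVcores, maxVcores);
  }

  public static void main(String[] args) {
    // e.g. requestedVirtualCores=32, maxVirtualCores=3 -> capped to 3
    System.out.println(normalizeRequestedVcores(32, 3));
  }
}
{code}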
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2230: - Attachment: YARN-2230.001.patch Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Priority: Minor Attachments: YARN-2230.001.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java {code} if (resReq.getCapability().getVirtualCores() < 0 || resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) { throw new InvalidResourceRequestException("Invalid resource request" + ", requested virtual cores < 0" + ", or requested virtual cores > max configured" + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores() + ", maxVirtualCores=" + maximumResource.getVirtualCores()); } {code} According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit. {code} <property> <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>32</value> </property> {code} This means that: * Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g. {code} 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) ... at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} * IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress. The same appears to apply to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2230: - Attachment: (was: YARN-2230.001.patch) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Priority: Minor Attachments: YARN-2230.001.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
    resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
      + ", requested virtual cores < 0"
      + ", or requested virtual cores > max configured"
      + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores()
      + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}
According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit.
{code}
<property>
  <description>The maximum allocation for every container request at the RM, in terms of
  virtual CPU cores. Requests higher than this won't take effect, and will get capped to
  this value.</description>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
{code}
This means that:
* Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
{code}
2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
        at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
        .
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
* IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress. The same appears to apply to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252563#comment-14252563 ] Hudson commented on YARN-2964: -- FAILURE: Integrated in Hadoop-trunk-Commit #6755 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6755/]) YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched more than 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
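For context on the direction of the fix (the committed change is in DelegationTokenRenewer, listed above), the regression described is essentially a reference-counting problem: a token should only be cancelled once the last application using it finishes. A minimal, hypothetical sketch of that idea follows; the class and method names are invented here and this is not the committed patch:
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class TokenRefTracker {
  private final Map<Token<?>, Set<ApplicationId>> appsPerToken = new ConcurrentHashMap<>();

  // Remember that this app uses this token.
  public void register(Token<?> token, ApplicationId appId) {
    appsPerToken.computeIfAbsent(token, t -> ConcurrentHashMap.newKeySet()).add(appId);
  }

  // Returns true only when no other app still references the token,
  // i.e. it is safe to cancel without breaking oozie sub-jobs.
  public boolean unregister(Token<?> token, ApplicationId appId) {
    Set<ApplicationId> apps = appsPerToken.get(token);
    if (apps == null) {
      return false;
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      return true;
    }
    return false;
  }
}
{code}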
[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252592#comment-14252592 ] Hadoop QA commented on YARN-2230: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688178/YARN-2230.001.patch against trunk revision 0402bad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 25 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6152//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6152//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6152//console This message is automatically generated. Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Priority: Minor Attachments: YARN-2230.001.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
    resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
      + ", requested virtual cores < 0"
      + ", or requested virtual cores > max configured"
      + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores()
      + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}
According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit.
{code}
<property>
  <description>The maximum allocation for every container request at the RM, in terms of
  virtual CPU cores. Requests higher than this won't take effect, and will get capped to
  this value.</description>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
{code}
This means that:
* Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress.
The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
{code}
2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
        at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
        .
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2217: --- Attachment: YARN-2217-trunk-v4.patch [~kasha] V4 attached. Patch verified manually. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252611#comment-14252611 ] Jian He commented on YARN-2964: --- thanks for reviewing and committing, Jason ! RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched more than 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.19.patch Modified patch to use the minimum allocation value if the application master resource is unavailable maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number of AMs that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
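As a rough illustration of what the attached patch is driving at - bounding activation by the actual AM resource of each application, falling back to the minimum allocation when it is unknown, instead of by an application count derived from minimum_allocation - a simplified, self-contained sketch follows. It uses memory-only values and invented names; it is not the code in YARN-2637.19.patch:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AmLimitSketch {
  static final long MINIMUM_ALLOCATION_MB = 1024;

  // pendingAmMb holds the AM memory (MB) of each pending app, or null/0 if unknown.
  static List<Long> activate(List<Long> pendingAmMb, long maxAmResourceMb) {
    List<Long> activated = new ArrayList<>();
    long amUsedMb = 0;
    for (Iterator<Long> i = pendingAmMb.iterator(); i.hasNext(); ) {
      Long amMb = i.next();
      long effective = (amMb == null || amMb <= 0) ? MINIMUM_ALLOCATION_MB : amMb;
      if (amUsedMb + effective > maxAmResourceMb) {
        break;   // queue-level AM resource budget reached
      }
      amUsedMb += effective;
      activated.add(effective);
      i.remove();
    }
    return activated;
  }
}
{code}
With the 1G / 0.2 example above, only 40 apps with 5M AMs would be activated before the 200M budget is exhausted, instead of all 200.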
[jira] [Updated] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2654: --- Attachment: shared_cache_config_parameters.txt Attached is a text file with all of the shared cache config parameters, their descriptions and defaults (taken from yarn-default.xml). Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252620#comment-14252620 ] Craig Welch commented on YARN-2637: --- Minor update - TestAllocationFileLoaderService passes for me; I think it's a build server issue. Also, I believe the findbugs warnings are still related to the jdk7 update - but they had disappeared before I had a chance to verify; they will re-run with the new patch and I will confirm that they are not related to my change. In all, I believe the change is fine in terms of the release audit checks... The only change in this version vs the last is with respect to: bq. 2. FiCaSchedulerApp constructor As I said before, this is present in non-test scenarios. However, I realized that I could use the minimum allocation from the scheduler if it is not present, which would mean at worst we would have the old behavior if there is not an actual AM resource to work with - so I adjusted the code to do that when necessary/possible. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number of AMs that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252629#comment-14252629 ] Mayank Bansal commented on YARN-2933: - These findbugs warnings and the test failure are not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting support for preemption respecting node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but the current preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will cover preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
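As a sketch of the short-term approach described above - counting only unlabeled nodes when computing the total capacity used for ideal_allocation, so that containers on labeled nodes are never candidates for preemption - something along these lines could work; the types are simplified stand-ins and this is not the ProportionalCapacityPreemptionPolicy API:
{code}
import java.util.Map;
import java.util.Set;

public class NoLabelCapacitySketch {
  // Sum memory only from nodes whose label set is empty.
  static long totalNoLabelMemoryMb(Map<String, Long> nodeMemoryMb,
                                   Map<String, Set<String>> nodeLabels) {
    long total = 0;
    for (Map.Entry<String, Long> e : nodeMemoryMb.entrySet()) {
      Set<String> labels = nodeLabels.get(e.getKey());
      if (labels == null || labels.isEmpty()) {
        total += e.getValue();   // only unlabeled nodes contribute to ideal_allocation
      }
    }
    return total;
  }
}
{code}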
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252682#comment-14252682 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688186/YARN-2217-trunk-v4.patch against trunk revision 0402bad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6153//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6153//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6153//console This message is automatically generated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252700#comment-14252700 ] Chris Trezzo commented on YARN-2217: Findbugs and test failures seem unrelated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252753#comment-14252753 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688189/YARN-2637.19.patch against trunk revision 0402bad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6154//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6154//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6154//console This message is automatically generated. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number of AMs that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2978) Null pointer in YarnProtos
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2978: -- Assignee: Varun Saxena Null pointer in YarnProtos -- Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena
java.lang.NullPointerException
        at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
        at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2979) Unsupported operation exception in message building (YarnProtos)
[ https://issues.apache.org/jira/browse/YARN-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2979: -- Assignee: Varun Saxena Unsupported operation exception in message building (YarnProtos) Key: YARN-2979 URL: https://issues.apache.org/jira/browse/YARN-2979 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena
java.lang.UnsupportedOperationException
        at java.util.AbstractList.add(AbstractList.java:148)
        at java.util.AbstractList.add(AbstractList.java:108)
        at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
        at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289)
        at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
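The trace shows the generated protobuf builder failing while copying the applications list into QueueInfoProto. This kind of UnsupportedOperationException deep inside the builder often points to the PBImpl's local list being shared or mutated unsafely while it is merged into the proto; one common defensive pattern (not necessarily the actual YARN-2979 fix) is to snapshot the local list before handing it to the builder:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DefensiveCopySketch {
  // Give the proto builder its own copy so it never sees an unmodifiable
  // or concurrently changing list.
  static <T> List<T> snapshot(List<T> source) {
    if (source == null) {
      return Collections.emptyList();
    }
    return new ArrayList<T>(source);
  }
}
{code}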
[jira] [Created] (YARN-2980) Move health check script related functionality to hadoop-common
Ming Ma created YARN-2980: - Summary: Move health check script related functionality to hadoop-common Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252830#comment-14252830 ] Robert Kanter commented on YARN-2964: - Thanks for fixing this. [~jianhe], [~jlowe], on the 24 hrs thing, do you think this is something we can/should fix in YARN? My understanding of this issue is that it's by design (there's even a config for the interval). Given that, I'm thinking the proper fix for this is just to have the launcher job periodically renew the token (a fix in OOZIE)? RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM liveness interval) after log aggregation completes. The result is that an oozie job, e.g. pig, that launches many sub-jobs over time will fail if any sub-job is launched more than 10 min after another sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2980: -- Assignee: Varun Saxena Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.006.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, YARN-2675.005.patch, YARN-2675.006.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
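The proposal in the description - counting containers that are killed while still localizing - amounts to treating the KILLING state as a kill when the container reaches its terminal metrics update. A simplified, hypothetical sketch of that accounting follows (stand-in enum and metrics class, not the real ContainerImpl/NodeManagerMetrics code):
{code}
public class KilledMetricSketch {
  enum State { LOCALIZING, RUNNING, KILLING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE }

  static class Metrics {
    long killedContainers;
    long failedContainers;
    long completedContainers;
  }

  // Called once when a container finishes, with the state it was in just before.
  static void finished(State stateBeforeFinish, Metrics metrics) {
    if (stateBeforeFinish == State.KILLING) {
      // Covers kills that arrive during localization, which previously
      // fell through without updating killedContainers.
      metrics.killedContainers++;
    } else if (stateBeforeFinish == State.EXITED_WITH_FAILURE) {
      metrics.failedContainers++;
    } else {
      metrics.completedContainers++;
    }
  }
}
{code}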