[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096610#comment-14096610 ] Ivan Mitic commented on YARN-2190:
--
Thanks, Chuan, for the patch. Looks great overall! A few questions/suggestions below:
1. Where can I see that there are CPU/memory limits set on the job? ProcessExplorer?
2. Please make sure the code continues to compile/run on Win7/SDK. I am still on Server 2008R2 :)
3. task.c: Can you init {{wchar_t *end}} to NULL? In the {{if}} check after wcstol, it might make sense to add {{end == NULL || *end !=...}}
4. task.c: ParseCommandLine: Given that you're passing pointers to variables on the stack, you could as well assert that {{memory}} and {{vcore}} are != NULL.
5. {code}
OPTIONS: -c [cores] set virtual core limits on the job object.\n\
-m [memory] set the memory limit on the job object.\n\
{code}
Can you please specify the unit used for the memory/CPU limit?
6. task.c:
{code}
jcrci.CpuRate = vcores * (1 / sysinfo.dwNumberOfProcessors);
{code}
Should we multiply first and then divide, to minimize precision loss? (A sketch of the corrected computation follows this message.)
7. Would you mind including a unit test for WindowsContainerExecutor? At this point it will be a trivial test, but it will likely grow over time.
8. Just to confirm: by default, we will still use the DefaultContainerExecutor on Windows, right? And users can configure the WindowsContainerExecutor if they want? This sounds good until we develop a better understanding of how the new limits behave in production.

Provide a Windows container executor that can limit memory and CPU
--
Key: YARN-2190
URL: https://issues.apache.org/jira/browse/YARN-2190
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch

YARN's default container executor on Windows does not currently set resource limits on the containers; the memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses a Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on job objects. We want to create a Windows container executor that sets the limits on job objects, thus providing resource enforcement at the OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx
--
This message was sent by Atlassian JIRA (v6.2#6252)
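[Editor's note] A minimal sketch of what the multiply-first fix from point 6 might look like against the Windows 8+ job-object API. The helper name and surrounding structure are illustrative assumptions, not code from the patch; {{CpuRate}} is expressed in 1/100ths of a percent of total capacity (0..10000), which is why multiplying before dividing matters for integer arithmetic.
{code}
#include <windows.h>

/* Illustrative helper (not from the patch): cap a job object at `vcores`
 * of the machine's logical processors. Requires Windows 8 / Server 2012
 * or later for JOBOBJECT_CPU_RATE_CONTROL_INFORMATION. */
BOOL SetJobCpuLimit(HANDLE hJob, DWORD vcores)
{
    SYSTEM_INFO sysinfo;
    JOBOBJECT_CPU_RATE_CONTROL_INFORMATION jcrci;

    GetSystemInfo(&sysinfo);
    ZeroMemory(&jcrci, sizeof(jcrci));
    jcrci.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                         JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP;
    /* Multiply first, then divide, to minimize precision loss. */
    jcrci.CpuRate = (vcores * 10000) / sysinfo.dwNumberOfProcessors;

    return SetInformationJobObject(hJob, JobObjectCpuRateControlInformation,
                                   &jcrci, sizeof(jcrci));
}
{code}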
[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2383:
--
Attachment: YARN-2383.preview.3.1.patch

Fixed the test case failures and findbugs warnings.

Add ability to renew ClientToAMToken
--
Key: YARN-2383
URL: https://issues.apache.org/jira/browse/YARN-2383
Project: Hadoop YARN
Issue Type: Bug
Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, YARN-2383.preview.3.1.patch, YARN-2383.preview.3.patch
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2173) Enabling HTTPS for the reader REST APIs of TimelineServer
[ https://issues.apache.org/jira/browse/YARN-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2173. --- Resolution: Implemented Enabling HTTPS for the reader REST APIs of TimelineServer - Key: YARN-2173 URL: https://issues.apache.org/jira/browse/YARN-2173 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2173) Enabling HTTPS for the reader REST APIs of TimelineServer
[ https://issues.apache.org/jira/browse/YARN-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096685#comment-14096685 ] Zhijie Shen commented on YARN-2173:
---
I've set up HTTPS locally for the timeline server and verified it with security on and off. In both scenarios, the three timeline GET APIs and the generic history web services and UI were working fine. Therefore, HTTPS for the timeline server should just work by using WebApp. Closing this ticket as implemented.

Enabling HTTPS for the reader REST APIs of TimelineServer
-
Key: YARN-2173
URL: https://issues.apache.org/jira/browse/YARN-2173
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2418) Resource Manager JMX root queue active users 0
Hari Sekhon created YARN-2418:
-
Summary: Resource Manager JMX root queue active users 0
Key: YARN-2418
URL: https://issues.apache.org/jira/browse/YARN-2418
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Environment: HDP2.1
Reporter: Hari Sekhon
Priority: Minor

I've observed that the YARN Resource Manager's JMX shows 0 active users in the root queue, even though other metrics, such as submitted jobs, correctly aggregate the stats from the leaf queues. For correctness, the active users for the root queue should be the total of the active users across all the leaf queues, since the root queue represents the cluster-wide stats.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1
[ https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-2380:
--
Attachment: YARN-2380.patch

Hi, how about keeping the vcores value in DefaultResourceCalculator#normalize? I think DefaultResourceCalculator should care only about memory.

The normalizeRequests method in SchedulerUtils always resets the vCore to 1
---
Key: YARN-2380
URL: https://issues.apache.org/jira/browse/YARN-2380
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang
Priority: Critical
Attachments: YARN-2380.patch

I added some log info to the method normalizeRequest() as follows.
{code}
public static void normalizeRequest(
    ResourceRequest ask,
    ResourceCalculator resourceCalculator,
    Resource clusterResource,
    Resource minimumResource,
    Resource maximumResource,
    Resource incrementResource) {
  LOG.info("Before request normalization, the ask capacity: " + ask.getCapability());
  Resource normalized = Resources.normalize(
      resourceCalculator, ask.getCapability(), minimumResource,
      maximumResource, incrementResource);
  LOG.info("After request normalization, the ask capacity: " + normalized);
  ask.setCapability(normalized);
}
{code}
The resulting log showed that the vcore in the ask was changed from 2 to 1.
{noformat}
2014-08-01 20:54:15,537 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC Server handler 4 on 9024): Before request normalization, the ask capacity: <memory:1536, vCores:2>
2014-08-01 20:54:15,537 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC Server handler 4 on 9024): After request normalization, the ask capacity: <memory:1536, vCores:1>
{noformat}
The root cause is that DefaultResourceCalculator calls Resources.createResource(normalizedMemory) to regenerate a new resource with vcore = 1. This bug is critical: it leads to a mismatch between the requested resource and the container resource, and to many other potential issues, if the user requests containers with more than one vcore.
--
This message was sent by Atlassian JIRA (v6.2#6252)
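[Editor's note] A minimal sketch of the fix direction Kenji describes, reusing the calculator's existing {{roundUp}} helper; this is illustrative, not the attached YARN-2380.patch:
{code}
// DefaultResourceCalculator#normalize, adjusted to carry the requested
// vCores through instead of rebuilding the Resource with the default of 1.
@Override
public Resource normalize(Resource r, Resource minimumResource,
    Resource maximumResource, Resource stepFactor) {
  int normalizedMemory = Math.min(
      roundUp(Math.max(r.getMemory(), minimumResource.getMemory()),
          stepFactor.getMemory()),
      maximumResource.getMemory());
  // Memory is normalized; vCores are passed through untouched.
  return Resources.createResource(normalizedMemory, r.getVirtualCores());
}
{code}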
[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096838#comment-14096838 ] Hadoop QA commented on YARN-2383:
-
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12661662/YARN-2383.preview.3.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 14 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
org.apache.hadoop.yarn.client.TestRMFailover
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4622//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4622//console

This message is automatically generated.

Add ability to renew ClientToAMToken
--
Key: YARN-2383
URL: https://issues.apache.org/jira/browse/YARN-2383
Project: Hadoop YARN
Issue Type: Bug
Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, YARN-2383.preview.3.1.patch, YARN-2383.preview.3.patch
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096867#comment-14096867 ] Hudson commented on YARN-2070:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #646 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/646/])
YARN-2070. Made DistributedShell publish the short user name to the timeline server. Contributed by Robert Kanter. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java

DistributedShell publishes unfriendly user information to the timeline server
-
Key: YARN-2070
URL: https://issues.apache.org/jira/browse/YARN-2070
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
Labels: newbie
Fix For: 2.6.0
Attachments: YARN-2070.patch

Below is the code that uses the string of the current user object as the "user" value.
{code}
entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
    .toString());
{code}
When we use Kerberos authentication, it's going to output the full name, such as "zjshen/localhost@LOCALHOST (auth.KERBEROS)". That is not user friendly for searching by the primary filters. It's better to use shortUserName instead.
--
This message was sent by Atlassian JIRA (v6.2#6252)
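[Editor's note] Based on the commit message above, the fix presumably amounts to switching the primary filter to the short name; a one-line sketch (the exact committed diff is not shown here):
{code}
// Publish the short user name ("zjshen") instead of the full UGI string
// ("zjshen/localhost@LOCALHOST (auth.KERBEROS)") as the primary filter.
entity.addPrimaryFilter("user",
    UserGroupInformation.getCurrentUser().getShortUserName());
{code}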
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096859#comment-14096859 ] Hudson commented on YARN-2277:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #646 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/646/])
YARN-2277. Added cross-origin support for the timeline server web services. Contributed by Jonathan Eagles. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617832)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java

Add Cross-Origin support to the ATS REST API
-
Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Fix For: 2.6.0
Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch

As the Application Timeline Server is not provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client may be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097097#comment-14097097 ] Hudson commented on YARN-2070:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #1837 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1837/])
YARN-2070. Made DistributedShell publish the short user name to the timeline server. Contributed by Robert Kanter. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java

DistributedShell publishes unfriendly user information to the timeline server
-
Key: YARN-2070
URL: https://issues.apache.org/jira/browse/YARN-2070
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
Labels: newbie
Fix For: 2.6.0
Attachments: YARN-2070.patch

Below is the code that uses the string of the current user object as the "user" value.
{code}
entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
    .toString());
{code}
When we use Kerberos authentication, it's going to output the full name, such as "zjshen/localhost@LOCALHOST (auth.KERBEROS)". That is not user friendly for searching by the primary filters. It's better to use shortUserName instead.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2419) RM applications page doesn't sort application id properly
Thomas Graves created YARN-2419:
---
Summary: RM applications page doesn't sort application id properly
Key: YARN-2419
URL: https://issues.apache.org/jira/browse/YARN-2419
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Thomas Graves

The ResourceManager apps page doesn't sort the application ids properly when the app id sequence rolls over to an extra digit. When it rolls over, the higher application ids end up many pages down, among the lower-numbered ids. I assume we just sort alphabetically, so we would need a special sorter that knows about application ids. (A sketch follows this message.)
--
This message was sent by Atlassian JIRA (v6.2#6252)
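[Editor's note] A hedged sketch of what an application-id-aware comparison could look like. This is illustrative Java; since the RM pages sort in the browser, a real fix would land in the web UI's table-sorting code. Application ids have the form application_<clusterTimestamp>_<sequence>, so the two numeric fields should be compared numerically rather than as strings.
{code}
import java.util.Comparator;

// Illustrative comparator for ids like "application_1407880341661_0042".
public class AppIdComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("_");
    String[] pb = b.split("_");
    // Compare the cluster timestamps first, then the sequence numbers.
    int byCluster = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
    if (byCluster != 0) {
      return byCluster;
    }
    return Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
  }
}
{code}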
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097121#comment-14097121 ] Hudson commented on YARN-2277:
--
SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1863 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1863/])
YARN-2277. Added cross-origin support for the timeline server web services. Contributed by Jonathan Eagles. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617832)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java

Add Cross-Origin support to the ATS REST API
-
Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Fix For: 2.6.0
Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch

As the Application Timeline Server is not provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client may be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097129#comment-14097129 ] Hudson commented on YARN-2070:
--
SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1863 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1863/])
YARN-2070. Made DistributedShell publish the short user name to the timeline server. Contributed by Robert Kanter. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java

DistributedShell publishes unfriendly user information to the timeline server
-
Key: YARN-2070
URL: https://issues.apache.org/jira/browse/YARN-2070
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
Labels: newbie
Fix For: 2.6.0
Attachments: YARN-2070.patch

Below is the code that uses the string of the current user object as the "user" value.
{code}
entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
    .toString());
{code}
When we use Kerberos authentication, it's going to output the full name, such as "zjshen/localhost@LOCALHOST (auth.KERBEROS)". That is not user friendly for searching by the primary filters. It's better to use shortUserName instead.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097167#comment-14097167 ] Xuan Gong commented on YARN-2383:
-
The test case failures are unrelated; all of them are port-binding problems.

Add ability to renew ClientToAMToken
--
Key: YARN-2383
URL: https://issues.apache.org/jira/browse/YARN-2383
Project: Hadoop YARN
Issue Type: Bug
Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, YARN-2383.preview.3.1.patch, YARN-2383.preview.3.patch
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097194#comment-14097194 ] Eric Payne commented on YARN-2056:
--
{quote}
Could this be accomplished by changing {{yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity}} to be a per-queue value? Then, for queues that we don't ever want to be preempted, we set {{max_ignored_over_capacity == (max_capacity/capacity)-1.0}}?

Vinod, the motivation from my perspective is that we need a way to gradually phase in preemption, so being able to configure the queues in a way that prevents and/or gradually allows preemption seems desirable.
{quote}
[~nroberts], [~mayank_bansal], and [~vinodkv], would the {{max_ignored_over_capacity}} property become something like {{yarn.resourcemanager.monitor.capacity.preemption.queue-path.max_ignored_over_capacity}}? For example, if the capacity scheduler were configured with 2 leaf queues, {{excalibur}} and {{brisingr}}, I would imagine that the {{max_ignored_over_capacity}} property names would look like this:
{{yarn.resourcemanager.monitor.capacity.preemption.root.excalibur.max_ignored_over_capacity}}
{{yarn.resourcemanager.monitor.capacity.preemption.root.brisingr.max_ignored_over_capacity}}

Disable preemption at Queue level
-
Key: YARN-2056
URL: https://issues.apache.org/jira/browse/YARN-2056
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

We need to be able to disable preemption at the individual queue level.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097221#comment-14097221 ] Eric Payne commented on YARN-2409:
--
[~rohithsharma], thanks for the analysis and detailed description. +1 (non-binding)

Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.
---
Key: YARN-2409
URL: https://issues.apache.org/jira/browse/YARN-2409
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
Attachments: YARN-2409.patch

{code}
    at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
{code}
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1
[ https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097225#comment-14097225 ] Jian Fang commented on YARN-2380:
-
Is there any reason why you think DefaultResourceCalculator should care only about memory? Up to now, the resource has been defined as (memory, vcore). If DefaultResourceCalculator does not do anything about vcores, it should pass the vcore value through instead of setting it to 1, which otherwise leads to a lot of potential issues, such as the case in Tez.

The normalizeRequests method in SchedulerUtils always resets the vCore to 1
---
Key: YARN-2380
URL: https://issues.apache.org/jira/browse/YARN-2380
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang
Priority: Critical
Attachments: YARN-2380.patch

I added some log info to the method normalizeRequest() as follows.
{code}
public static void normalizeRequest(
    ResourceRequest ask,
    ResourceCalculator resourceCalculator,
    Resource clusterResource,
    Resource minimumResource,
    Resource maximumResource,
    Resource incrementResource) {
  LOG.info("Before request normalization, the ask capacity: " + ask.getCapability());
  Resource normalized = Resources.normalize(
      resourceCalculator, ask.getCapability(), minimumResource,
      maximumResource, incrementResource);
  LOG.info("After request normalization, the ask capacity: " + normalized);
  ask.setCapability(normalized);
}
{code}
The resulting log showed that the vcore in the ask was changed from 2 to 1.
{noformat}
2014-08-01 20:54:15,537 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC Server handler 4 on 9024): Before request normalization, the ask capacity: <memory:1536, vCores:2>
2014-08-01 20:54:15,537 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC Server handler 4 on 9024): After request normalization, the ask capacity: <memory:1536, vCores:1>
{noformat}
The root cause is that DefaultResourceCalculator calls Resources.createResource(normalizedMemory) to regenerate a new resource with vcore = 1. This bug is critical: it leads to a mismatch between the requested resource and the container resource, and to many other potential issues, if the user requests containers with more than one vcore.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2393:
--
Attachment: YARN-2393-3.patch

Rebased the patch. [~kasha], for the reloading: if we want to update only the queues whose weights have changed, it seems we would need to change a bundle of code, as we would need to compare the previous weight with the current weight. I'm not sure that is a good trade-off. So this patch still keeps the old way of doing rootQueue.recomputeSteadyShares() once the allocation file is reloaded. (A sketch of that hook follows this message.)

Fair Scheduler : Implement static fair share
-
Key: YARN-2393
URL: https://issues.apache.org/jira/browse/YARN-2393
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch

Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute the static fair share only when needed, e.g., on queue creation or when a node is added/removed. Please see YARN-2026 for discussions on this.
--
This message was sent by Atlassian JIRA (v6.2#6252)
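[Editor's note] A minimal sketch of the reload hook described above, assuming the FairScheduler's allocation-file listener shape; names beyond {{recomputeSteadyShares()}} are illustrative, and the real listener does additional bookkeeping:
{code}
// On allocation-file reload, recompute steady fair shares for the whole
// queue tree instead of diffing per-queue weight changes.
@Override
public void onReload(AllocationConfiguration newAllocConf) {
  this.allocConf = newAllocConf;
  rootQueue.recomputeSteadyShares();
}
{code}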
[jira] [Commented] (YARN-2397) RM web interface sometimes returns "request is a replay" error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097244#comment-14097244 ] Varun Vasudev commented on YARN-2397:
-
Thanks for the feedback, [~zjshen]. My thinking is that in secure mode, we should replace the AuthenticationFilterInitializer with the RMAuthenticationFilterInitializer to add support for authentication using delegation tokens. In non-secure mode, the RMAuthenticationFilterInitializer and the AuthenticationFilterInitializer are the same, so there's no need for any replacement. However, in non-secure mode, we should have a default filter in case none is specified (so that users can use the RM web services), hence the code block for non-secure mode.

RM web interface sometimes returns "request is a replay" error in secure mode
---
Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.
--
This message was sent by Atlassian JIRA (v6.2#6252)
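[Editor's note] A rough sketch of the logic Varun describes, assuming the standard {{hadoop.http.filter.initializers}} key and the usual Hadoop imports; this paraphrases the approach and is not the attached patch:
{code}
Configuration conf = getConfig();
String filterInitializers = conf.get("hadoop.http.filter.initializers", "");
if (UserGroupInformation.isSecurityEnabled()) {
  // Secure mode: swap the stock AuthenticationFilterInitializer for the
  // RM-specific one so delegation-token auth goes through a single filter.
  filterInitializers = filterInitializers.replaceAll(
      AuthenticationFilterInitializer.class.getName(),
      RMAuthenticationFilterInitializer.class.getName());
} else if (filterInitializers.isEmpty()) {
  // Non-secure mode: install a default filter so the RM web services
  // still work when none is configured.
  filterInitializers = RMAuthenticationFilterInitializer.class.getName();
}
conf.set("hadoop.http.filter.initializers", filterInitializers);
{code}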
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097266#comment-14097266 ] Jian He commented on YARN-2136:
---
bq. Hence dispatcher queue draining shouldn't matter as ZKClient is already closed.
After checking the code, I think we should flip the order of closeInternal() and dispatcher.stop(), right?
{code}
protected void serviceStop() throws Exception {
  closeInternal();
  dispatcher.stop();
}
{code}

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
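[Editor's note] For clarity, the flipped order being proposed would look like this (a sketch of the suggestion, not a committed change):
{code}
protected void serviceStop() throws Exception {
  // Stop (and drain) the dispatcher first so pending store/update events
  // are processed before the ZK client is closed.
  dispatcher.stop();
  closeInternal();
}
{code}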
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097275#comment-14097275 ] Craig Welch commented on YARN-1198:
---
So, it's possible to avoid iterating the applications in the queue, and even the queue's users, if the antecedents of the headroom calculation are shared and updated at the queue level on change (qmaxcap...) and the final calculation is done during the heartbeat request / call to the scheduler application attempt. It would just be a calculation over these shared resources plus some user-specific values, which should be reasonably performant, but it would move the final activity away from where it is today. (A sketch of the arithmetic follows this message.)

Capacity Scheduler headroom calculation does not work as expected
-
Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch

Today the headroom calculation (for the app) takes place only when:
* a new node is added to / removed from the cluster
* a new container is assigned to the application.

However, there are potentially a lot of situations which are not considered in this calculation:
* If a container finishes, then the headroom for that application will change and should be communicated to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** if app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom;
** similarly, if a container is assigned to either application app1/app2, then both AMs should be notified about their headroom;
** to simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue, so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today headroom is an absolute number (I think it should be normalized, but then this is not going to be backward compatible...).
* Also, when an admin user refreshes the queue, headroom has to be updated.

These are all potential bugs in headroom calculations.
--
This message was sent by Atlassian JIRA (v6.2#6252)
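[Editor's note] As a hedged illustration of the final per-heartbeat arithmetic Craig describes (names and the memory-only simplification are assumptions, not YARN APIs):
{code}
// Headroom re-derived on the AM heartbeat from queue-level shared inputs
// (user limit, queue max capacity) plus per-user consumption, so no
// per-application iteration is needed when those inputs change.
static int headroomMemory(int userLimitMem, int userConsumedMem,
    int queueMaxAvailableMem) {
  return Math.max(0, Math.min(userLimitMem - userConsumedMem,
      queueMaxAvailableMem));
}
{code}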
[jira] [Commented] (YARN-2390) Investigating whether generic history service needs to support queue-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097280#comment-14097280 ] Sunil G commented on YARN-2390:
---
Yes, I understood your idea, but completed apps can remain in the RM for some more time (up to the configured maximum number of completed apps in the RM), and the ACLs will still be applicable to these completed apps. In the History Server, the behavior is now different for the same completed app once it has moved out of the RM. This was the only point I was thinking we may need to look at. What do you feel about this?

Investigating whether generic history service needs to support queue-acls
--
Key: YARN-2390
URL: https://issues.apache.org/jira/browse/YARN-2390
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen

According to YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need access to a completed application that has been removed from the queue. Creating this ticket to track the discussion.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097298#comment-14097298 ] Sunil G commented on YARN-2056:
---
+1, this makes sense. I have a doubt, though. *max_ignored_over_capacity* helps avoid the jitter when container sizes vary and we sometimes preempt a little more/less from a leaf queue than its defined capacity. So, more or less, it boils down to the resource size of containers. A per-queue configuration for max_ignored_over_capacity will definitely give more control than we have now, but if applications that are heterogeneous in terms of container resources keep running in the same queue, it may still be hard to pick a correct value.

Disable preemption at Queue level
-
Key: YARN-2056
URL: https://issues.apache.org/jira/browse/YARN-2056
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

We need to be able to disable preemption at the individual queue level.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097341#comment-14097341 ] Sunil G commented on YARN-2136:
---
Yes, I also feel we need to flip the order so dispatcher.stop() runs first.

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097359#comment-14097359 ] Hadoop QA commented on YARN-2393:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12661745/YARN-2393-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4623//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4623//console

This message is automatically generated.

Fair Scheduler : Implement static fair share
-
Key: YARN-2393
URL: https://issues.apache.org/jira/browse/YARN-2393
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch

Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute the static fair share only when needed, e.g., on queue creation or when a node is added/removed. Please see YARN-2026 for discussions on this.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AMs running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.1.patch

Just updating to a patch which applies against current trunk; otherwise unchanged.

CapacityScheduler headroom doesn't account for other AMs running
-
Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, YARN-1857.patch

It's possible for an application to hang forever (or for a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be 0 even though the cluster is 100% full, because the remaining space is being used by application masters from other users.

For instance, take a cluster with 1 queue and a user limit of 100%, with multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps, and starts running reducers. Other users try to start applications and get their application masters started, but not their tasks. The very large application then gets to the point where it has consumed the rest of the cluster's resources, all with reduces, but it still needs to finish a few maps. The headroom being sent to this application is based only on the user limit (which is 100% of the cluster capacity); it is using, let's say, 95% of the cluster for reduces, and the other 5% is being used by other users' application masters. The MRAppMaster thinks it still has 5% headroom, so it doesn't know that it should kill a reduce in order to run a map. This can happen in other scenarios also. Generally, in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-281:
-
Assignee: Wangda Tan (was: Harsh J)

Sorry for the delay; reassigned.

Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
-
Key: YARN-281
URL: https://issues.apache.org/jira/browse/YARN-281
Project: Hadoop YARN
Issue Type: Test
Components: scheduler
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Wangda Tan
Labels: test

We currently have tests that test the MINIMUM_ALLOCATION limits for FifoScheduler and the like, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2390) Investigating whether generic history service needs to support queue-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097385#comment-14097385 ] Zhijie Shen commented on YARN-2390:
---
bq. but completed apps can remain in the RM for some more time, and the ACLs will still be applicable to these completed apps.
[~sunilg], that's a good point. I agree it would be nice if the RM and the GHS had consistent access control for finished applications. However, if it's reasonable that the queue admin shouldn't have access to the completed app that has been removed from the queue, is the right fix to correct the ACLs on the RM side? One related issue is that while the CLI checks the user's ACLs properly, neither the GET APIs nor the web UI honor the ACLs completely on the RM side (therefore, I filed YARN-2310 and YARN-2311 before).

Investigating whether generic history service needs to support queue-acls
--
Key: YARN-2390
URL: https://issues.apache.org/jira/browse/YARN-2390
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen

According to YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need access to a completed application that has been removed from the queue. Creating this ticket to track the discussion.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097402#comment-14097402 ] Varun Saxena commented on YARN-2136:
-
Yes, I completely agree with you, [~jianhe]. dispatcher.stop() causes the events in the dispatcher queue (if any) to be processed first. These events would be lost if we called closeInternal() before dispatcher.stop().

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2378:
---
Attachment: YARN-2378.patch

Good suggestion, [~jianhe]. Uploading an updated patch that has the fix.

Adding support for moving apps between queues in Capacity Scheduler
---
Key: YARN-2378
URL: https://issues.apache.org/jira/browse/YARN-2378
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
Labels: capacity-scheduler
Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch

As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 into smaller patches for manageability. This JIRA will address adding support for moving apps between queues in the Capacity Scheduler.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097513#comment-14097513 ] Varun Saxena commented on YARN-2136:
-
However, if we do flip the order of these statements, I think we can then still make use of a FENCED state: when we stop the dispatcher, its queue will be drained first, and hence the pending events will be processed first. In this case, store/update operations will be sent to ZK. What's your opinion, [~jianhe] and [~sunilg]?

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097548#comment-14097548 ] Jian He commented on YARN-2136:
---
bq. it will be drained first and hence the pending events will be processed first.
We are supposed to handle these pending events, right?

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097582#comment-14097582 ] Varun Saxena commented on YARN-2136:
-
Ideally these events should be processed. But if the store is already fenced, I guess a NoAuthException will again be reported by ZK, so processing such an event won't lead to any useful operation.

RMStateStore can explicitly handle store/update events when fenced
--
Key: YARN-2136
URL: https://issues.apache.org/jira/browse/YARN-2136
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097631#comment-14097631 ] Zhijie Shen commented on YARN-2385:
---
bq. Maybe two separate APIs (getRunningAppsInQueue, getPendingAppsInQueue) with common behavior across CS/Fair could be a better approach.
+1 for getRunningAppsInQueue + getPendingAppsInQueue, which sounds more flexible: we can get each individual metric rather than a sum. Previously, getAppsInQueue was used for getQueueInfo and getApplications. In the former use case, we can replace it with getRunningAppsInQueue + getPendingAppsInQueue, while in the latter it's not accurate enough to include only the apps inside the queue, but that's a separate issue. (A sketch of the two APIs follows this message.)

Adding support for listing all applications in a queue
--
Key: YARN-2385
URL: https://issues.apache.org/jira/browse/YARN-2385
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
Labels: abstractyarnscheduler

This JIRA proposes adding a method to AbstractYarnScheduler to get all the pending/active applications. The Fair Scheduler already supports moving a single application from one queue to another, and support for the same is being added to the Capacity Scheduler as part of YARN-2378 and YARN-2248. So, with the addition of this method, we can transparently add support for moving all applications from a source queue to a target queue, and for draining a queue, i.e. killing all applications in a queue, as proposed by YARN-2389.
--
This message was sent by Atlassian JIRA (v6.2#6252)
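[Editor's note] For concreteness, the two proposed APIs might look like the following on the scheduler interface (hypothetical signatures, modeled on the existing {{getAppsInQueue}}, which returns a List<ApplicationAttemptId>):
{code}
// Hypothetical split of getAppsInQueue into running vs. pending apps,
// with common behavior across the Capacity and Fair schedulers.
public List<ApplicationAttemptId> getRunningAppsInQueue(String queueName);
public List<ApplicationAttemptId> getPendingAppsInQueue(String queueName);
{code}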
[jira] [Commented] (YARN-2397) RM web interface sometimes returns "request is a replay" error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097652#comment-14097652 ] Zhijie Shen commented on YARN-2397:
---
Makes sense to me. I'll commit the patch.

RM web interface sometimes returns "request is a replay" error in secure mode
---
Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2397) RM and TS web interfaces sometimes return "request is a replay" error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2397:
--
Summary: RM and TS web interfaces sometimes return "request is a replay" error in secure mode (was: RM web interface sometimes returns "request is a replay" error in secure mode)

RM and TS web interfaces sometimes return "request is a replay" error in secure mode
--
Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Fix For: 2.6.0
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2397) RM and TS web interfaces sometimes return "request is a replay" error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2397:
--
Description: The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both. A similar problem happens with the timeline server web interface as well.
was: The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.

RM and TS web interfaces sometimes return "request is a replay" error in secure mode
--
Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Fix For: 2.6.0
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both. A similar problem happens with the timeline server web interface as well.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2397) RM and TS web interfaces sometimes return "request is a replay" error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097688#comment-14097688 ] Hudson commented on YARN-2397:
--
FAILURE: Integrated in Hadoop-trunk-Commit #6071 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6071/])
YARN-2397. Avoided loading two authentication filters for RM and TS web interfaces. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618054)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java

RM and TS web interfaces sometimes return "request is a replay" error in secure mode
--
Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Fix For: 2.6.0
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a "request is a replay" error if the default Kerberos HTTP filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both. A similar problem happens with the timeline server web interface as well.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2365) TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2
[ https://issues.apache.org/jira/browse/YARN-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2365: Description: TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 with the following error {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) Time elapsed: 46.354 sec FAILURE! java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) {noformat} was: TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch with the following error {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) Time elapsed: 46.354 sec FAILURE! java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) {noformat} TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 -- Key: YARN-2365 URL: https://issues.apache.org/jira/browse/YARN-2365 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 with the following error {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) Time elapsed: 46.354 sec FAILURE!
java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097737#comment-14097737 ] Eric Payne commented on YARN-415: - [~jianhe], Thank you very much for reviewing this patch. {quote} - we can reuse the previous rmAttempt and resource object {code} RMAppAttempt rmAttempt = container.rmContext.getRMApps() .get(container.getApplicationAttemptId().getApplicationId()) .getRMAppAttempt(container.getApplicationAttemptId()); Resource resource = container.getContainer().getResource(); {code} {quote} I will reuse the Resource object, but I'm not sure if I can reuse the RMAppAttempt object. In the following code snippet, the preemption path is always updating the attempt metrics for the current app attempt. In the chargeback (resource utilization metrics) path, that's not always what we want. Containers do not always complete before a current attempt dies and a new one is started. If this happens, the chargeback path should update the metrics for the first attempt, not the second one. The call to {{...getRMAppAttempt(container.getApplicationAttemptId())}} will always get the attempt that started the container. Now that I think about it, it seems like that is what we want in the preemption path as well. [~leftnoteasy], can you please comment? If the preemption path should update the preemption info for the attempt that started the finished container, then we can reuse the RMAppAttempt object for both paths. {code} if (ContainerExitStatus.PREEMPTED == container.finishedStatus .getExitStatus()) { Resource resource = container.getContainer().getResource(); RMAppAttempt rmAttempt = container.rmContext.getRMApps() .get(container.getApplicationAttemptId().getApplicationId()) .getCurrentAppAttempt(); rmAttempt.getRMAppAttemptMetrics().updatePreemptionInfo(resource, container); } RMAppAttempt rmAttempt = container.rmContext.getRMApps() .get(container.getApplicationAttemptId().getApplicationId()) .getRMAppAttempt(container.getApplicationAttemptId()); {code} Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... 
+ (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
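As a concrete sketch of the chargeback formula above, the fragment below accumulates MB-seconds; the ContainerRecord type and field names are hypothetical, purely for illustration, and are not the classes used by the YARN-415 patches:
{code}
public class MemorySecondsExample {
  static class ContainerRecord {
    final long reservedMb;       // memory reserved for the container, in MB
    final long lifetimeSeconds;  // wall-clock lifetime of the container
    ContainerRecord(long reservedMb, long lifetimeSeconds) {
      this.reservedMb = reservedMb;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  // (reserved ram for container 1 * lifetime of container 1) + ...
  static long memorySeconds(ContainerRecord... containers) {
    long mbSeconds = 0;
    for (ContainerRecord c : containers) {
      mbSeconds += c.reservedMb * c.lifetimeSeconds;
    }
    return mbSeconds;
  }

  public static void main(String[] args) {
    // Two 1024MB containers that ran 60s and 30s: 1024*60 + 1024*30 = 92160
    System.out.println(memorySeconds(
        new ContainerRecord(1024, 60), new ContainerRecord(1024, 30)));
  }
}
{code}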
[jira] [Commented] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'
[ https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097753#comment-14097753 ] Hudson commented on YARN-1918: -- FAILURE: Integrated in Hadoop-trunk-Commit #6073 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6073/]) YARN-1918. Typo in description and error message for yarn.resourcemanager.cluster-id (Anandha L Ranganathan via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618070) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Typo in description and error message for 'yarn.resourcemanager.cluster-id' --- Key: YARN-1918 URL: https://issues.apache.org/jira/browse/YARN-1918 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Anandha L Ranganathan Priority: Trivial Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: YARN-1918.1.patch 1. In yarn-default.xml
{code:xml}
<property>
  <description>Name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election fo this cluster and ensures it does not affect other clusters</description>
  <name>yarn.resourcemanager.cluster-id</name>
  <!--value>yarn-cluster</value-->
</property>
{code}
Here the line 'election fo this cluster and ensures it does not affect' should be replaced with 'election for this cluster and ensures it does not affect'. 2.
{code:xml}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't specifyyarn.resourcemanager.cluster-id at org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
{code}
In the above exception message, a space is missing between the message and the configuration name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
[ https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097752#comment-14097752 ] Hudson commented on YARN-2197: -- FAILURE: Integrated in Hadoop-trunk-Commit #6073 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6073/]) YARN-2197. Add a link to YARN CHANGES.txt in the left side of doc (Akira AJISAKA via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618066) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add a link to YARN CHANGES.txt in the left side of doc -- Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: YARN-2197.patch Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist.
{code}
<item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
<item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
<item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
<item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/>
{code}
A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
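The fix itself is presumably a one-line addition alongside the entries quoted above; a sketch of what it might look like in hadoop-project/src/site/site.xml (the href shown is an assumption about where the YARN CHANGES.txt is published, not taken from the actual commit):
{code}
<item name="YARN CHANGES.txt" href="hadoop-project-dist/hadoop-yarn/CHANGES.txt"/>
{code}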
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097823#comment-14097823 ] Nathan Roberts commented on YARN-2056: -- [~sunilg] I'm not following the doubt. It may still be hard to get to a correct value for what exactly? As far as completely disabling preemption for the queue, that should just be a matter of setting max_ignored_over_capacity to a sufficiently large value. To disable, it has to be at least ((max_capacity/capacity)-1) but it could just as well be something quite large and that would effectively prevent preemption. I guess I'm saying it doesn't have to be ultra precise. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.2#6252)
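To make the bound above concrete with assumed numbers: for a queue with capacity 20% and max_capacity 100%, ((max_capacity/capacity) - 1) = (100/20) - 1 = 4, so any max_ignored_over_capacity of 4.0 or larger would effectively disable preemption for that queue. Since the threshold only needs to exceed the bound, a deliberately large value such as 100 works just as well, which is the sense in which it does not have to be ultra precise.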
[jira] [Created] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer
Wei Yan created YARN-2420: - Summary: Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097937#comment-14097937 ] Sandy Ryza commented on YARN-2420: -- Does yarn.scheduler.fair.max.assign satisfy what you're looking for? Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer - Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097941#comment-14097941 ] Wei Yan commented on YARN-2420: --- Yes, my mistake. Just saw the max.assign field. I'll change this jira to another maxassign feature which automatically updates the value of max.assign based on the current cluster load. Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer - Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2420: -- Summary: Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load (was: Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load --- Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'
[ https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097951#comment-14097951 ] Tsuyoshi OZAWA commented on YARN-1918: -- Thanks for your review, Allen. Typo in description and error message for 'yarn.resourcemanager.cluster-id' --- Key: YARN-1918 URL: https://issues.apache.org/jira/browse/YARN-1918 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Anandha L Ranganathan Priority: Trivial Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: YARN-1918.1.patch 1. In yarn-default.xml
{code:xml}
<property>
  <description>Name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election fo this cluster and ensures it does not affect other clusters</description>
  <name>yarn.resourcemanager.cluster-id</name>
  <!--value>yarn-cluster</value-->
</property>
{code}
Here the line 'election fo this cluster and ensures it does not affect' should be replaced with 'election for this cluster and ensures it does not affect'. 2.
{code:xml}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't specifyyarn.resourcemanager.cluster-id at org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
{code}
In the above exception message, a space is missing between the message and the configuration name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097958#comment-14097958 ] Jian He commented on YARN-2378: --- looks good to me, resubmitting the same patch to kick Jenkins Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097961#comment-14097961 ] Sandy Ryza commented on YARN-2420: -- Cool. Regarding adjusting maxassign dynamically, my view has been that this isn't needed when continuous scheduling is turned on, and eventually we expect everyone to switch over to continuous scheduling. Thoughts? Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load --- Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097965#comment-14097965 ] Wei Yan commented on YARN-2420: --- For continuous scheduling, yes, we don't need maxAssign, and can always assign one container to a node in each round, whereas currently continuous scheduling assigns maxAssign containers per node. Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load --- Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097966#comment-14097966 ] Karthik Kambatla commented on YARN-1959: Would it make more sense to have it be {{queue-fair-share - queue-consumed}}? Now that the fair share is instantaneous, it is the maximum resources the app can safely expect to get. No? Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Anubhav Dhoot The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
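A minimal sketch of the suggested headroom computation: the app's queue can still grow up to its instantaneous fair share, bounded by what the cluster currently has free. The method and parameter names here are illustrative, not the patch's API, and the clamping at zero and at cluster availability is an assumption about how a final patch would bound the value:
{code}
static long headroomMb(long queueFairShareMb, long queueConsumedMb,
                       long clusterAvailableMb) {
  // Room left under the queue's instantaneous fair share, never negative.
  long byFairShare = Math.max(0, queueFairShareMb - queueConsumedMb);
  // An app can never get more than the cluster actually has free.
  return Math.min(byFairShare, clusterAvailableMb);
}
{code}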
[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
[ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh updated YARN-2411: Description: YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. The proposal is to add two new configuration options: yarn.scheduler.capacity.queue-mappings-override.enable A boolean that controls if user-specified queues can be overridden by the mapping, default is false. and, yarn.scheduler.capacity.queue-mappings A string that specifies a list of mappings in the following format (default is empty, which is the same as no mapping) map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]* map_specifier := user (u) | group (g) source_attribute := user | group | %user queue_name := the name of the mapped queue | %user | %primary_group The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permissions to submit jobs to the mapped queue, the submission will fail. Example usages: 1. user1 is mapped to queue1, group1 is mapped to queue2 u:user1:queue1,g:group1:queue2 2. To map users to queues with the same name as the user: u:%user:%user I am happy to volunteer to take this up. was: YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. The proposal is to add two new configuration options: yarn.scheduler.capacity.queue-mappings.enable A boolean that controls if queue mappings are enabled, default is false. and, yarn.scheduler.capacity.queue-mappings A string that specifies a list of mappings in the following format: map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]* map_specifier := user (u) | group (g) source_attribute := user | group | %user queue_name := the name of the mapped queue | %user | %primary_group The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permissions to submit jobs to the mapped queue, the submission will fail. Example usages: 1.
user1 is mapped to queue1, group1 is mapped to queue2 u:user1:queue1,g:group1:queue2 2. To map users to queues with the same name as the user: u:%user:%user I am happy to volunteer to take this up. [Capacity Scheduler] support simple user and group mappings to queues - Key: YARN-2411 URL: https://issues.apache.org/jira/browse/YARN-2411 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Ram Venkatesh Assignee: Ram Venkatesh YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
[ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh updated YARN-2411: Attachment: YARN-2411.1.patch This patch enables jobs to be submitted to queues based on mappings specified in the configuration file. The syntax of the mapping is in the description of this JIRA. [Capacity Scheduler] support simple user and group mappings to queues - Key: YARN-2411 URL: https://issues.apache.org/jira/browse/YARN-2411 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Ram Venkatesh Assignee: Ram Venkatesh Attachments: YARN-2411.1.patch YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. The proposal is to add two new configuration options: yarn.scheduler.capacity.queue-mappings-override.enable A boolean that controls if user-specified queues can be overridden by the mapping, default is false. and, yarn.scheduler.capacity.queue-mappings A string that specifies a list of mappings in the following format (default is empty, which is the same as no mapping) map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]* map_specifier := user (u) | group (g) source_attribute := user | group | %user queue_name := the name of the mapped queue | %user | %primary_group The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permissions to submit jobs to the mapped queue, the submission will fail. Example usages: 1. user1 is mapped to queue1, group1 is mapped to queue2 u:user1:queue1,g:group1:queue2 2. To map users to queues with the same name as the user: u:%user:%user I am happy to volunteer to take this up. -- This message was sent by Atlassian JIRA (v6.2#6252)
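A sketch of how the proposed options might be wired into capacity-scheduler.xml; the property names and mapping syntax come from the description above, while the file placement and the concrete users/queues are made-up examples:
{code:xml}
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- user1 goes to queue1, group1 members go to queue2,
       everyone else goes to a queue named after themselves -->
  <value>u:user1:queue1,g:group1:queue2,u:%user:%user</value>
</property>
<property>
  <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
  <value>false</value>
</property>
{code}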
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201408150030.txt {quote} - Can you please elaborate in what scenario we need the following extra check? {code} // Only add in the running containers if this is the active attempt. RMAppAttempt currentAttempt = rmContext.getRMApps() .get(attemptId.getApplicationId()).getCurrentAppAttempt(); if (currentAttempt != null && currentAttempt.getAppAttemptId().compareTo(attemptId) == 0) { ApplicationResourceUsageReport appResUsageReport = rmContext .getScheduler().getAppResourceUsageReport(attemptId); if (appResUsageReport != null) { memorySeconds += appResUsageReport.getMemorySeconds(); vcoreSeconds += appResUsageReport.getVcoreSeconds(); } } {code} {quote} An app could have multiple attempts if, for example, the first attempt dies in the middle and the RM starts a second attempt for this app. In that situation, when RMAppAttemptMetrics#getRMAppMetrics is called for the first attempt, we only want to report the info for the completed containers, and when it is called for the second (running) attempt, we want to report for both completed and running containers. Of course, this is a little misleading when you have work-preserving restart enabled, and the running containers didn't die with the first attempt. While they are running, they are reported as the metrics for the second attempt, but when they complete, their metrics go back into the first attempt. Since these metrics are only reported at the app level, I think this should be okay. The important thing is that the running metrics only get reported once and don't get double-counted. {quote} - Also, currentAttempt.getAppAttemptId().compareTo(attemptId) == 0, we can use equals instead which looks more intuitive. {quote} Good point. I made the change. {quote} - getFinishedMemorySeconds and getFinishedVcoreSeconds methods are not used. - For setFinishedVcoreSeconds and setFinishedMemorySeconds, we can just use updateResourceUtilization {quote} I used updateResourceUtilization as you suggested, and removed the getters and setters. {quote} - RMStateStore#removeApplication: no need to calculate the memory utilization when removing the app. Saving some cost for the loop of attempts {quote} Good catch. I removed this calculation. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application.
The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098024#comment-14098024 ] Jian He commented on YARN-2229: --- the latest patch no longer applies on trunk. Can you update it please? thx ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, the lower 22 bits are for the sequence number of ids. This is for preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new format of container id while preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2229: - Attachment: YARN-2229.12.patch Thanks for the notification, Jian. Refreshed the patch. ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.12.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, the lower 22 bits are for the sequence number of ids. This is for preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new format of container id while preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
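For intuition about the 32-bit layout described above (10 epoch bits plus 22 sequence bits), a toy packing/unpacking sketch; this is purely illustrative and not the actual ContainerId implementation:
{code}
public class ContainerIdBits {
  // 10 epoch bits (0..1023) above 22 sequence bits (0..4194303).
  static int pack(int epoch, int sequence) {
    return (epoch << 22) | (sequence & 0x3FFFFF);
  }
  static int epochOf(int id)    { return id >>> 22; }
  static int sequenceOf(int id) { return id & 0x3FFFFF; }

  public static void main(String[] args) {
    int id = pack(1023, 42);            // epoch 1023 is the largest value;
    System.out.println(epochOf(id));    // 1023 -- restart #1024 overflows,
    System.out.println(sequenceOf(id)); // 42      hence the move to long
  }
}
{code}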
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098088#comment-14098088 ] George Wong commented on YARN-1458: --- The regression is org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testIsStarvedForFairShare. I applied the patch to the latest trunk code and ran this UT on my local laptop. The UT always succeeds. I've also checked the code, but could not figure out why the UT fails. Can anyone help? Thanks. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
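The jstack above boils down to a classic pattern: the update thread holds the scheduler lock while recomputing an expensive per-app weight, so the event-processing thread blocks on the same lock for as long as the recompute runs. A contrived sketch of the pattern (not FairScheduler code; names are illustrative):
{code}
public class LockContentionSketch {
  private final Object schedulerLock = new Object();

  void updateThreadLoop(int numApps) {
    synchronized (schedulerLock) {      // like FairScheduler.update()
      for (int i = 0; i < numApps; i++) {
        expensiveWeight(i);             // costly when size-based weight is on
      }
    }
  }

  void eventProcessor() {
    synchronized (schedulerLock) {      // like removeApplication(): BLOCKED
      // handle one scheduler event
    }
  }

  private double expensiveWeight(int app) {
    return Math.log1p(app);             // stand-in for the real computation
  }
}
{code}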
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v12.patch In v12 patch, - fix unit test failure for node reconnecting with resource update. - fix unit test failure for event cast. - fix findbug warning on synchronization. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2378: --- Attachment: YARN-2378-1.patch Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1371) FIFO scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1371: --- Fix Version/s: (was: 2.5.0) FIFO scheduler to re-populate container allocation state Key: YARN-1371 URL: https://issues.apache.org/jira/browse/YARN-1371 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2020) observeOnly should be checked before any preemption computation started inside containerBasedPreemptOrKill() of ProportionalCapacityPreemptionPolicy.java
[ https://issues.apache.org/jira/browse/YARN-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2020: --- Fix Version/s: (was: 2.5.0) observeOnly should be checked before any preemption computation started inside containerBasedPreemptOrKill() of ProportionalCapacityPreemptionPolicy.java - Key: YARN-2020 URL: https://issues.apache.org/jira/browse/YARN-2020 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Environment: all Reporter: yeqi Priority: Trivial Attachments: YARN-2020.patch Original Estimate: 1m Remaining Estimate: 1m observeOnly should be checked at the very beginning of ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(), so as to avoid unnecessary workload. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2126) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM
[ https://issues.apache.org/jira/browse/YARN-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2126: --- Fix Version/s: (was: 2.5.0) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM -- Key: YARN-2126 URL: https://issues.apache.org/jira/browse/YARN-2126 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan When an application is removed, the FSLeafQueue updates its amResourceUsage. {code} if (runnableAppScheds.remove(app.getAppSchedulable())) { // Update AM resource usage if (app.getAMResource() != null) { Resources.subtractFrom(amResourceUsage, app.getAMResource()); } return true; } {code} If an application is removed before it has a chance to start its AM, the amResourceUsage shouldn't be updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
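A sketch of the guard the description implies, based on the snippet above; isAmRunning() is an assumed helper for "this app's AM actually started", and the real patch may structure the check differently:
{code}
if (runnableAppScheds.remove(app.getAppSchedulable())) {
  // Only roll back AM usage for apps whose AM actually launched; an app
  // removed while still waiting must not be subtracted from amResourceUsage.
  if (app.isAmRunning() && app.getAMResource() != null) {
    Resources.subtractFrom(amResourceUsage, app.getAMResource());
  }
  return true;
}
{code}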
[jira] [Updated] (YARN-1369) Capacity scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1369: --- Fix Version/s: (was: 2.5.0) Capacity scheduler to re-populate container allocation state Key: YARN-1369 URL: https://issues.apache.org/jira/browse/YARN-1369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1621: --- Fix Version/s: (was: 2.5.0) 2.6.0 Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.6.0 As more applications are moved to YARN, we need a generic CLI to list rows of task attempt ID, container ID, host of container, and state of container. Today, if a YARN application running in a container hangs, there is no way to find out more info because a user does not know where each attempt is running. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers.
{code:title=proposed yarn cli}
$ yarn application -list-containers -applicationId <appId> [-containerState <state of container>]

where containerState is an optional filter to list containers in the given state only.
The container state can be running/succeeded/killed/failed/all.
A user can specify more than one container state at once, e.g. KILLED,FAILED.

task attempt ID  container ID  host of container  state of container
{code}
The CLI should work with both running and completed applications. If a container runs many task attempts, all attempts should be shown. That will likely be the case for Tez container-reuse applications. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-160: -- Fix Version/s: (was: 2.5.0) 2.6.0 nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.6.0 As mentioned in YARN-2 *NM memory and CPU configs*: currently these values come from the NM config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
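A minimal sketch of the Linux side of such an interface: parse MemTotal from /proc/meminfo and apply a configured offset reserved for the OS and non-YARN services. The class and method names, and the exact offset handling, are assumptions for illustration:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LinuxResourceProbe {
  static long yarnMemoryMb(long reservedMb) throws IOException {
    try (BufferedReader r =
        new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {  // e.g. "MemTotal: 16384256 kB"
          long kb = Long.parseLong(line.replaceAll("[^0-9]", ""));
          return Math.max(0, kb / 1024 - reservedMb);
        }
      }
    }
    return -1; // MemTotal not found; caller should fall back to the config
  }

  static int yarnVcores(int reservedCores) {
    // availableProcessors() avoids parsing /proc/cpuinfo directly
    return Math.max(0,
        Runtime.getRuntime().availableProcessors() - reservedCores);
  }
}
{code}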
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2113: --- Fix Version/s: (was: 2.5.0) 2.6.0 Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2113 URL: https://issues.apache.org/jira/browse/YARN-2113 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.6.0 Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1156: --- Fix Version/s: (was: 2.5.0) 2.6.0 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.6.0 Attachments: YARN-1156.1.patch The AllocatedGB and AvailableGB metrics are currently of integer type. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, the memory actually allocated is 2000MB, but the metric shows 0GB. Let's use the float type for these metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
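A tiny illustration of the truncation and the proposed fix; this is not the NodeManager metrics code itself, just the arithmetic from the description:
{code}
int allocatedGBInt = 0;
float allocatedGBFloat = 0f;
for (int i = 0; i < 4; i++) {           // four 500MB allocations
  allocatedGBInt += (int) 500 / 1024;   // integer division adds 0 each time
  allocatedGBFloat += 500f / 1024f;     // adds ~0.488 each time
}
// allocatedGBInt == 0 GB, allocatedGBFloat ~= 1.95 GB (actual: 2000MB)
{code}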
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1142: --- Fix Version/s: (was: 2.5.0) 2.6.0 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.6.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1327: --- Fix Version/s: (was: 2.5.0) 2.6.0 Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.6.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on freebsd. 1. libgen.h is not included. The correct function prototype is there, but linux glibc has a workaround to define it for the user if libgen.h is not directly included. Include this file directly. 2. Query the max size of the login name using sysconf. This follows the same code style as the rest of the code using sysconf. 3. cgroups are a linux-only feature; compile them conditionally and return an error if mount_cgroup is attempted on a non-linux OS. 4. Do not use the posix function setpgrp(), since it clashes with the same-named function from BSD 4.2; use the equivalent function. After inspecting glibc sources, it is just a shortcut for setpgid(0,0). These changes make it compile on both linux and freebsd. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-745: -- Fix Version/s: (was: 2.5.0) 2.6.0 Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.6.0 It's currently sitting in the yarn applications project, which sounds wrong. The client project sounds better since it contains the utilities/libraries that clients use to write and debug yarn applications. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-965: -- Fix Version/s: (was: 2.5.0) 2.6.0 NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan Fix For: 2.6.0 When a container is successfully launched, the container state goes from LOCALIZED to RUNNING, and containersRunning++. When the container state goes from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning--. However, the state EXITED_WITH_FAILURE or KILLING could be reached from LOCALIZING(LOCALIZED), not RUNNING, which causes containersRunning to be less than the actual number. Furthermore, the metrics are wrong: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.2#6252)
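A sketch of the invariant the report asks for; this is illustrative state tracking, not the ContainerImpl/NodeManagerMetrics code: the DONE transition should only decrement the running gauge if this container was previously counted as running:
{code}
// Track whether this container ever reached RUNNING, so the DONE
// transition only decrements the gauge it previously incremented.
private boolean wasRunning = false;
private int containersRunning = 0;

void onLaunched() {              // LOCALIZED -> RUNNING
  wasRunning = true;
  containersRunning++;
}

void onDone() {                  // EXITED_WITH_FAILURE/KILLING -> DONE
  if (wasRunning) {
    containersRunning--;         // never decrements for localizing-only
  }
}
{code}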
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-650: -- Fix Version/s: (was: 2.5.0) 2.6.0 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.6.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-113: -- Fix Version/s: (was: 2.5.0) 2.6.0 WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.6.0 The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS; otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.2#6252)
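A hedged sketch of wiring hadoop-common's client-mode SSLFactory into an HTTPS connection, as I understand its API; the servlet's actual HttpClient wiring may differ:
{code}
import java.net.HttpURLConnection;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

public class ProxySslSketch {
  static HttpURLConnection open(URL url, Configuration conf) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    if (conn instanceof HttpsURLConnection) {
      SSLFactory factory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
      factory.init(); // loads the truststore, so self-signed AM certs verify
      HttpsURLConnection https = (HttpsURLConnection) conn;
      https.setSSLSocketFactory(factory.createSSLSocketFactory());
      https.setHostnameVerifier(factory.getHostnameVerifier());
    }
    return conn;
  }
}
{code}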
[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1334: --- Fix Version/s: (was: 2.5.0) 2.6.0 YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-1334.1.patch Running an incorrect command such as: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./ shows a shell exit code exception with no useful message. It should print out the sysout/syserr of the containers/AM to explain why it is failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2280: --- Fix Version/s: (was: 2.5.0) 2.6.0 Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.6.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling, for example SchedulerTypeInfo - schedulerInfo. When using the same classes on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
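The fix amounts to exposing accessors on the JAXB DAO classes; a simplified sketch, with SchedulerInfo standing in for the real DAO type:
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

class SchedulerInfo { /* simplified stand-in for the real DAO */ }

// The field unmarshals fine, but without a getter a client can only
// reach it via reflection.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
class SchedulerTypeInfo {
  private SchedulerInfo schedulerInfo;

  // Adding a public accessor makes the unmarshalled value usable.
  public SchedulerInfo getSchedulerInfo() {
    return schedulerInfo;
  }
}
{code}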
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1234: --- Fix Version/s: (was: 2.5.0) 2.6.0 Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.6.0 When running the ContainerLocalizer in a secured cluster, we do not create any log file to track its log messages. Having one would help identify ContainerLocalization issues in secured clusters. -- This message was sent by Atlassian JIRA (v6.2#6252)
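One plausible shape of the fix, sketched below under stated assumptions: in secure mode the localizer runs as a separate process spawned by the container-executor and inherits no NodeManager log configuration, so it would have to attach its own file appender. The log path here is illustrative only:
{code}
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Attach a file appender explicitly so the standalone localizer process
// writes its messages somewhere discoverable.
void initLocalizerLogging(String containerLogDir) throws Exception {
  FileAppender appender = new FileAppender(
      new PatternLayout("%d{ISO8601} %p %c: %m%n"),
      containerLogDir + "/container-localizer.log");
  Logger.getRootLogger().addAppender(appender);
}
{code}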
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1514: --- Fix Version/s: (was: 2.5.0) 2.6.0 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378, and elsewhere. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over, so its execution time directly affects RM-HA failover time. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.2#6252)
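The core measurement such a utility needs is small; a minimal sketch that times one loadState() call against a store pre-populated with many app/attempt ZNodes (store construction and the CLI wrapper are omitted):
{code}
import org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore;

// Time a single loadState() call; run against stores seeded with
// increasing numbers of applications to see how load time scales.
long timeLoadState(RMStateStore store) throws Exception {
  long start = System.nanoTime();
  store.loadState();
  return (System.nanoTime() - start) / 1000000L; // elapsed millis
}
{code}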
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-153: -- Fix Version/s: (was: 2.5.0) 2.6.0 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Fix For: 2.6.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop is already widely adopted and deployed, and its deployments will only grow, we see strong potential for it to be used as a PaaS. I have implemented a proof of concept demonstrating that YARN can be used as a PaaS (Platform as a Service). I did a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1723: --- Fix Version/s: (was: 2.5.0) 2.6.0 AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Bikas Saha Fix For: 2.6.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
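Since AMRMClientAsync wraps a synchronous AMRMClient that already exposes updateBlacklist, the missing piece is essentially a delegating method; a minimal sketch of what the async API could look like, assuming client is the wrapped AMRMClient instance:
{code}
import java.util.List;

// Proposed addition to AMRMClientAsync: forward blacklist updates to the
// wrapped synchronous client.
public void updateBlacklist(List<String> blacklistAdditions,
                            List<String> blacklistRemovals) {
  client.updateBlacklist(blacklistAdditions, blacklistRemovals);
}
{code}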
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098201#comment-14098201 ] Hadoop QA commented on YARN-2378: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661993/YARN-2378-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4626//console This message is automatically generated. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 into smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
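Once the scheduler-side support lands, a client exercises the move through the existing YarnClient API; a minimal sketch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Move a running application to another queue; with this patch the call
// succeeds under CapacityScheduler as well.
void moveApp(String appIdStr, String targetQueue) throws Exception {
  YarnClient yarnClient = YarnClient.createYarnClient();
  yarnClient.init(new Configuration());
  yarnClient.start();
  try {
    ApplicationId appId = ConverterUtils.toApplicationId(appIdStr);
    yarnClient.moveApplicationAcrossQueues(appId, targetQueue);
  } finally {
    yarnClient.stop();
  }
}
{code}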