[jira] [Commented] (YARN-2001) Persist NMs info for RM restart
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985215#comment-13985215 ] Jian He commented on YARN-2001: --- Persisting all the Nodes' info definitely brings more overhead in a large cluster. It's a question of whether to persist all the nodes or simply to introduce some kind of 'safe period': only after that period are AM requests accepted, and NMs that register after that period are treated as new NMs. Persist NMs info for RM restart --- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He RM should not accept allocate requests from AMs until all the NMs have registered with RM. For that, RM needs to remember the previous NMs and wait for all the NMs to register. This is also useful for remembering decommissioned nodes across restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1368: -- Attachment: YARN-1368.preliminary.patch Preliminary patch to re-populate RMContainer, SchedulerNode, SchedulerApplicationAttempt, AppSchedulingInfo and Queue states. - ResourceTrackerService receives the containers info and sends it to RMNode, which in turn sends the container statuses to the scheduler to do the recovery. - The majority of the recovery logic is in AbstractYarnScheduler#recoverContainersOnNode(), which recovers RMContainer, SchedulerNode, Queue, SchedulerApplicationAttempt and AppSchedulingInfo accordingly. To do: - Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any reason for keeping both? I am thinking of merging the common methods into SchedulerNode. - RM_WORK_PRESERVING_RECOVERY_ENABLED will be used to guard the new changes. - ContainerStatus sent in NM registration doesn’t capture enough information for re-constructing the containers. We may replace it with a new object or just add more fields to encapsulate all the necessary information for re-constructing the container. - More changes on the recover interfaces, edge cases and the transition logic in RMApp/RMAppAttempt - more thorough test cases. RMContainer, SchedulerNode, SchedulerApplicationAttempt and AppSchedulingInfo can be recovered in a common way. CSQueue and FSQueue may need to implement their own recoverContainer methods. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
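For illustration, a minimal, self-contained sketch of the per-node container recovery flow described in the update above. It assumes stand-in types for the scheduler-side structures; the class and method names are illustrative and are not taken from the actual YARN-1368 patch.
{code:java}
// Sketch only: each scheduler-side structure re-populates its own state for a
// container reported by a re-registering NM, guarded by the recovery flag.
import java.util.List;

interface RecoverableEntity {           // stands in for SchedulerNode, Queue,
  void recoverContainer(Container c);   // SchedulerApplicationAttempt, AppSchedulingInfo
}

class Container {
  final String containerId;
  final String applicationId;
  Container(String containerId, String applicationId) {
    this.containerId = containerId;
    this.applicationId = applicationId;
  }
}

class SchedulerRecoverySketch {
  private final boolean workPreservingRecoveryEnabled; // RM_WORK_PRESERVING_RECOVERY_ENABLED
  private final RecoverableEntity node;
  private final RecoverableEntity queue;
  private final RecoverableEntity appAttempt;

  SchedulerRecoverySketch(boolean enabled, RecoverableEntity node,
      RecoverableEntity queue, RecoverableEntity appAttempt) {
    this.workPreservingRecoveryEnabled = enabled;
    this.node = node;
    this.queue = queue;
    this.appAttempt = appAttempt;
  }

  /** Called with the container statuses reported by an NM on re-registration. */
  void recoverContainersOnNode(List<Container> reportedContainers) {
    if (!workPreservingRecoveryEnabled || reportedContainers == null) {
      return; // recovery only happens when explicitly enabled
    }
    for (Container c : reportedContainers) {
      node.recoverContainer(c);       // SchedulerNode re-adds the allocation
      queue.recoverContainer(c);      // Queue restores used resources
      appAttempt.recoverContainer(c); // attempt/appSchedulingInfo restore live containers
    }
  }
}
{code}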
[jira] [Commented] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985268#comment-13985268 ] Tsuyoshi OZAWA commented on YARN-1999: -- +1 for the change (non-binding). IMO, we should create a MapReduce API section, and MapredAppMasterRest.apt.vm should also be moved into that section. Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1999: - Attachment: YARN-1999.1.patch Created a MapReduce API section and moved MapRedAppMasterRest into hadoop-mapreduce-client-core and HistoryServerRest into hadoop-mapreduce-client-hs. Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Attachments: YARN-1999.1.patch Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985313#comment-13985313 ] Hadoop QA commented on YARN-1999: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642614/YARN-1999.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3663//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3663//console This message is automatically generated. Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Tsuyoshi OZAWA Attachments: YARN-1999.1.patch Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985318#comment-13985318 ] Sunil G commented on YARN-1963: --- Thank you, Sandy, for the review. As you have mentioned, I will create these subtasks and handle them separately. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Arun C Murthy It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2002) Support for passing Job priority through Application Submission Context in Mapreduce Side
Sunil G created YARN-2002: - Summary: Support for passing Job priority through Application Submission Context in Mapreduce Side Key: YARN-2002 URL: https://issues.apache.org/jira/browse/YARN-2002 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sunil G Job Priority can be set from the client side as below [Configuration and API]: a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. This Job priority can then be passed in the ApplicationSubmissionContext from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
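A minimal sketch of the client-side flow described above, using JobConf.getJobPriority() and the standard YARN ApplicationSubmissionContext/Priority records. The JobPriority-to-integer mapping is an illustrative assumption, not a mapping defined by this issue.
{code:java}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;

public class JobPrioritySubmissionSketch {

  /** Copy the configured job priority into the YARN submission context. */
  public static void applyJobPriority(JobConf jobConf,
      ApplicationSubmissionContext appContext) {
    JobPriority jobPriority = jobConf.getJobPriority(); // backed by mapreduce.job.priority
    appContext.setPriority(Priority.newInstance(toYarnPriority(jobPriority)));
  }

  // Illustrative mapping only; a higher number means higher priority here (assumption).
  private static int toYarnPriority(JobPriority p) {
    switch (p) {
      case VERY_HIGH: return 4;
      case HIGH:      return 3;
      case NORMAL:    return 2;
      case LOW:       return 1;
      case VERY_LOW:  return 0;
      default:        return 2;
    }
  }
}
{code}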
[jira] [Created] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
Sunil G created YARN-2003: - Summary: Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it. Later this can be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2004) Job Priority scheduling support in Capacity scheduler
Sunil G created YARN-2004: - Summary: Job Priority scheduling support in Capacity scheduler Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. The Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then order the higher-priority application first. 2. Otherwise continue with the existing logic such as App ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
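A short sketch of the comparator change described above, under the assumption that each application exposes a numeric priority (higher value = higher priority) and an id whose natural ordering matches submission order; it is illustrative only and not the actual YARN-2004 patch.
{code:java}
import java.util.Comparator;

class AppInfo {
  final int priority;       // assumed: higher value means higher priority
  final long applicationId; // stands in for the existing ApplicationId comparison
  AppInfo(int priority, long applicationId) {
    this.priority = priority;
    this.applicationId = applicationId;
  }
}

class PriorityThenAppIdComparator implements Comparator<AppInfo> {
  @Override
  public int compare(AppInfo a1, AppInfo a2) {
    // 1. Prefer the higher-priority application.
    if (a1.priority != a2.priority) {
      return Integer.compare(a2.priority, a1.priority);
    }
    // 2. Otherwise fall back to the existing App ID ordering.
    return Long.compare(a1.applicationId, a2.applicationId);
  }
}
{code}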
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985340#comment-13985340 ] Sunil G commented on YARN-2003: --- ApplicationSubmissionContext already has support for Priority. I am reusing the same code, hence no protocol changes are required. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it. Later this can be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2002) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/YARN-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2002: -- Attachment: Yarn-2002.1.patch Test cases are not added here; they are intended to be covered as part of YARN-2004. Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: YARN-2002 URL: https://issues.apache.org/jira/browse/YARN-2002 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Attachments: Yarn-2002.1.patch Job Priority can be set from the client side as below [Configuration and API]: a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. This Job priority can then be passed in the ApplicationSubmissionContext from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985387#comment-13985387 ] Hudson commented on YARN-1929: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #556 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/556/]) YARN-1929. Fixed a deadlock in ResourceManager that occurs when failover happens right at the time of shutdown. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/CompositeService.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: = Thread-2: waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by main-EventThread main-EventThread: waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by Thread-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-738) TestClientRMTokens is failing irregularly while running all yarn tests
[ https://issues.apache.org/jira/browse/YARN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985386#comment-13985386 ] Hudson commented on YARN-738: - SUCCESS: Integrated in Hadoop-Yarn-trunk #556 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/556/]) YARN-738. TestClientRMTokens is failing irregularly while running all yarn tests. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java TestClientRMTokens is failing irregularly while running all yarn tests -- Key: YARN-738 URL: https://issues.apache.org/jira/browse/YARN-738 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-738.patch Running org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.787 sec FAILURE! testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 186 sec ERROR! java.lang.RuntimeException: getProxy at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens$YarnBadRPC.getProxy(TestClientRMTokens.java:334) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:157) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:102) at org.apache.hadoop.security.token.Token.renew(Token.java:372) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:306) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:240) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1973) Make showing the timestamp consistently in the WebUI
[ https://issues.apache.org/jira/browse/YARN-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985455#comment-13985455 ] Tsuyoshi OZAWA commented on YARN-1973: -- Maybe this is a duplicate of YARN-570? Make showing the timestamp consistently in the WebUI Key: YARN-1973 URL: https://issues.apache.org/jira/browse/YARN-1973 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.4.0 Reporter: Akira AJISAKA For example, the start time and finish time of an application in ResourceManager WebUI shows GMT timestamp like Wed, 23 Apr 2014 08:28:11 GMT but the start time of ResourceManager shows JST time like 23-4-2014 17:11:51. I want to make them like MapReduce JobHistory Server, which shows the timestamp in the user's default locale like Wed Apr 23 17:13:56 JST 2014. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985453#comment-13985453 ] Tsuyoshi OZAWA commented on YARN-1998: -- Thank you for taking this JIRA [~azuryy]. YARN-570 is tracking the same issue. How about closing this JIRA as a duplicate and resuming work on YARN-570, since Harsh reviewed and commented on the same patch there? Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1973) Make showing the timestamp consistently in the WebUI
[ https://issues.apache.org/jira/browse/YARN-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved YARN-1973. - Resolution: Duplicate Thanks [~ozawa] for pointing out. Closing this issue as duplicate. Make showing the timestamp consistently in the WebUI Key: YARN-1973 URL: https://issues.apache.org/jira/browse/YARN-1973 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.4.0 Reporter: Akira AJISAKA For example, the start time and finish time of an application in ResourceManager WebUI shows GMT timestamp like Wed, 23 Apr 2014 08:28:11 GMT but the start time of ResourceManager shows JST time like 23-4-2014 17:11:51. I want to make them like MapReduce JobHistory Server, which shows the timestamp in the user's default locale like Wed Apr 23 17:13:56 JST 2014. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Creating MapReduce REST API section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1999: - Summary: Creating MapReduce REST API section (was: Move HistoryServerRest.apt.vm into the Mapreduce section) Creating MapReduce REST API section --- Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Tsuyoshi OZAWA Attachments: YARN-1999.1.patch Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Creating MapReduce REST API section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1999: - Description: Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm and MapRedAppMasterRest.apt.vm into the MapReduce section where they really belong? (was: Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs?) Creating MapReduce REST API section --- Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Tsuyoshi OZAWA Attachments: YARN-1999.1.patch Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm and MapRedAppMasterRest.apt.vm into the MapReduce section where they really belong? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2002) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/YARN-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985485#comment-13985485 ] Jason Lowe commented on YARN-2002: -- Moving this to MAPREDUCE since that's where the changes need to be made. Will link this issue back to YARN-1963. I think a small unit test should be added as part of this change to verify that, when a priority is set, the resulting submission context from YARNRunner has the appropriate priority setting. I suspect the tests in YARN-2004 will be more of an integration test rather than a unit test. Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: YARN-2002 URL: https://issues.apache.org/jira/browse/YARN-2002 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Attachments: Yarn-2002.1.patch Job Priority can be set from the client side as below [Configuration and API]: a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. This Job priority can then be passed in the ApplicationSubmissionContext from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
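As a sketch of the kind of unit test suggested above, the snippet below asserts that a configured job priority ends up on the submission context. The buildContext() helper and its priority mapping are hypothetical stand-ins; a real test would exercise the corresponding YARNRunner code path instead.
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.util.Records;
import org.junit.Test;

public class TestJobPrioritySubmissionSketch {

  // Hypothetical stand-in for the client code under test.
  private ApplicationSubmissionContext buildContext(JobConf conf) {
    ApplicationSubmissionContext context =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Assumed mapping: HIGH -> 3, everything else -> 2 (illustration only).
    int numeric = conf.getJobPriority() == JobPriority.HIGH ? 3 : 2;
    context.setPriority(Priority.newInstance(numeric));
    return context;
  }

  @Test
  public void testPriorityReachesSubmissionContext() {
    JobConf conf = new JobConf(false);
    conf.setJobPriority(JobPriority.HIGH); // i.e. mapreduce.job.priority=HIGH
    ApplicationSubmissionContext context = buildContext(conf);
    assertEquals(3, context.getPriority().getPriority());
  }
}
{code}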
[jira] [Updated] (YARN-2002) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/YARN-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2002: - Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-1963) Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: YARN-2002 URL: https://issues.apache.org/jira/browse/YARN-2002 Project: Hadoop YARN Issue Type: Improvement Components: api, resourcemanager Reporter: Sunil G Attachments: Yarn-2002.1.patch Job Priority can be set from the client side as below [Configuration and API]: a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. This Job priority can then be passed in the ApplicationSubmissionContext from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985506#comment-13985506 ] Fengdong Yu commented on YARN-1998: --- Oh, thanks Tsuyoshi. I closed it as a duplicate. Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985512#comment-13985512 ] Hadoop QA commented on YARN-570: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577997/MAPREDUCE-5141.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3664//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3664//console This message is automatically generated. Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Assignee: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different page are displayed in different timezone. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56 Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-738) TestClientRMTokens is failing irregularly while running all yarn tests
[ https://issues.apache.org/jira/browse/YARN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985541#comment-13985541 ] Hudson commented on YARN-738: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1747 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1747/]) YARN-738. TestClientRMTokens is failing irregularly while running all yarn tests. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java TestClientRMTokens is failing irregularly while running all yarn tests -- Key: YARN-738 URL: https://issues.apache.org/jira/browse/YARN-738 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-738.patch Running org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.787 sec FAILURE! testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 186 sec ERROR! java.lang.RuntimeException: getProxy at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens$YarnBadRPC.getProxy(TestClientRMTokens.java:334) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:157) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:102) at org.apache.hadoop.security.token.Token.renew(Token.java:372) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:306) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:240) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985542#comment-13985542 ] Hudson commented on YARN-1929: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1747 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1747/]) YARN-1929. Fixed a deadlock in ResourceManager that occurs when failover happens right at the time of shutdown. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/CompositeService.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: = Thread-2: waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by main-EventThread main-EventThread: waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by Thread-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2005) Blacklisting support for scheduling AMs
Jason Lowe created YARN-2005: Summary: Blacklisting support for scheduling AMs Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 0.23.10 Reporter: Jason Lowe It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985575#comment-13985575 ] Jason Lowe commented on YARN-2005: -- This is particularly helpful on a busy cluster where one node happens to be in a state where it can't launch containers for some reason but hasn't self-declared an UNHEALTHY state. In that scenario the only place with spare capacity is a node that fails every container attempt, and apps can fail due to the RM not realizing that repeated AM attempts on the same node aren't working. In that sense a fix for YARN-1073 could help quite a bit, but there could still be scenarios where a particular app's AMs end up failing on certain nodes but other containers run just fine. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2006) Estimate Job Endtime
Maysam Yabandeh created YARN-2006: - Summary: Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2007) AM expressing the estimated endtime to RM
Maysam Yabandeh created YARN-2007: - Summary: AM expressing the estimated endtime to RM Key: YARN-2007 URL: https://issues.apache.org/jira/browse/YARN-2007 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler, which requires RM to know about the estimated end time of jobs. The endtime is estimated by the AppMaster as part of YARN-2006. This jira focuses on API updates that allow AM to express this estimated value to the RM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2006) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985582#comment-13985582 ] Maysam Yabandeh commented on YARN-2006: --- We built on top of the already existing estimator that the speculator uses to estimate the task end time, and added the logic to estimate the job end time based on the estimated end times of its tasks. The estimation has two steps: i) estimate the end time of the tasks that have not run yet, ii) estimate the end time of the running tasks. For the former we reuse the already existing logic in the Speculator that estimates based on the mean of the previously executed tasks (if there are any). For the latter, we again reuse the already existing logic that estimates the end time based on the current progress of the task attempts, and compute the minimum end time among the concurrent attempts that are being run speculatively. The job end time would be the maximum end time of all the tasks. The overhead of estimation is O(tasks). To lower the overhead, we reuse the estimation computed in the last call (j_prev_end) unless it is no longer valid. Upon each progress report for a task, we compare the task end-time estimation (t_end) with the last estimation of its end time (t_prev_end): 1) if t_end > j_prev_end => j_end = t_end; 2) if t_end < j_prev_end and t_end >= t_prev_end => j_end = j_prev_end; 3) if t_end < j_prev_end and t_end < t_prev_end => j_end = ? Only in case 3 do we mark the job estimation invalid, to be recomputed the next time on demand. Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
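A self-contained sketch of the incremental update rule described in the comment above (not the actual patch): the job end time is the maximum of the task end-time estimates, and a full O(tasks) recomputation is avoided except in case 3.
{code:java}
import java.util.HashMap;
import java.util.Map;

class JobEndTimeEstimatorSketch {
  private final Map<String, Long> taskEndEstimates = new HashMap<String, Long>();
  private long jobEndEstimate = 0L; // j_prev_end
  private boolean valid = true;

  /** Called on each progress report with the task's new end-time estimate (t_end). */
  void onTaskEstimate(String taskId, long tEnd) {
    Long tPrevEnd = taskEndEstimates.put(taskId, tEnd);
    if (tEnd >= jobEndEstimate) {
      jobEndEstimate = tEnd;  // case 1: new maximum, cheap update
      valid = true;
    } else if (tPrevEnd == null || tEnd >= tPrevEnd) {
      // case 2: estimate stayed below the job estimate and did not shrink; nothing to do
    } else {
      valid = false;          // case 3: a task estimate shrank; the maximum is unknown
    }
  }

  /** Returns the job end-time estimate, recomputing in O(tasks) only when invalidated. */
  long getJobEndEstimate() {
    if (!valid) {
      long max = 0L;
      for (long t : taskEndEstimates.values()) {
        max = Math.max(max, t);
      }
      jobEndEstimate = max;
      valid = true;
    }
    return jobEndEstimate;
  }
}
{code}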
[jira] [Updated] (YARN-2006) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated YARN-2006: -- Attachment: YARN-1969.patch Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: YARN-1969.patch YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2006) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985619#comment-13985619 ] Hadoop QA commented on YARN-2006: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642650/YARN-1969.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3665//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3665//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3665//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3665//console This message is automatically generated. Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: YARN-1969.patch YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1338) Recover localized resource cache state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1338: - Attachment: YARN-1338v3-and-YARN-1987.patch Updating the patch to address the DBException handling that was brought up in the MAPREDUCE-5652 review and applies here. Note that this now depends upon YARN-1987 as that provides the utility wrapper for the leveldb iterator to translate raw RuntimeException to the more helpful DBException so we can act accordingly when errors occur. The other notable change in the patch is renaming LevelDB to Leveldb for consistency with the existing LeveldbTimelineStore naming convention. This latest patch includes the necessary pieces of YARN-1987 so it can compile and Jenkins can comment. Recover localized resource cache state upon nodemanager restart --- Key: YARN-1338 URL: https://issues.apache.org/jira/browse/YARN-1338 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch Today when node manager restarts we clean up all the distributed cache files from disk. This is definitely not ideal from 2 aspects. * For work preserving restart we definitely want them as running containers are using them * For even non work preserving restart this will be useful in the sense that we don't have to download them again if needed by future tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
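A minimal sketch of the kind of leveldb iterator wrapper that YARN-1987 is described as providing above: it translates raw RuntimeExceptions from the underlying iterator into DBException so callers can handle database errors explicitly. This is not the actual YARN-1987 class, and only a few methods are shown.
{code:java}
import java.util.Map;

import org.iq80.leveldb.DBException;
import org.iq80.leveldb.DBIterator;

public class LeveldbIteratorSketch {
  private final DBIterator iter;

  public LeveldbIteratorSketch(DBIterator iter) {
    this.iter = iter;
  }

  public void seek(byte[] key) throws DBException {
    try {
      iter.seek(key);
    } catch (DBException e) {
      throw e;                                  // already the helpful type
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e); // wrap raw runtime errors
    }
  }

  public boolean hasNext() throws DBException {
    try {
      return iter.hasNext();
    } catch (DBException e) {
      throw e;
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e);
    }
  }

  public Map.Entry<byte[], byte[]> next() throws DBException {
    try {
      return iter.next();
    } catch (DBException e) {
      throw e;
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e);
    }
  }
}
{code}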
[jira] [Commented] (YARN-738) TestClientRMTokens is failing irregularly while running all yarn tests
[ https://issues.apache.org/jira/browse/YARN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985630#comment-13985630 ] Hudson commented on YARN-738: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1773 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1773/]) YARN-738. TestClientRMTokens is failing irregularly while running all yarn tests. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java TestClientRMTokens is failing irregularly while running all yarn tests -- Key: YARN-738 URL: https://issues.apache.org/jira/browse/YARN-738 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-738.patch Running org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.787 sec FAILURE! testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 186 sec ERROR! java.lang.RuntimeException: getProxy at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens$YarnBadRPC.getProxy(TestClientRMTokens.java:334) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:157) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:102) at org.apache.hadoop.security.token.Token.renew(Token.java:372) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:306) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:240) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985631#comment-13985631 ] Hudson commented on YARN-1929: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1773 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1773/]) YARN-1929. Fixed a deadlock in ResourceManager that occurs when failover happens right at the time of shutdown. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/CompositeService.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: = Thread-2: waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by main-EventThread main-EventThread: waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by Thread-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
Chen He created YARN-2008: - Summary: CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chen He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
Devaraj K created YARN-2009: --- Summary: Priority support for preemption in ProportionalCapacityPreemptionPolicy Key: YARN-2009 URL: https://issues.apache.org/jira/browse/YARN-2009 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Devaraj K While preempting containers based on the queue ideal assignment, we may need to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2001) Persist NMs info for RM restart
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985709#comment-13985709 ] Bikas Saha commented on YARN-2001: -- Requiring all NMs to re-register might be too constraining because, after a full code rollout, it may be common for some NMs to not come back. If the RM gets stuck because a minority of NMs do not re-register, then it would effectively be a loss of HA. I like the idea of waiting for a time period before considering the cluster fully up. However, this timeout has to be small or else we will have a lot of downtime. Can this timeout be less than the AM liveliness period? If not, then how do we treat AMs that are running on NMs that have not re-registered within the NM timeout? Persist NMs info for RM restart --- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He RM should not accept allocate requests from AMs until all the NMs have registered with RM. For that, RM needs to remember the previous NMs and wait for all the NMs to register. This is also useful for remembering decommissioned nodes across restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2006) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985715#comment-13985715 ] Bikas Saha commented on YARN-2006: -- This needs to be in the MAPREDUCE project. Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: YARN-1969.patch YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2006) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated YARN-2006: -- Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-1969) Estimate Job Endtime Key: YARN-2006 URL: https://issues.apache.org/jira/browse/YARN-2006 Project: Hadoop YARN Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: YARN-1969.patch YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985732#comment-13985732 ] Hadoop QA commented on YARN-1338: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642657/YARN-1338v3-and-YARN-1987.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3666//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3666//console This message is automatically generated. Recover localized resource cache state upon nodemanager restart --- Key: YARN-1338 URL: https://issues.apache.org/jira/browse/YARN-1338 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch Today when node manager restarts we clean up all the distributed cache files from disk. This is definitely not ideal from 2 aspects. * For work preserving restart we definitely want them as running containers are using them * For even non work preserving restart this will be useful in the sense that we don't have to download them again if needed by future tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2007) AM expressing the estimated endtime to RM
[ https://issues.apache.org/jira/browse/YARN-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated YARN-2007: -- Description: YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler, which requires RM to know about the estimated end time of jobs. The endtime is estimated by the AppMaster as part of MAPREDUCE-5871. This jira focuses on API updates that allow AM to express this estimated value to the RM. (was: YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler, which requires RM to know about the estimated end time of jobs. The endtime is estimated by the AppMaster as part of YARN-2006. This jira focuses on API updates that allow AM to express this estimated value to the RM.) AM expressing the estimated endtime to RM - Key: YARN-2007 URL: https://issues.apache.org/jira/browse/YARN-2007 Project: Hadoop YARN Issue Type: Sub-task Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler, which requires RM to know about the estimated end time of jobs. The endtime is estimated by the AppMaster as part of MAPREDUCE-5871. This jira focuses on API updates that allow AM to express this estimated value to the RM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1341: - Attachment: YARN-1341v4-and-YARN-1987.patch Updating the patch to address the DBException handling that was brought up in the MAPREDUCE-5652 review and applies here. Note that this now depends upon YARN-1987 as that provides the utility wrapper for the leveldb iterator to translate raw RuntimeException to the more helpful DBException so we can act accordingly when errors occur. The other notable change in the patch is renaming LevelDB to Leveldb for consistency with the existing LeveldbTimelineStore naming convention. This latest patch includes the necessary pieces of YARN-1987 so it can compile and Jenkins can comment. Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985894#comment-13985894 ] Ming Ma commented on YARN-1987: --- Jason, 1. LeveldbIterator.close rethrows IOException instead of DBException. Just wondering which is better, given that JniDBFactory.factory.open throws DBException. 2. It seems the store open via JniDBFactory.factory.open could also usefully be put into a wrapper class, to take care of catching the exception if the store doesn't exist and creating a new one. Perhaps that will be another jira. Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1987.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
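For context, a minimal sketch of the kind of wrapper under discussion (hypothetical code, not the YARN-1987 patch): calls delegate to leveldb's DBIterator, raw RuntimeExceptions are rethrown as DBException, and close() keeps its declared IOException as noted in the comment above.
{code:java}
import java.io.IOException;
import java.util.Map;
import org.iq80.leveldb.DBException;
import org.iq80.leveldb.DBIterator;

// Illustrative wrapper; the real class name and coverage of methods may differ.
public class LeveldbIteratorSketch {
  private final DBIterator iter;

  public LeveldbIteratorSketch(DBIterator iter) {
    this.iter = iter;
  }

  public boolean hasNext() {
    try {
      return iter.hasNext();
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e); // surface as a database error
    }
  }

  public Map.Entry<byte[], byte[]> next() {
    try {
      return iter.next();
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e);
    }
  }

  public void close() throws IOException {
    iter.close(); // close() stays on the declared IOException path
  }
}
{code}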
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985900#comment-13985900 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- [~sunilg], thanks for taking this up. This is a really useful feature! Before we jump into patches, we should consider writing up a small design doc that describes the requirements and the mechanism of implementation - not necessarily class-level design. There are a few things to consider off the top of my head: - Values of priorities - static values like you described before, or a few known priorities backed by integers, leaving gaps for more powerful interaction with priorities - ACLs on priorities - If we don't have some such mechanism, users will all be incentivized to submit apps with the highest priority. - The classic priority inversion problem: MAPREDUCE-314 I am sure there are more things to consider once we start thinking through this. I can help write this down; let me know what you think. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Arun C Murthy It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2010) RM can't transition to active if it can't recover an app attempt
bc Wong created YARN-2010: - Summary: RM can't transition to active if it can't recover an app attempt Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong If the RM fails to recover an app attempt, it won't come up. We should make it more resilient. Specifically, the underlying error is that the app was submitted before Kerberos security got turned on. Makes sense for the app to fail in this case. But YARN should still start.
{noformat}
2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
    ... 5 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ... 8 more
Caused by: java.lang.IllegalArgumentException: Missing argument
    at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93)
    at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
    at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
    ... 13 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
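A hedged sketch of the resilience being asked for (assumed behavior and hypothetical names, not the eventual RM fix): a recovery failure for one application is recorded and skipped so the transition to active can still complete.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: do not let one unrecoverable application abort RM startup.
public class ResilientRecoverySketch {

  interface StoredApplication { String getApplicationId(); }
  interface ApplicationRecoverer { void recover(StoredApplication app) throws Exception; }

  /** Returns the ids of applications whose state could not be recovered. */
  public static List<String> recoverAll(List<StoredApplication> apps, ApplicationRecoverer recoverer) {
    List<String> failed = new ArrayList<>();
    for (StoredApplication app : apps) {
      try {
        recoverer.recover(app);
      } catch (Exception e) {
        // Mark this app as failed (hypothetical) instead of failing the whole transition.
        failed.add(app.getApplicationId());
        System.err.println("Skipping unrecoverable application " + app.getApplicationId() + ": " + e);
      }
    }
    return failed;
  }
}
{code}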
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985928#comment-13985928 ] Hadoop QA commented on YARN-1341: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642696/YARN-1341v4-and-YARN-1987.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3667//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3667//console This message is automatically generated. Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2008: -- Affects Version/s: 2.3.0 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985977#comment-13985977 ] Jason Lowe commented on YARN-1987: -- Thanks for the feedback, Ming! bq. LeveldbIterator.close rethrows IOException instead of DBException. Just wonder which is better, given JniDBFactory.factory.open throws DBException. JniDBFactory.factory.open throws NativeDB.DBException which is an IOException rather than the runtime DBException. Also since close() already declares that it can throw IOException which callers either have to handle or propagate it seemed better to leverage that declared exception than a runtime exception which callers can easily overlook. bq. It seems store open via JniDBFactory.factory.open can also be useful to put into a wrapper class, to take care of catch the exception if the store doesn't exist and create a new one. If all one cares about is to make sure the database is created even if it doesn't exist then that's already covered by the leveldb interfaces by calling createIfMissing() on the options passed to the open call. In the NM restart case I wanted to know when the database was being created so the code can either check the existing schema version or set the schema version, respectively. If that's something that needs to be put in a utility method then I agree it's a separate JIRA. Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1987.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
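To illustrate the open-or-create pattern Jason describes (a sketch under assumptions, not the NM restart code; the schema-version helpers are placeholders, and the sketch does not distinguish "missing" from other open failures), one could first try to open an existing store and, only if that fails, create it and stamp a version:
{code:java}
import java.io.File;
import java.io.IOException;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

// Hypothetical helper around JniDBFactory.factory.open.
public class StateStoreOpener {
  public DB openOrCreate(File path) throws IOException {
    Options options = new Options();
    options.createIfMissing(false);
    try {
      DB db = JniDBFactory.factory.open(path, options); // existing store
      checkSchemaVersion(db);
      return db;
    } catch (IOException e) {
      // Assume the store does not exist yet; create it and record the schema version.
      options.createIfMissing(true);
      DB db = JniDBFactory.factory.open(path, options);
      storeSchemaVersion(db);
      return db;
    }
  }

  private void checkSchemaVersion(DB db) { /* placeholder: compare stored version */ }
  private void storeSchemaVersion(DB db) { /* placeholder: write current version */ }
}
{code}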
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2008: -- Description: If there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{noformat}
                        rootQueue
                       /         \
          L1ParentQueue1          L1ParentQueue2
 (allowed to use up to 80%        (allowed to use 20% of
  of its parent)                   its parent at minimum)
       /            \
 L2LeafQueue1    L2LeafQueue2
 (50% of its     (50% of its parent
  parent)         at minimum)
{noformat}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, L2LeafQueue2 can only use 30% (60%*50%). CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He If there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{noformat}
                        rootQueue
                       /         \
          L1ParentQueue1          L1ParentQueue2
 (allowed to use up to 80%        (allowed to use 20% of
  of its parent)                   its parent at minimum)
       /            \
 L2LeafQueue1    L2LeafQueue2
 (50% of its     (50% of its parent
  parent)         at minimum)
{noformat}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, L2LeafQueue2 can only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
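To make the arithmetic concrete, here is a small illustrative calculation (hypothetical variable names, not CapacityScheduler code) showing why the leaf's effective maximum capacity must account for what the parent's sibling subtree already uses:
{code:java}
public final class QueueMaxCapExample {
  public static void main(String[] args) {
    double parentConfiguredMax = 0.80; // L1ParentQueue1 may use up to 80% of root
    double leafShareOfParent   = 0.50; // L2LeafQueue2 may use 50% of its parent
    double siblingSubtreeUsed  = 0.40; // L1ParentQueue2 currently uses 40% of root

    // Naive calculation based on configured percentages alone:
    double naiveMaxCap = parentConfiguredMax * leafShareOfParent;                       // 0.40

    // Calculation bounded by what the parent can actually still get from root:
    double parentAvailable = Math.min(parentConfiguredMax, 1.0 - siblingSubtreeUsed);   // 0.60
    double correctedMaxCap = parentAvailable * leafShareOfParent;                       // 0.30

    System.out.printf("naive=%.2f corrected=%.2f%n", naiveMaxCap, correctedMaxCap);
  }
}
{code}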
[jira] [Updated] (YARN-1542) Add unit test for public resource on viewfs
[ https://issues.apache.org/jira/browse/YARN-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1542: Attachment: YARN-1542.v03.patch reuploading v03 again to make sure that it now passes given that the blocker is committed. Add unit test for public resource on viewfs --- Key: YARN-1542 URL: https://issues.apache.org/jira/browse/YARN-1542 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1542.v01.patch, YARN-1542.v02.patch, YARN-1542.v03.patch, YARN-1542.v03.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986128#comment-13986128 ] Carlo Curino commented on YARN-2009: If I am not mistaken, this is what is happening... The policy picks applications from the queue in reverse order (i.e., least priority first; for now, since this is FIFO, that means the youngest app is picked as the first victim), next it tries to unreserve containers (as this is a free, metadata-only operation), and then picks containers in reverse priority order. This is in getContainersToPreempt(..). Is this what you meant? Am I missing something? Priority support for preemption in ProportionalCapacityPreemptionPolicy --- Key: YARN-2009 URL: https://issues.apache.org/jira/browse/YARN-2009 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Devaraj K While preempting containers based on the queue ideal assignment, we may need to consider preempting the low-priority application containers first. -- This message was sent by Atlassian JIRA (v6.2#6252)
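For illustration only, a sketch of the ordering described above (the numeric priority convention is an assumption of this sketch, and this is not the ProportionalCapacityPreemptionPolicy code):
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReversePriorityOrderSketch {

  static final class RunningContainer {
    final String id;
    final int priority; // assumption for this sketch: larger value means lower priority
    RunningContainer(String id, int priority) { this.id = id; this.priority = priority; }
  }

  /** Lowest-priority containers first, i.e. the first elements are the first victims. */
  static List<RunningContainer> preemptionOrder(List<RunningContainer> running) {
    List<RunningContainer> victims = new ArrayList<>(running);
    victims.sort(Comparator.comparingInt((RunningContainer c) -> c.priority).reversed());
    return victims;
  }
}
{code}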
[jira] [Updated] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1342: - Attachment: YARN-1342v3-and-YARN-1987.patch Updating the patch to address the DBException handling that was brought up in the MAPREDUCE-5652 review and applies here. Note that this now depends upon YARN-1987 as that provides the utility wrapper for the leveldb iterator to translate raw RuntimeException to the more helpful DBException so we can act accordingly when errors occur. The other notable change in the patch is renaming LevelDB to Leveldb for consistency with the existing LeveldbTimelineStore naming convention. This latest patch includes the necessary pieces of YARN-1987 so it can compile and Jenkins can comment. Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1963: Assignee: Sunil G (was: Arun C Murthy) Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986163#comment-13986163 ] Arun C Murthy commented on YARN-1963: - [~sunilg] thanks for taking this up! As [~vinodkv] mentioned; a short writeup will help - look forward to helping get this in; thanks again! Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2011) Typo in TestLeafQueue
Chen He created YARN-2011: - Summary: Typo in TestLeafQueue Key: YARN-2011 URL: https://issues.apache.org/jira/browse/YARN-2011 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Chen He Assignee: Chen He Priority: Trivial
a.assignContainers(clusterResource, node_0);
assertEquals(2*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
// Again one to user_0 since he hasn't exceeded user limit yet
a.assignContainers(clusterResource, node_0);
assertEquals(3*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2011) Typo in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2011: -- Attachment: YARN-2011.patch Typo in TestLeafQueue - Key: YARN-2011 URL: https://issues.apache.org/jira/browse/YARN-2011 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Chen He Assignee: Chen He Priority: Trivial Attachments: YARN-2011.patch
a.assignContainers(clusterResource, node_0);
assertEquals(2*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
// Again one to user_0 since he hasn't exceeded user limit yet
a.assignContainers(clusterResource, node_0);
assertEquals(3*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: YARN-1813.3.patch Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
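For what the improvement might look like in practice, here is a minimal sketch that distinguishes a permissions failure from genuinely missing aggregated logs. The helper method and messages are illustrative assumptions, not the code in the attached patches.
{code}
// Sketch, assuming a helper invoked by the yarn logs CLI: report an
// AccessControlException as a permission problem instead of falling through
// to the generic "not enabled" message.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.security.AccessControlException;

class LogDirCheckSketch {
  static void listAggregatedLogs(FileContext fc, Path remoteAppLogDir) throws IOException {
    try {
      RemoteIterator<FileStatus> nodeFiles = fc.listStatus(remoteAppLogDir);
      while (nodeFiles.hasNext()) {
        System.out.println(nodeFiles.next().getPath());
      }
    } catch (AccessControlException e) {
      // Surface the real cause to the user.
      System.err.println("Permission denied while reading " + remoteAppLogDir
          + ": " + e.getMessage());
    } catch (FileNotFoundException e) {
      System.err.println("Logs not available at " + remoteAppLogDir + ".");
      System.err.println("Log aggregation has not completed or is not enabled.");
    }
  }
}
{code}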
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: (was: YARN-1813.3.patch) Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: YARN-1813.3.patch Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986201#comment-13986201 ] Tsuyoshi OZAWA commented on YARN-1813: -- Thanks for your review, Vinod! * Added a full stop to LogsCLIHelpers.PERMISSION_DENIED_MESSAGE. * Fixed AggregatedLogsBlock to handle the permission exception. * Fixed formatting. * TestLogsCLI: renamed refreshSysOutputs to refreshFileStreams(). About AccessControlExceptionFileSystem, I tried to create the mock more cleanly and found that it's a bit difficult to use Mockito in this case because FileContext is a final class and we can inject a mock class only via the configuration property fs.AbstractFileSystem.*.impl. If AbstractFileSystem#newInstance() or createFileSystem() were not static methods, we could override them, but they are static. Please let me know if you have a better idea. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
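To illustrate the only injection point the comment identifies, a heavily hedged sketch of binding a file system that always denies access through the fs.AbstractFileSystem.<scheme>.impl key; the DenyingFs class, the mockfs scheme, and the wiring are assumptions for illustration, not the test in the patch.
{code}
// Sketch only: an AbstractFileSystem that rejects reads, registered under a
// made-up "mockfs" scheme so a FileContext built from the test Configuration
// would hit the AccessControlException path.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.AbstractFileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FilterFs;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.AccessControlException;

public class DenyingFs extends FilterFs {
  // AbstractFileSystem implementations are instantiated reflectively through a
  // (URI, Configuration) constructor, which is why the config key is the only hook.
  public DenyingFs(URI uri, Configuration conf) throws Exception {
    super(AbstractFileSystem.get(URI.create("file:///"), conf));
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws AccessControlException {
    throw new AccessControlException("Permission denied: " + f);
  }
}

// Hypothetical test wiring:
//   Configuration conf = new Configuration();
//   conf.setClass("fs.AbstractFileSystem.mockfs.impl", DenyingFs.class,
//       AbstractFileSystem.class);
//   // then point the remote log dir at a mockfs:// URI and run the CLI.
{code}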
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986213#comment-13986213 ] Wangda Tan commented on YARN-1368: -- Thanks [~jianhe] for this proposal. I think recovering containers from the NM heartbeat is a reasonable approach; +1 for the general idea. Some minor comments: bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any reason for keeping both ? thinking to merge the common methods into SchedulerNode. Currently, IMO, we'd better keep both. To avoid involving too many parts in this JIRA, we can split merging their common logic into a new task. bq. ContainerStatus sent in NM registration doesn’t capture enough information for re-constructing the containers. we may replace that with a new object or just adding more fields to encapsulate all the necessary information for re-constructing the container. Personally I think creating a new type specialized for container recovery is better, since ContainerStatus is also used in node heartbeats. Including too many fields in each heartbeat isn't safe or efficient. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
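As a rough illustration of the suggestion, a sketch of what a dedicated registration-time recovery record might carry; the class name and field set are assumptions, not the object any patch here introduces.
{code}
// Illustrative only: a report with the fields a scheduler plausibly needs to
// rebuild a container after an RM restart. Name and fields are assumptions;
// keeping it separate from ContainerStatus keeps regular heartbeats small.
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerRecoveryReport {
  private final ContainerId containerId;
  private final ContainerState state;       // RUNNING vs COMPLETE at restart
  private final Resource allocatedResource; // restores node/queue usage and metrics
  private final Priority priority;          // restores appSchedulingInfo bookkeeping
  private final String diagnostics;

  public ContainerRecoveryReport(ContainerId containerId, ContainerState state,
      Resource allocatedResource, Priority priority, String diagnostics) {
    this.containerId = containerId;
    this.state = state;
    this.allocatedResource = allocatedResource;
    this.priority = priority;
    this.diagnostics = diagnostics;
  }

  public ContainerId getContainerId() { return containerId; }
  public ContainerState getState() { return state; }
  public Resource getAllocatedResource() { return allocatedResource; }
  public Priority getPriority() { return priority; }
  public String getDiagnostics() { return diagnostics; }
}
{code}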
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986222#comment-13986222 ] Anubhav Dhoot commented on YARN-1368: - Hi [~jianhe], I have spent a bunch of time thinking about this and related issues and have covered a number of them in the prototype on [YARN-556|https://issues.apache.org/jira/browse/YARN-556]. Let's sync up so we avoid duplicated effort. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
Ashwin Shankar created YARN-2012: Summary: Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we could make the 'default' rule optionally point to a different queue as the default queue. This queue should be an existing queue; if it is not, we fall back to root.default, hence keeping this rule terminal. This default queue can be a leaf queue, or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
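A sketch of what the proposed syntax could look like in fair-scheduler.xml, assuming a hypothetical queue attribute on the default rule; the attribute name and queue names are illustrative, not a committed format.
{noformat}
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="nestedUserQueue">
    <rule name="default" queue="root.users" />  <!-- hypothetical nested default -->
  </rule>
  <rule name="default" queue="root.adhoc" />    <!-- would fall back to root.default
                                                     if root.adhoc does not exist -->
</queuePlacementPolicy>
{noformat}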
[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986230#comment-13986230 ] Ashwin Shankar commented on YARN-2012: -- Will post a patch by tomorrow. Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute - Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we could make the 'default' rule optionally point to a different queue as the default queue. This queue should be an existing queue; if it is not, we fall back to root.default, hence keeping this rule terminal. This default queue can be a leaf queue, or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986270#comment-13986270 ] Jian He commented on YARN-1368: --- Hi [~adhoot], sure. Thanks for sharing the prototype. I looked at the patch. I think the prototype works in a scheduler-specific fashion and only for the FairScheduler; it'll be a maintenance overhead if we implement recovery for each scheduler separately. The prototype recovers a portion of the FairScheduler state by reusing some existing APIs, but I think it misses some other state for the scheduler attempt, queue metrics, and schedulerNode. The patch submitted here experiments with a generic approach for recovering all scheduler state, including all entities (metrics, schedulerNode, etc.), with no or minimal scheduler-specific changes. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
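A heavily simplified sketch of what such a scheduler-agnostic recovery pass might look like in a common base class; every type and method name below is an assumption made for illustration, not code from the attached patch.
{code}
// Illustrative sketch: a recovery hook shared by all schedulers so that
// CapacityScheduler and FairScheduler do not each reimplement it.
import java.util.List;

interface RecoveredContainer { }

interface ContainerReport { }

interface RecoveredNode {
  // Re-attach the container to per-node bookkeeping (used/available resources).
  void recoverContainer(RecoveredContainer c);
}

interface RecoveredAttempt {
  // Re-attach the container to the attempt's appSchedulingInfo; this also
  // restores queue metrics up the hierarchy.
  void recoverContainer(RecoveredNode node, RecoveredContainer c);
}

public abstract class AbstractRecoveringScheduler {

  /** Called when a node re-registers after an RM restart. */
  public void recoverContainersOnNode(List<ContainerReport> reports, RecoveredNode node) {
    for (ContainerReport report : reports) {
      // 1. Rebuild the container's scheduler-side state from the report.
      RecoveredContainer rmContainer = rebuildContainer(report);
      // 2. Restore node-level usage.
      node.recoverContainer(rmContainer);
      // 3. Restore attempt-level state and, through it, queue metrics.
      lookupAttempt(report).recoverContainer(node, rmContainer);
    }
  }

  protected abstract RecoveredContainer rebuildContainer(ContainerReport report);

  protected abstract RecoveredAttempt lookupAttempt(ContainerReport report);
}
{code}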
[jira] [Commented] (YARN-2011) Typo in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986332#comment-13986332 ] Akira AJISAKA commented on YARN-2011: - +1 (non-binding) Typo in TestLeafQueue - Key: YARN-2011 URL: https://issues.apache.org/jira/browse/YARN-2011 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Chen He Assignee: Chen He Priority: Trivial Attachments: YARN-2011.patch
{code}
a.assignContainers(clusterResource, node_0);
assertEquals(2*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G

// Again one to user_0 since he hasn't exceeded user limit yet
a.assignContainers(clusterResource, node_0);
assertEquals(3*GB, a.getUsedResources().getMemory());
assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986335#comment-13986335 ] Akira AJISAKA commented on YARN-570: Hi [~peng.zhang], what's going on with this issue? I'd like to take it over. Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Assignee: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT. If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
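To make the mismatch concrete, a small standalone example formatting the same instant with and without an explicit GMT timezone; the format patterns merely mirror the two renderings quoted above and are not taken from the YARN source.
{code}
// One timestamp, two renderings: one formatter pinned to GMT (as the JS
// helper's output suggests), one left on the JVM's default timezone.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimezoneDemo {
  public static void main(String[] args) {
    long ts = 1365582596000L; // Wed, 10 Apr 2013 08:29:56 GMT

    SimpleDateFormat gmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'");
    gmt.setTimeZone(TimeZone.getTimeZone("GMT"));

    SimpleDateFormat local = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    // No setTimeZone call: uses the JVM default, e.g. GMT+8 in the report above.

    System.out.println(gmt.format(new Date(ts)));   // Wed, 10 Apr 2013 08:29:56 GMT
    System.out.println(local.format(new Date(ts))); // 10-Apr-2013 16:29:56 on a GMT+8 host
  }
}
{code}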
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986351#comment-13986351 ] Hudson commented on YARN-1696: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5589 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5589/]) YARN-1696. Added documentation for ResourceManager fail-over. Contributed by Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591416) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Priority: Blocker Fix For: 2.4.1 Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)