[jira] [Commented] (YARN-1531) Update yarn command document
[ https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886410#comment-13886410 ]

Akira AJISAKA commented on YARN-1531:
-------------------------------------

[~kkambatl], would you please review the v2 patch?

> Update yarn command document
> ----------------------------
>
>                 Key: YARN-1531
>                 URL: https://issues.apache.org/jira/browse/YARN-1531
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>              Labels: documentation
>         Attachments: YARN-1531.2.patch, YARN-1531.patch
>
> Some options are not covered in the YARN command document.
> For example, the "yarn rmadmin" command options are as follows:
> {code}
> Usage: yarn rmadmin
>    -refreshQueues
>    -refreshNodes
>    -refreshSuperUserGroupsConfiguration
>    -refreshUserToGroupsMappings
>    -refreshAdminAcls
>    -refreshServiceAcl
>    -getGroups [username]
>    -help [cmd]
>    -transitionToActive
>    -transitionToStandby
>    -failover [--forcefence] [--forceactive]
>    -getServiceState
>    -checkHealth
> {code}
> But some of the newer options, such as "-getGroups", "-transitionToActive", and
> "-transitionToStandby", are not documented.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886496#comment-13886496 ]

Hudson commented on YARN-1618:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #466 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/466/])
YARN-1618. Fix invalid RMApp transition from NEW to FINAL_SAVING (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

> Fix invalid RMApp transition from NEW to FINAL_SAVING
> -----------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>             Fix For: 2.3.0
>
>         Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch,
>                      yarn-1618-branch-2.3.patch
>
> YARN-891 augments the RMStateStore to store information on completed
> applications. In the process, it adds transitions from NEW to FINAL_SAVING.
> This leads to the RM trying to update entries in the state store that do not
> exist. On ZKRMStateStore, this leads to the RM crashing.
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For
> instance, this can happen when an app transitions from NEW to FINAL_SAVING.
> In these cases, the store should create the missing znode and handle the
> update.
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886500#comment-13886500 ]

Hudson commented on YARN-1600:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #466 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/466/])
YARN-1600. RM does not startup when security is enabled without spnego configured. Contributed by Haohui Mai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java

> RM does not startup when security is enabled without spnego configured
> ----------------------------------------------------------------------
>
>                 Key: YARN-1600
>                 URL: https://issues.apache.org/jira/browse/YARN-1600
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Haohui Mai
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: YARN-1600.000.patch
>
> We have a custom auth filter in front of our various UI pages that handles
> user authentication. However, the RM currently assumes that if security is
> enabled then the user must have configured spnego as well for the RM web
> pages, which is not true in our case.
[jira] [Created] (YARN-1677) Potential bugs in exception handlers
Ding Yuan created YARN-1677:
-------------------------------

             Summary: Potential bugs in exception handlers
                 Key: YARN-1677
                 URL: https://issues.apache.org/jira/browse/YARN-1677
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.2.0
            Reporter: Ding Yuan

Hi YARN developers,

We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in Hadoop are caused by bugs in exception-handling logic. We therefore built a simple checking tool that automatically detects bug patterns that have caused some very severe failures. I am reporting some of the results for YARN here. Any feedback is much appreciated!

==
Case 1: Line 551, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java"
{noformat}
switch (monitoringEvent.getType()) {
case START_MONITORING_CONTAINER:
  ..
  ..
default:
  // TODO: Wrong event.
}
{noformat}
The default branch (which handles any potential unexpected event) is empty. Should we at least print an error message here?
==

==
Case 2: Line 491, File: "org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java"
{noformat}
} catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{noformat}
The handler of this very general exception only logs the error. The TODO seems to indicate it is not sufficient.
==

==
Case 3: Line 861, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
{noformat}
for (LocalResourceStatus stat : remoteResourceStatuses) {
  LocalResource rsrc = stat.getResource();
  LocalResourceRequest req = null;
  try {
    req = new LocalResourceRequest(rsrc);
  } catch (URISyntaxException e) {
    // TODO fail? Already translated several times...
  }
{noformat}
The handler for URISyntaxException is empty, and the TODO seems to indicate it is not sufficient.
The same code pattern can also be found at:
Line 901, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
Line 838, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
Line 878, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"

At line 803, File: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, the handler of URISyntaxException also seems not sufficient:
{noformat}
try {
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
      shellScriptPath)));
} catch (URISyntaxException e) {
  LOG.error("Error when trying to use shell script path specified"
      + " in env, path=" + shellScriptPath);
  e.printStackTrace();
  // A failure scenario on bad input such as invalid shell script path
  // We know we cannot continue launching the container
  // so we should release it.
  // TODO
  numCompletedContainers.incrementAndGet();
  numFailedContainers.incrementAndGet();
  return;
}
{noformat}
==

==
Case 4: Line 627, File: "org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java"
{noformat}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
  LOG.error("Can't handle this event at current state", e);
  /* TODO fail the application on the failed transition */
}
{noformat}
The handler of this exception only logs the error. The TODO seems to indicate it is not sufficient.
This exact same code pattern can also be found at:
Line 573, File: "org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java"
==

==
Case 5: empty handler for exception: java.lang.InterruptedException
Line 123, File: "org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java"
{noformat}
public void join() {
  if (proxyServer != null) {
    try {
      proxyServer.join();
    } catch (InterruptedException e) {
    }
  }
}
{noformat}
The InterruptedException is completely ignored. As a result, any events causing this interrupt will be lost. More info on why InterruptedException shouldn't be ignored:
http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-int
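A common remedy for the pattern in Case 5 is to re-assert the thread's interrupt status instead of swallowing the exception, so the interrupt is not lost. The sketch below is a hypothetical illustration of that idiom; `JoinHelper` and `joinQuietly` are invented names, not YARN code.

```java
// Hypothetical sketch: preserve the interrupt instead of ignoring it.
// JoinHelper/joinQuietly are invented names, not part of YARN.
public class JoinHelper {
    /**
     * Waits for the given thread to finish. Returns true if the join
     * completed, false if the caller was interrupted while waiting.
     */
    public static boolean joinQuietly(Thread t) {
        try {
            t.join();
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt status so callers can still observe it;
            // silently dropping the exception would lose the interrupt event.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

This keeps the method's signature exception-free while still letting higher-level code (e.g. a service shutdown loop) detect that an interrupt occurred.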
[jira] [Updated] (YARN-1677) Potential bugs in exception handlers
[ https://issues.apache.org/jira/browse/YARN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ding Yuan updated YARN-1677:
----------------------------
    Description:
Hi YARN developers,

We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in Hadoop are caused by bugs in exception-handling logic. We therefore built a simple checking tool that automatically detects bug patterns that have caused some very severe failures. I am reporting some of the results for YARN here. Any feedback is much appreciated!

==
Case 1: Line 551, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java"
{noformat}
switch (monitoringEvent.getType()) {
case START_MONITORING_CONTAINER:
  ..
  ..
default:
  // TODO: Wrong event.
}
{noformat}
The default branch (which handles any potential unexpected event) is empty. Should we at least print an error message here?
==

==
Case 2: Line 491, File: "org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java"
{noformat}
} catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{noformat}
The handler of this very general exception only logs the error. The TODO seems to indicate it is not sufficient.
==

==
Case 3: Line 861, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
{noformat}
for (LocalResourceStatus stat : remoteResourceStatuses) {
  LocalResource rsrc = stat.getResource();
  LocalResourceRequest req = null;
  try {
    req = new LocalResourceRequest(rsrc);
  } catch (URISyntaxException e) {
    // TODO fail? Already translated several times...
  }
{noformat}
The handler for URISyntaxException is empty, and the TODO seems to indicate it is not sufficient.
The same code pattern can also be found at:
Line 901, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
Line 838, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"
Line 878, File: "org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java"

At line 803, File: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, the handler of URISyntaxException also seems not sufficient:
{noformat}
try {
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
      shellScriptPath)));
} catch (URISyntaxException e) {
  LOG.error("Error when trying to use shell script path specified"
      + " in env, path=" + shellScriptPath);
  e.printStackTrace();
  // A failure scenario on bad input such as invalid shell script path
  // We know we cannot continue launching the container
  // so we should release it.
  // TODO
  numCompletedContainers.incrementAndGet();
  numFailedContainers.incrementAndGet();
  return;
}
{noformat}
==

==
Case 4: Line 627, File: "org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java"
{noformat}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
  LOG.error("Can't handle this event at current state", e);
  /* TODO fail the application on the failed transition */
}
{noformat}
The handler of this exception only logs the error. The TODO seems to indicate it is not sufficient.
This exact same code pattern can also be found at:
Line 573, File: "org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java"
==

==
Case 5: empty handler for exception: java.lang.InterruptedException
Line 123, File: "org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java"
{noformat}
public void join() {
  if (proxyServer != null) {
    try {
      proxyServer.join();
    } catch (InterruptedException e) {
    }
  }
}
{noformat}
The InterruptedException is completely ignored. As a result, any events causing this interrupt will be lost. More info on why InterruptedException shouldn't be ignored:
http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception

This pattern of handling InterruptedException can be found in a few other places:
Line: 434, File: org/apache/hadoop/yarn/server/resourcemanager/ResourceM
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886566#comment-13886566 ]

Hudson commented on YARN-1600:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1683 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1683/])
YARN-1600. RM does not startup when security is enabled without spnego configured. Contributed by Haohui Mai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java

> RM does not startup when security is enabled without spnego configured
> ----------------------------------------------------------------------
>
>                 Key: YARN-1600
>                 URL: https://issues.apache.org/jira/browse/YARN-1600
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Haohui Mai
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: YARN-1600.000.patch
>
> We have a custom auth filter in front of our various UI pages that handles
> user authentication. However, the RM currently assumes that if security is
> enabled then the user must have configured spnego as well for the RM web
> pages, which is not true in our case.
[jira] [Commented] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886562#comment-13886562 ]

Hudson commented on YARN-1618:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1683 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1683/])
YARN-1618. Fix invalid RMApp transition from NEW to FINAL_SAVING (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

> Fix invalid RMApp transition from NEW to FINAL_SAVING
> -----------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>             Fix For: 2.3.0
>
>         Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch,
>                      yarn-1618-branch-2.3.patch
>
> YARN-891 augments the RMStateStore to store information on completed
> applications. In the process, it adds transitions from NEW to FINAL_SAVING.
> This leads to the RM trying to update entries in the state store that do not
> exist. On ZKRMStateStore, this leads to the RM crashing.
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For
> instance, this can happen when an app transitions from NEW to FINAL_SAVING.
> In these cases, the store should create the missing znode and handle the
> update.
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886584#comment-13886584 ]

Hudson commented on YARN-1600:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1658 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1658/])
YARN-1600. RM does not startup when security is enabled without spnego configured. Contributed by Haohui Mai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java

> RM does not startup when security is enabled without spnego configured
> ----------------------------------------------------------------------
>
>                 Key: YARN-1600
>                 URL: https://issues.apache.org/jira/browse/YARN-1600
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Haohui Mai
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: YARN-1600.000.patch
>
> We have a custom auth filter in front of our various UI pages that handles
> user authentication. However, the RM currently assumes that if security is
> enabled then the user must have configured spnego as well for the RM web
> pages, which is not true in our case.
[jira] [Commented] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886580#comment-13886580 ]

Hudson commented on YARN-1618:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1658 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1658/])
YARN-1618. Fix invalid RMApp transition from NEW to FINAL_SAVING (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

> Fix invalid RMApp transition from NEW to FINAL_SAVING
> -----------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>             Fix For: 2.3.0
>
>         Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch,
>                      yarn-1618-branch-2.3.patch
>
> YARN-891 augments the RMStateStore to store information on completed
> applications. In the process, it adds transitions from NEW to FINAL_SAVING.
> This leads to the RM trying to update entries in the state store that do not
> exist. On ZKRMStateStore, this leads to the RM crashing.
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For
> instance, this can happen when an app transitions from NEW to FINAL_SAVING.
> In these cases, the store should create the missing znode and handle the
> update.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-1670:
--------------------------------
    Priority: Critical  (was: Major)

> aggregated log writer can write more log data than it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Priority: Critical
>
> We have seen exceptions when using 'yarn logs' to read log files:
> {noformat}
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:441)
> at java.lang.Long.parseLong(Long.java:483)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> {noformat}
> We traced it down to the reader trying to read the file type of the next file,
> but where it reads is still log data from the previous file. What happened
> was that the Log Length was written as a certain size, but the log data was
> actually longer than that.
> Inside the write() routine in LogValue, it first writes what the log-file
> length is, but then, when it goes to write the log itself, it just writes to the
> end of the file. There is a race condition here: if someone is still
> writing to the file when it goes to be aggregated, the length written could be
> too small.
> We should have the write() routine stop once it has written whatever it said was
> the length. It would be nice if we could somehow tell the user the log might be
> truncated, but I'm not sure of a good way to do this.
> We also noticed a bug in readAContainerLogsForALogType where it uses
> an int for curRead, whereas it should use a long:
> {noformat}
> while (len != -1 && curRead < fileLength) {
> {noformat}
> This isn't actually a problem right now, as it looks like the underlying
> decoder is doing the right thing and the len condition exits the loop.
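The fix proposed above, stopping the write once the declared length is reached, amounts to a bounded copy. The sketch below is a hypothetical illustration, not the actual AggregatedLogFormat code; `BoundedLogCopy` and `copyAtMost` are invented names. It also uses a long counter, sidestepping the int-for-curRead pitfall noted in the last comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch, not actual YARN code: copy at most 'declaredLength'
// bytes so the data written never exceeds the length recorded in the header.
public class BoundedLogCopy {
    public static long copyAtMost(InputStream in, OutputStream out,
                                  long declaredLength) throws IOException {
        byte[] buf = new byte[8192];
        long copied = 0;  // long, not int: large logs can exceed Integer.MAX_VALUE
        while (copied < declaredLength) {
            int want = (int) Math.min(buf.length, declaredLength - copied);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break;  // source ended before the declared length was reached
            }
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }
}
```

With this shape, a concurrent writer appending past the declared length cannot cause the reader to misinterpret trailing log bytes as the next file's type header.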
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886847#comment-13886847 ]

Karthik Kambatla commented on YARN-1461:
----------------------------------------

Thanks for the review, [~zjshen]. Sorry for the delay in following up on the comments.

bq. The previous pattern of defining enum in proto is to have non proto corresponding enum, and map them one-to-one. It avoid using proto object in GetApplicationsRequest.

I am sorry, I didn't quite get that. Are you suggesting not having methods to get and set Scope in GetApplicationsRequest? If yes, how do you propose we allow users to set the Scope to a non-default value?

> RM API and RM changes to handle tags for running jobs
> -----------------------------------------------------
>
>                 Key: YARN-1461
>                 URL: https://issues.apache.org/jira/browse/YARN-1461
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch,
>                      yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch,
>                      yarn-1461-7.patch, yarn-1461-8.patch
[jira] [Updated] (YARN-1669) Make admin refreshServiceAcls work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1669:
----------------------------
    Attachment: YARN-1669.1.patch

Created the patch based on YARN-1611 for the admin refreshServiceAcls changes.

> Make admin refreshServiceAcls work across RM failover
> -----------------------------------------------------
>
>                 Key: YARN-1669
>                 URL: https://issues.apache.org/jira/browse/YARN-1669
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-1669.1.patch
[jira] [Updated] (YARN-1504) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1504:
-----------------------------
    Attachment: YARN-1504-1.patch

> RM changes for moving apps between queues
> -----------------------------------------
>
>                 Key: YARN-1504
>                 URL: https://issues.apache.org/jira/browse/YARN-1504
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1504-1.patch, YARN-1504.patch
[jira] [Commented] (YARN-1504) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886921#comment-13886921 ]

Sandy Ryza commented on YARN-1504:
----------------------------------

Attached a patch that addresses Karthik's comments.

Regarding the tests, most of the error cases from ClientRMService#move*() were covered in TestMoveApplication. The updated patch covers the one that was missing: checking permissions.

> RM changes for moving apps between queues
> -----------------------------------------
>
>                 Key: YARN-1504
>                 URL: https://issues.apache.org/jira/browse/YARN-1504
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1504-1.patch, YARN-1504.patch
[jira] [Updated] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billie Rinaldi updated YARN-1530:
---------------------------------
    Attachment: application timeline design-20140130.pdf

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf, application
>                      timeline design-20140116.pdf, application timeline design-20140130.pdf
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework
> data all by itself, as YARN doesn't have a common solution. This JIRA attempts
> to solve the storage, management and serving of per-framework data from
> various applications, both running and finished. The aim is to change YARN to
> collect and store data in a generic manner, with plugin points for frameworks
> to do their own thing w.r.t. interpretation and serving.
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-9.patch Patch that sets the default scope to ALL, and addresses [~zjshen]'s review comments. > RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, > yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, > yarn-1461-7.patch, yarn-1461-8.patch, yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1634) Define an in-memory implementation of ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1634: -- Attachment: YARN-1634.1.patch Uploaded a patch with an in-memory implementation of ApplicationTimelineStore, with the test cases available. > Define an in-memory implementation of ApplicationTimelineStore > -- > > Key: YARN-1634 > URL: https://issues.apache.org/jira/browse/YARN-1634 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-1634.1.patch > > > As per the design doc, the store needs to be pluggable. We need a base > interface, and an in-memory implementation for testing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
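For readers following along, a minimal sketch of what an in-memory store for testing might look like. The class and method names below are hypothetical stand-ins, not the actual ApplicationTimelineStore interface being defined in YARN-1659:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory timeline store: entity id -> ordered event list.
// Thread-safe collections stand in for the real store's synchronization.
public class MemoryTimelineStoreSketch {
  private final Map<String, List<String>> events = new ConcurrentHashMap<>();

  public void putEvent(String entityId, String event) {
    events.computeIfAbsent(entityId, k -> new CopyOnWriteArrayList<>()).add(event);
  }

  public List<String> getEvents(String entityId) {
    return events.getOrDefault(entityId, Collections.emptyList());
  }

  public static void main(String[] args) {
    MemoryTimelineStoreSketch store = new MemoryTimelineStoreSketch();
    store.putEvent("application_1", "LAUNCHED");
    store.putEvent("application_1", "FINISHED");
    System.out.println(store.getEvents("application_1")); // [LAUNCHED, FINISHED]
  }
}
```

An in-memory map like this is enough for unit tests against the store interface; the LevelDB-backed implementation in YARN-1635 would replace the map with persistent storage behind the same interface.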
[jira] [Updated] (YARN-1659) Define ApplicationTimelineStore interface and store-facing entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1659: - Attachment: YARN-1659-3.patch > Define ApplicationTimelineStore interface and store-facing entity, > entity-info and event objects > > > Key: YARN-1659 > URL: https://issues.apache.org/jira/browse/YARN-1659 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1659-1.patch, YARN-1659-3.patch, YARN-1659.2.patch > > > These will be used by the ApplicationTimelineStore interface. The web services > will convert the store-facing objects to the user-facing objects. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1634) Define an in-memory implementation of ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886969#comment-13886969 ] Hadoop QA commented on YARN-1634: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626156/YARN-1634.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2968//console This message is automatically generated. > Define an in-memory implementation of ApplicationTimelineStore > -- > > Key: YARN-1634 > URL: https://issues.apache.org/jira/browse/YARN-1634 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-1634.1.patch > > > As per the design doc, the store needs to be pluggable. We need a base > interface, and an in-memory implementation for testing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1504) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886968#comment-13886968 ] Hadoop QA commented on YARN-1504: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626147/YARN-1504-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2966//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2966//console This message is automatically generated. > RM changes for moving apps between queues > - > > Key: YARN-1504 > URL: https://issues.apache.org/jira/browse/YARN-1504 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1504-1.patch, YARN-1504.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887004#comment-13887004 ] Hadoop QA commented on YARN-1461: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626151/yarn-1461-9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2967//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2967//console This message is automatically generated. 
> RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, > yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, > yarn-1461-7.patch, yarn-1461-8.patch, yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-713: Target Version/s: 2.3.0 > ResourceManager can exit unexpectedly if DNS is unavailable > --- > > Key: YARN-713 > URL: https://issues.apache.org/jira/browse/YARN-713 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, > YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, > YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch > > > As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could > lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and > that ultimately would cause the RM to exit. The RM should not exit during > DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1635: - Attachment: YARN-1635.1.patch > Implement a Leveldb based ApplicationTimelineStore > -- > > Key: YARN-1635 > URL: https://issues.apache.org/jira/browse/YARN-1635 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Billie Rinaldi > Attachments: YARN-1635.1.patch > > > As per the design doc, we need a levelDB + local-filesystem based > implementation to start with and for small deployments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887036#comment-13887036 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-trunk-Commit #5074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5074/]) Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1659) Define ApplicationTimelineStore interface and store-facing entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887047#comment-13887047 ] Hadoop QA commented on YARN-1659: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626159/YARN-1659-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2969//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2969//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2969//console This message is automatically generated. 
> Define ApplicationTimelineStore interface and store-facing entity, > entity-info and event objects > > > Key: YARN-1659 > URL: https://issues.apache.org/jira/browse/YARN-1659 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1659-1.patch, YARN-1659-3.patch, YARN-1659.2.patch > > > These will be used by the ApplicationTimelineStore interface. The web services > will convert the store-facing objects to the user-facing objects. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887043#comment-13887043 ] Sandy Ryza commented on YARN-1498: -- bq. AppSchedulingInfo: Not sure I understand the relevance of the following change to this JIRA. Am I missing something or is it just cleanup? This isn't required, but I thought it made the code clearer, as we're adding to the places that incrPendingResources and decrPendingResources get called. AppSchedulingInfo#move is confusing already, and I wanted to avoid having a monstrosity like
{code}
metrics.incrPendingResources(user, request.getNumContainers()
    - lastRequestContainers, Resources.subtractFrom( // save a clone
    Resources.multiply(request.getCapability(), request
        .getNumContainers()), Resources.multiply(lastRequestCapability,
        lastRequestContainers)));
{code}
in it. Can revert it if you think it's not worth it. bq. Can we throw an Exception instead of returning null. I copied this from the Capacity Scheduler, so would rather keep it consistent. > Common scheduler changes for moving apps between queues > --- > > Key: YARN-1498 > URL: https://issues.apache.org/jira/browse/YARN-1498 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1498-1.patch, YARN-1498.patch, YARN-1498.patch > > > This JIRA is to track changes that aren't in particular schedulers but that > help them support moving apps between queues. In particular, it makes sure > that QueueMetrics are properly updated when an app changes queue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
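The bookkeeping under discussion — when an app moves, its pending resources are decremented on the source queue's metrics and incremented on the target's — can be sketched as below. The Metrics class here is a toy stand-in for QueueMetrics, tracking pending memory only; it is not the real Hadoop class:

```java
// Toy stand-in for QueueMetrics: tracks pending memory only.
public class MoveMetricsSketch {
  public static class Metrics {
    public long pendingMB;
    public void incrPendingResources(long containerMB, int containers) {
      pendingMB += containerMB * containers;
    }
    public void decrPendingResources(long containerMB, int containers) {
      pendingMB -= containerMB * containers;
    }
  }

  // Moving an app's outstanding requests: release the source queue's
  // pending resources and charge the target queue's, so the
  // cluster-wide total stays unchanged.
  public static void move(Metrics from, Metrics to, long containerMB, int containers) {
    from.decrPendingResources(containerMB, containers);
    to.incrPendingResources(containerMB, containers);
  }

  public static void main(String[] args) {
    Metrics source = new Metrics();
    Metrics target = new Metrics();
    source.incrPendingResources(1024, 4); // four pending 1 GB containers
    move(source, target, 1024, 4);
    System.out.println(source.pendingMB + " " + target.pendingMB); // 0 4096
  }
}
```

This is the invariant the YARN-1498 tests exercise: after a move, source-queue pending drops to zero and the target queue picks up exactly the same amount.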
[jira] [Updated] (YARN-1659) Define ApplicationTimelineStore interface and store-facing entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1659: - Attachment: YARN-1659-4.patch Fixed findbugs warnings. > Define ApplicationTimelineStore interface and store-facing entity, > entity-info and event objects > > > Key: YARN-1659 > URL: https://issues.apache.org/jira/browse/YARN-1659 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1659-1.patch, YARN-1659-3.patch, YARN-1659-4.patch, > YARN-1659.2.patch > > > These will be used by the ApplicationTimelineStore interface. The web services > will convert the store-facing objects to the user-facing objects. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1678) Fair scheduler gabs incessantly about reservations
Sandy Ryza created YARN-1678: Summary: Fair scheduler gabs incessantly about reservations Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1666: Attachment: YARN-1666.1.patch Created the patch based on YARN-1611 for the refreshNodes changes. > Make admin refreshNodes work across RM failover > --- > > Key: YARN-1666 > URL: https://issues.apache.org/jira/browse/YARN-1666 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1666.1.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1659) Define ApplicationTimelineStore interface and store-facing entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887168#comment-13887168 ] Hadoop QA commented on YARN-1659: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626191/YARN-1659-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2970//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2970//console This message is automatically generated. 
> Define ApplicationTimelineStore interface and store-facing entity, > entity-info and event objects > > > Key: YARN-1659 > URL: https://issues.apache.org/jira/browse/YARN-1659 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1659-1.patch, YARN-1659-3.patch, YARN-1659-4.patch, > YARN-1659.2.patch > > > These will be used by the ApplicationTimelineStore interface. The web services > will convert the store-facing objects to the user-facing objects. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1678: - Attachment: YARN-1678.patch > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678.patch > > > Come on FS. We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887192#comment-13887192 ] Sandy Ryza commented on YARN-1678: -- Attached patch avoids unnecessary info messages and documents some of the reserve code in AppSchedulable > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678.patch > > > Come on FS. We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1617) Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
[ https://issues.apache.org/jira/browse/YARN-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887193#comment-13887193 ] Sandy Ryza commented on YARN-1617: -- Thanks for the reviews Akira and Karthik. Committing this. > Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate > --- > > Key: YARN-1617 > URL: https://issues.apache.org/jira/browse/YARN-1617 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1617.patch > > > {code} > synchronized private void allocate(Container container) { > // Update consumption and track allocations > //TODO: fixme sharad > /* try { > store.storeContainer(container); > } catch (IOException ie) { > // TODO fix this. we shouldnt ignore > }*/ > > LOG.debug("allocate: applicationId=" + applicationId + " container=" > + container.getId() + " host=" > + container.getNodeId().toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
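The "surround LOG.debug" half of this fix is the standard guard pattern: check isDebugEnabled() before building the message, so the string concatenation is skipped when debug logging is off. A self-contained sketch with a stand-in logger (the real AppSchedulingInfo uses a commons-logging Log instance):

```java
public class DebugGuardSketch {
  // Stand-in for the commons-logging Log used in AppSchedulingInfo.
  public static class Log {
    public boolean debugEnabled;
    public boolean isDebugEnabled() { return debugEnabled; }
    public void debug(String msg) { /* would write msg to the log */ }
  }

  public static int messagesBuilt = 0;

  // Building the message costs concatenation work; count how often it happens.
  public static String buildMessage(String appId, String containerId) {
    messagesBuilt++;
    return "allocate: applicationId=" + appId + " container=" + containerId;
  }

  public static void allocateLog(Log log, String appId, String containerId) {
    if (log.isDebugEnabled()) {            // the guard added by the patch
      log.debug(buildMessage(appId, containerId));
    }
  }

  public static void main(String[] args) {
    Log log = new Log();
    log.debugEnabled = false;
    allocateLog(log, "application_1", "container_1");
    System.out.println(messagesBuilt);     // 0: message never built
    log.debugEnabled = true;
    allocateLog(log, "application_1", "container_1");
    System.out.println(messagesBuilt);     // 1
  }
}
```

On a hot path like allocate, which runs per container allocation, skipping that concatenation is the entire point of the guard.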
[jira] [Commented] (YARN-1617) Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate
[ https://issues.apache.org/jira/browse/YARN-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887207#comment-13887207 ] Hudson commented on YARN-1617: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5076 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5076/]) YARN-1617. Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1563004) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java > Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate > --- > > Key: YARN-1617 > URL: https://issues.apache.org/jira/browse/YARN-1617 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.4.0 > > Attachments: YARN-1617.patch > > > {code} > synchronized private void allocate(Container container) { > // Update consumption and track allocations > //TODO: fixme sharad > /* try { > store.storeContainer(container); > } catch (IOException ie) { > // TODO fix this. we shouldnt ignore > }*/ > > LOG.debug("allocate: applicationId=" + applicationId + " container=" > + container.getId() + " host=" > + container.getNodeId().toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887221#comment-13887221 ] Sandy Ryza commented on YARN-1611: -- Thanks Xuan. One other thing is that the current patch won't work for refreshing queues for the Fair Scheduler, which does not get its settings from a Configuration object. The fair-scheduler.xml file is in a different format than a typical Hadoop configuration file. > Make admin refresh of configuration work across RM failover > --- > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
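To illustrate the mismatch Sandy points out, compare the two file shapes (the property and queue values below are illustrative only): a Hadoop Configuration file is flat name/value properties that refresh plumbing can carry in a Configuration object, while fair-scheduler.xml uses a nested, scheduler-specific schema that a Configuration object cannot represent.

```xml
<!-- Flat Hadoop Configuration style (what a Configuration object holds): -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
  </property>
</configuration>

<!-- fair-scheduler.xml style: a nested allocation schema, not name/value pairs: -->
<allocations>
  <queue name="research">
    <minResources>1024 mb, 1 vcores</minResources>
    <weight>2.0</weight>
  </queue>
</allocations>
```

Any mechanism that replicates refreshed settings across RMs as a Configuration would therefore need a separate path for the Fair Scheduler's allocation file.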
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887232#comment-13887232 ] Hadoop QA commented on YARN-1678: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626205/YARN-1678.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2971//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2971//console This message is automatically generated. > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678.patch > > > Come on FS. 
We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1611) Make admin refresh of scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1611: -- Summary: Make admin refresh of scheduler configuration work across RM failover (was: Make admin refresh of configuration work across RM failover) Editing title as we are only focusing on scheduler configuration in this ticket. > Make admin refresh of scheduler configuration work across RM failover > - > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1504) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887252#comment-13887252 ] Karthik Kambatla commented on YARN-1504: Looks good to me, +1. Let us leave this open for a day, in case anyone else wants to take a look at it. > RM changes for moving apps between queues > - > > Key: YARN-1504 > URL: https://issues.apache.org/jira/browse/YARN-1504 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1504-1.patch, YARN-1504.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887257#comment-13887257 ] Zhijie Shen commented on YARN-1461: --- bq. I am sorry, I didn't quite get that. Are you suggesting not having methods to get and set Scope in GetApplicationsRequest? If yes, how do you propose we allow users to set the Scope to the non-default value? No, I mean the generated proto class is not supposed to be used in API records. Usually, what we do is to define a Java enum, use it in API records, and map it to the corresponding proto class. The mapping is invoked in PBImpl of the API records. Please take a look at YarnApplicationState and YarnApplicationStateProto. > RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, > yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, > yarn-1461-7.patch, yarn-1461-8.patch, yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
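The pattern Zhijie describes — expose a Java enum in the API record and map it to the protobuf-generated enum inside the PBImpl — can be sketched as follows. The proto enum here is a hand-written stand-in (the real one would be generated by protoc), and the enum names are illustrative:

```java
public class ScopeMappingSketch {
  // Stand-in for the protoc-generated enum.
  public enum ApplicationsRequestScopeProto { ALL, VIEWABLE, OWN }

  // The user-facing Java enum that the API record would expose.
  public enum ApplicationsRequestScope { ALL, VIEWABLE, OWN }

  // PBImpl-style converters; mapping by name keeps the two enums in
  // lock step, mirroring how YarnApplicationState is mapped to
  // YarnApplicationStateProto.
  public static ApplicationsRequestScopeProto convertToProtoFormat(
      ApplicationsRequestScope s) {
    return ApplicationsRequestScopeProto.valueOf(s.name());
  }

  public static ApplicationsRequestScope convertFromProtoFormat(
      ApplicationsRequestScopeProto p) {
    return ApplicationsRequestScope.valueOf(p.name());
  }

  public static void main(String[] args) {
    ApplicationsRequestScope scope = ApplicationsRequestScope.ALL; // patch default
    System.out.println(convertFromProtoFormat(convertToProtoFormat(scope))); // ALL
  }
}
```

Callers of the API record only ever see the Java enum; the generated proto class stays an implementation detail of the PBImpl, which is the point of the review comment.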
[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887260#comment-13887260 ] Karthik Kambatla commented on YARN-1498: The arithmetic change in AppSchedulingInfo seems to be tested by several tests - assuming the change is correct. +1. > Common scheduler changes for moving apps between queues > --- > > Key: YARN-1498 > URL: https://issues.apache.org/jira/browse/YARN-1498 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1498-1.patch, YARN-1498.patch, YARN-1498.patch > > > This JIRA is to track changes that aren't in particular schedulers but that > help them support moving apps between queues. In particular, it makes sure > that QueueMetrics are properly updated when an app changes queue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1678: - Description: Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available= used= 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available= used=, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available= used= for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} was:Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678.patch > > > Come on FS. We really don't need to know every time a node with a reservation > on it heartbeats. 
> {code} > 2014-01-29 03:48:16,043 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Trying to fulfill reservation for application > appattempt_1390547864213_0347_01 on node: host: > a2330.halxg.cloudera.com:8041 #containers=8 available= > used= > 2014-01-29 03:48:16,043 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Making reservation: node=a2330.halxg.cloudera.com > app_id=application_1390547864213_0347 > 2014-01-29 03:48:16,043 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > Application application_1390547864213_0347 reserved container > container_1390547864213_0347_01_03 on node host: > a2330.halxg.cloudera.com:8041 #containers=8 available= > used=, currently has 6 at priority 0; > currentReservation 6144 > 2014-01-29 03:48:16,044 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: > Updated reserved container container_1390547864213_0347_01_03 on node > host: a2330.halxg.cloudera.com:8041 #containers=8 available= vCores:8> used= for application > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887277#comment-13887277 ] Sandy Ryza commented on YARN-1678: -- Looks like I was missing an exclamation point. > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678-1.patch, YARN-1678.patch > > > Come on FS. We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1678: - Attachment: YARN-1678-1.patch > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678-1.patch, YARN-1678.patch > > > Come on FS. We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887289#comment-13887289 ] Sandy Ryza commented on YARN-1498: -- Committed this to trunk. > Common scheduler changes for moving apps between queues > --- > > Key: YARN-1498 > URL: https://issues.apache.org/jira/browse/YARN-1498 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 3.0.0 > > Attachments: YARN-1498-1.patch, YARN-1498.patch, YARN-1498.patch > > > This JIRA is to track changes that aren't in particular schedulers but that > help them support moving apps between queues. In particular, it makes sure > that QueueMetrics are properly updated when an app changes queue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887301#comment-13887301 ] Hudson commented on YARN-1498: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5078 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5078/]) YARN-1498. Common scheduler changes for moving apps between queues (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1563021) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java > Common scheduler changes for moving apps between queues > --- > > Key: YARN-1498 > URL: https://issues.apache.org/jira/browse/YARN-1498 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 3.0.0 > > Attachments: YARN-1498-1.patch, YARN-1498.patch, YARN-1498.patch > > > This JIRA is to track changes that aren't in particular schedulers but that > help them support moving apps between queues. In particular, it makes sure > that QueueMetrics are properly updated when an app changes queue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887319#comment-13887319 ] Hadoop QA commented on YARN-1678: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626219/YARN-1678-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2972//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2972//console This message is automatically generated. > Fair scheduler gabs incessantly about reservations > -- > > Key: YARN-1678 > URL: https://issues.apache.org/jira/browse/YARN-1678 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1678-1.patch, YARN-1678.patch > > > Come on FS. 
We really don't need to know every time a node with a reservation > on it heartbeats. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887322#comment-13887322 ] Karthik Kambatla commented on YARN-1461: bq. Usually, what we do is to define a Java enum, use it in API records, and map it to the corresponding proto class. The mapping is invoked in PBImpl of the API records. Please take a look at YarnApplicationState and YarnApplicationStateProto. I see. Just looked at YarnApplicationState and YarnApplicationStateProto. We could do something similar for this too. However, I am curious why having two different enums, one in Java and one in proto, and a converter between the two is preferable to just having one enum and no converter? Particularly, in this case, it is always going to be a 1:1 mapping between the two. > RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, > yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, > yarn-1461-7.patch, yarn-1461-8.patch, yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1611) Make admin refresh of scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1611: Attachment: YARN-1611.6.patch > Make admin refresh of scheduler configuration work across RM failover > - > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch, > YARN-1611.6.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1611) Make admin refresh of scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887339#comment-13887339 ] Xuan Gong commented on YARN-1611: - bq. Fix formatting in the code. There are several non-standard instances of formatting, and lines crossing the 80-char boundary. DONE bq. conf.store -> remote-configuration.store Changed bq. Change DEFAULT_RM_CONF_STORE to be something like /yarn/conf? Changed bq. Add javadoc for the RemoteConfiguration class and all its methods. Added bq. Same for RemoteConfigurationFactory. Added bq. Move RC and RCF to the yarn.conf package Moved bq. FileSystemBasedRemoteConfiguration: If the path doesn't exist, we should not silently log it. Throw an exception. Changed bq. CapacityScheduler.java: Remote conf is loaded only on refresh but not on init? Yes, we need to do this, but it will be fixed in YARN-1459. bq. Instead of getConfigurationFileName(), create static constants for each config-file and directly code them into the caller. DONE. Only CS_CONFIGURATION_FILE is created in this ticket; the other conf-file constants will be created in their related JIRA tickets. bq. When can conf be null in HA mode? Even if it can, the response to refresh should indicate an exception. Since we now throw an exception in FileSystemBasedRemoteConfiguration when the path doesn't exist, the conf will not be null. Changed. > Make admin refresh of scheduler configuration work across RM failover > - > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch, > YARN-1611.6.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1611) Make admin refresh of capacity scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1611: Summary: Make admin refresh of capacity scheduler configuration work across RM failover (was: Make admin refresh of scheduler configuration work across RM failover) > Make admin refresh of capacity scheduler configuration work across RM failover > -- > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch, > YARN-1611.6.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1679) Make admin refresh of Fair scheduler configuration work across RM failover
Xuan Gong created YARN-1679: --- Summary: Make admin refresh of Fair scheduler configuration work across RM failover Key: YARN-1679 URL: https://issues.apache.org/jira/browse/YARN-1679 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1611) Make admin refresh of capacity scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887382#comment-13887382 ] Hadoop QA commented on YARN-1611: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626235/YARN-1611.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2973//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2973//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2973//console This message is automatically generated. 
> Make admin refresh of capacity scheduler configuration work across RM failover > -- > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch, > YARN-1611.6.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1611) Make admin refresh of capacity scheduler configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887391#comment-13887391 ] Xuan Gong commented on YARN-1611: - The findbugs -1 is unrelated. > Make admin refresh of capacity scheduler configuration work across RM failover > -- > > Key: YARN-1611 > URL: https://issues.apache.org/jira/browse/YARN-1611 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, > YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch, > YARN-1611.6.patch > > > Currently, If we do refresh* for a standby RM, it will failover to the > current active RM, and do the refresh* based on the local configuration file > of the active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-10.patch Updated patch adds a new Java enum ApplicationsRequestScope to complement the proto enum ApplicationsRequestScopeProto. > RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-10.patch, > yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, > yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch, yarn-1461-8.patch, > yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887430#comment-13887430 ] Hadoop QA commented on YARN-1461: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626247/yarn-1461-10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2974//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2974//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2974//console This message is automatically generated. 
> RM API and RM changes to handle tags for running jobs > - > > Key: YARN-1461 > URL: https://issues.apache.org/jira/browse/YARN-1461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-1461-1.patch, yarn-1461-10.patch, > yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, > yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch, yarn-1461-8.patch, > yarn-1461-9.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang
[ https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887471#comment-13887471 ] Sunil G commented on YARN-1662: --- If we can implement a timed reservation logic here, it will be safer for the fresh allocation to be tried on some other node. I have reviewed the scheduler part and found that this can be achieved without a separate timer thread. addReReservation() is invoked when the same node tries to re-reserve the same application's requests on that node. This is a multiset, hence the internal count increments every time addReReservation() is performed; it is also incremented only once per second (the node heartbeat interval). I wish to add code like the below in the LeafQueue::assignContainer() method. If the limit is exceeded, I will try to unreserve the container from the node. This code is hit when the same application tries to re-reserve again on the same node.
{code}
} else {
  // Reserve by 'charging' in advance...
  reserve(application, priority, node, rmContainer, container);
  // Check the re-reservation limit. If exceeded, unreserve and try for a
  // fresh allocation.
  if (RESERVATION_TIME_LIMIT != 0
      && application.getReReservations(priority) > RESERVATION_TIME_LIMIT) {
    unreserve(application, priority, node, rmContainer);
    return Resources.none();
  }
}
{code}
So on the next nodeUpdate from some other node, the CS can try to allocate the resource to this application. NB: reservation is meant to ensure that the same task can stick to the node where it is better to run. A bigger configurable limit, based on the nature of the running tasks, can still achieve the above behavior. Please share your thoughts. 
> Capacity Scheduler reservation issue cause Job Hang > --- > > Key: YARN-1662 > URL: https://issues.apache.org/jira/browse/YARN-1662 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.2.0 > Environment: Suse 11 SP1 + Linux >Reporter: Sunil G > > There are 2 node managers in my cluster. > NM1 with 8GB > NM2 with 8GB > I am submitting a Job with below details: > AM with 2GB > Map needs 5GB > Reducer needs 3GB > slowstart is enabled with 0.5 > 10maps and 50reducers are assigned. > 5maps are completed. Now few reducers got scheduled. > Now NM1 has 2GB AM and 3Gb Reducer_1[Used 5GB] > NM2 has 3Gb Reducer_2 [Used 3GB] > A Map has now reserved(5GB) in NM1 which has only 3Gb free. > It hangs forever. > Potential issue is, reservation is now blocked in NM1 for a Map which needs > 5GB. > But the Reducer_1 hangs by waiting for few map ouputs. > Reducer side preemption also not happened as few headroom is still available. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
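The re-reservation cap proposed in the comment above can be sketched independently of the scheduler code. The version below is a hypothetical, self-contained model of the counting logic (illustrative names, not the real CapacityScheduler API): one increment per heartbeat on which the same node re-reserves, and a release once a configurable limit is exceeded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed re-reservation cap: count how many
// consecutive heartbeats a reservation has been re-made, and give up past
// a limit so the request can be tried on other nodes.
public class ReReservationCap {
  static final int RE_RESERVATION_LIMIT = 5; // assumed configurable

  // Re-reservation counts keyed by "appId:priority", mimicking the
  // multiset mentioned in the comment.
  private final Map<String, Integer> reReservations = new HashMap<>();

  /**
   * Returns true if the reservation should be kept, false if the caller
   * should unreserve and let a fresh allocation be attempted elsewhere.
   */
  public boolean tryReserve(String appId, int priority) {
    String key = appId + ":" + priority;
    int count = reReservations.merge(key, 1, Integer::sum);
    if (RE_RESERVATION_LIMIT != 0 && count > RE_RESERVATION_LIMIT) {
      reReservations.remove(key); // unreserve: reset the counter
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    ReReservationCap cap = new ReReservationCap();
    int beats = 0;
    // Simulate the same node re-reserving on every heartbeat until released.
    while (cap.tryReserve("application_1390547864213_0347", 0)) {
      beats++;
    }
    System.out.println("released after " + (beats + 1) + " heartbeats");
  }
}
```

With the limit set to 5, the simulated node gives up on its sixth consecutive attempt, which matches the intent of the snippet in the comment: the reservation survives long enough to capture locality, but cannot block the node forever.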