[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948192#comment-14948192 ] Devaraj K commented on YARN-3964: - Thanks [~leftnoteasy] for review and confirmation, [~Naganarasimha] and [~sunilg] for reviews. Thanks [~dian.fu] for the patch. It mostly looks good to me except for these minor comments. 1. Can you update the descriptions for the new configs added in yarn-default.xml? {code:xml} +The class to use as the node labels fetcher by ResourceManager. It should +extend org.apache.hadoop.yarn.server.resourcemanager.nodelabels. +RMNodeLabelsMappingProvider. {code} Can you update the description to something like the following: 'When node labels "yarn.node-labels.configuration-type" is of type "delegated-centralized", Administrators can configure the class for fetching node labels by ResourceManager. Configured class needs to extend org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider.' {code:xml} +The interval to use to update node labels by ResourceManager. {code} Can we think of having it like 'This interval is used to update the node labels by ResourceManager.'? Also, can we describe here that if the value is '-1', no timer task will be created? 2. In TestRMDelegatedNodeLabelsUpdater.java, can we have an assertion in the catch block to check the expected exception message? {code:java} } catch (Exception e) { // expected } {code} 3. Can you file a Jira to update the documentation for this? > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.1.patch > > > Currently, CLI/REST API is provided in Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to start a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run in the YARN admin user. > - This makes it a little complicate to maintain as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in Resource Manager will provide user more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
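As a rough illustration of how these settings fit together once the patch lands, here is a minimal sketch using the Configuration API. The two resourcemanager-side property names are assumptions based on the yarn-default.xml descriptions quoted above, and the provider class is a placeholder, not part of the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DelegatedCentralizedLabelsSetup {
  public static Configuration create() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.node-labels.enabled", true);
    // Switch node label configuration to the delegated-centralized mode
    // introduced by this JIRA.
    conf.set("yarn.node-labels.configuration-type", "delegated-centralized");
    // Assumed key names (see the descriptions discussed above); the provider
    // class below is purely illustrative and must extend
    // RMNodeLabelsMappingProvider.
    conf.set("yarn.resourcemanager.node-labels.provider",
        "com.example.MyRMNodeLabelsMappingProvider");
    // A value of -1 would mean no periodic update timer task is created.
    conf.set("yarn.resourcemanager.node-labels.provider.fetch-interval-ms",
        "1800000");
    return conf;
  }
}
{code}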
[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated YARN-3964: -- Attachment: YARN-3964.016.patch > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, CLI/REST API is provided in Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to start a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run in the YARN admin user. > - This makes it a little complicate to maintain as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in Resource Manager will provide user more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4240) Add documentation for delegated-centralized node labels feature
Dian Fu created YARN-4240: - Summary: Add documentation for delegated-centralized node labels feature Key: YARN-4240 URL: https://issues.apache.org/jira/browse/YARN-4240 Project: Hadoop YARN Issue Type: Sub-task Reporter: Dian Fu Assignee: Dian Fu As a follow up of YARN-3964, we should add documentation for delegated-centralized node labels feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948280#comment-14948280 ] Dian Fu commented on YARN-3964: --- Hi [~devaraj.k], Thanks a lot for your review. Updated the patch accordingly. Have also created ticket YARN-4240 for the documentation. > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, CLI/REST API is provided in Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to start a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run in the YARN admin user. > - This makes it a little complicate to maintain as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in Resource Manager will provide user more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948327#comment-14948327 ] Rohith Sharma K S commented on YARN-3771: - +1 for fixing the security hole. One concern about the patch is backward compatibility, since the String array is changed to a List. If any clients are using this default constant, it would cause a compilation error for them. I would like to hear comments from other folks on doing this change. > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > i was going through some find bugs rules. One issue reported in that is > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH= > is not honoring the final qualifier. The string array contents can be re > assigned ! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > System.out.println(12 < 10); > String[] t1={"u"}; > //t = t1; // this will show compilation error > t[1] = t1[0]; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList > any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
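For reference, a minimal sketch of the Collections.unmodifiableList option mentioned in the description (classpath entries abbreviated), showing how element mutation is rejected at runtime, unlike with a final array whose elements can still be reassigned:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathConstantsSketch {
  // An unmodifiable view protects the contents; a final String[] only
  // protects the reference.
  public static final List<String> DEFAULT_YARN_APPLICATION_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList(
          "$HADOOP_CONF_DIR",
          "$HADOOP_COMMON_HOME/share/hadoop/common/*"));

  public static void main(String[] args) {
    // Throws UnsupportedOperationException instead of silently mutating.
    DEFAULT_YARN_APPLICATION_CLASSPATH.set(0, "overridden");
  }
}
{code}

As noted in the comment, changing the type of the public constant is exactly the part that raises the backward-compatibility question.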
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948420#comment-14948420 ] Rohith Sharma K S commented on YARN-4235: - +1 lgtm > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
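Not the actual patch, but a sketch of the kind of guard the stack trace above calls for: the primary-group rule should decline to place the app rather than index into an empty group list.

{code:java}
import java.util.List;

public class PrimaryGroupGuardSketch {
  /**
   * Returns null (meaning "this rule does not apply") instead of calling
   * groups.get(0) on an empty list, which is what raises the
   * IndexOutOfBoundsException shown above.
   */
  static String getQueueForApp(String user, List<String> groups) {
    if (groups == null || groups.isEmpty()) {
      return null;
    }
    return "root." + groups.get(0);
  }
}
{code}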
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948435#comment-14948435 ] Hadoop QA commented on YARN-3964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 28s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 8m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 13s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 6s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 8s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 51m 0s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 106m 23s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestResourceManager | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765542/YARN-3964.016.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1107bd3 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9377/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9377/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9377/console | This message was automatically generated. > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch,
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948587#comment-14948587 ] Hadoop QA commented on YARN-3964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 8m 6s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 27s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 0s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 7s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 56m 52s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 107m 57s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765542/YARN-3964.016.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1107bd3 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9378/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9378/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9378/console | This message was automatically generated. 
> Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, CLI/REST API is provided in Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to start a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run in the YARN admin user. > - This makes it a little complicate to maintain as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in Resource Manager will provide user more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948709#comment-14948709 ] MENG DING commented on YARN-1509: - Hi, [~bikassaha] Thanks a lot for the valuable comments! bq. Why are there separate methods for increase and decrease instead of a single method to change the container resource size? By comparing the existing resource allocation to a container and the new requested resource allocation, it should be clear whether an increase or decrease is being requested. As discussed in the design stage, and also described in the design doc, the reason to separate the increase/decrease requests in the APIs and AMRM protocol is to make sure that users will make a conscious decision when they are making these requests. It is also much easier to catch any potential mistakes that the user could make. For example, if a user intends to increase the resource of a container, but for whatever reason mistakenly specifies a target resource that is smaller than the current resource, the RM can catch that and throw an exception. bq. Also, for completeness, is there a need for a cancelContainerResourceChange()? After a container resource change request has been submitted, what are my options as a user other than to wait for the request to be satisfied by the RM? For a container resource decrease request, there is practically no chance (and probably no need) to cancel the request, as it happens immediately when the scheduler processes the request (this is similar to the release container request). For a container resource increase, the user can cancel any pending increase request still sitting in the RM by sending a decrease request with the same size as the current container. I will improve the Javadoc description to make this clear. bq. If I release the container, then does it mean all pending change requests for that container should be removed? From a quick look at the patch, it does not look like that is being covered, unless I am missing something. You are right that releasing a container should cancel all pending change requests for that container. This is missing in the current implementation; I will add that. bq. What will happen if the AM restarts after submitting a change request. Does the AM-RM re-register protocol need an update to handle the case of re-synchronizing on the change requests? What happens if the RM restarts? If these are explained in a document, then please point me to the document. The patch did not seem to have anything around this area. So I thought I would ask. The current implementation handles RM restarts by maintaining a pendingIncrease and pendingDecrease map, just like the pendingRelease list. This is covered in the design doc. For AM restarts, I am not sure what we need to do here. Does the AM-RM re-register protocol currently handle the re-synchronization of outstanding new container requests after the AM is restarted? Will you be able to elaborate a little bit on this? bq. Also, why have the callback interface methods been made non-public? Would that be an incompatible change? All interface methods are implicitly public and abstract. The existing public modifiers on these methods are redundant, so I removed them. 
> Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
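A sketch of the usage pattern described in the comment above, i.e. cancelling a pending increase by asking for a decrease back to the current size. The two request methods are left as comments because their final names in the AMRMClient API are still under review here; treat them as hypothetical.

{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class ContainerResizeSketch {
  void growThenCancel(AMRMClientAsync<?> client, Container container) {
    Resource current = container.getResource();
    Resource bigger = Resource.newInstance(
        current.getMemory() * 2, current.getVirtualCores());

    // Hypothetical call: ask the RM to grow the container. A "bigger" target
    // that is actually smaller than the current allocation would be rejected
    // by the RM, per the discussion above.
    // client.requestContainerResourceIncrease(container, bigger);

    // Hypothetical call: cancel the still-pending increase by requesting a
    // decrease back to the current size.
    // client.requestContainerResourceDecrease(container, current);
  }
}
{code}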
[jira] [Updated] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4005: - Fix Version/s: (was: 2.8.0) 2.7.2 I pulled this in to branch-2.7 as well. > Completed container whose app is finished is not removed from NMStateStore > -- > > Key: YARN-4005 > URL: https://issues.apache.org/jira/browse/YARN-4005 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.7.2 > > Attachments: YARN-4005.01.patch > > > If a container is completed and its corresponding app is finished, NM only > removes it from its context and does not add it to > 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will > not remove it from NMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4005: - Fix Version/s: 2.6.2 Also committed to branch-2.6. > Completed container whose app is finished is not removed from NMStateStore > -- > > Key: YARN-4005 > URL: https://issues.apache.org/jira/browse/YARN-4005 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.7.2, 2.6.2 > > Attachments: YARN-4005.01.patch > > > If a container is completed and its corresponding app is finished, NM only > removes it from its context and does not add it to > 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will > not remove it from NMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
[ https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3780: - Fix Version/s: (was: 2.8.0) 2.6.2 2.7.2 I committed this to branch-2.7 and branch-2.6 as well. > Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition > - > > Key: YARN-3780 > URL: https://issues.apache.org/jira/browse/YARN-3780 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Fix For: 2.7.2, 2.6.2 > > Attachments: YARN-3780.000.patch > > > Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition > to avoid unnecessary NodeResourceUpdateSchedulerEvent. > The current code use {{!=}} to compare Resource totalCapability, which will > compare reference not the real value in Resource. So we should use equals to > compare Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
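A one-method sketch of the comparison the fix switches to: Resource.equals() compares the memory/vcore values, while != only compares object identity.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceCompareSketch {
  static boolean capabilityChanged(Resource oldCapability, Resource newCapability) {
    // Two distinct Resource objects holding the same <memory, vcores> look
    // "different" under !=, which would fire a spurious
    // NodeResourceUpdateSchedulerEvent; equals() avoids that.
    return !oldCapability.equals(newCapability);
  }
}
{code}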
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948870#comment-14948870 ] zhihai xu commented on YARN-3943: - The checkstyle issues and release audit warnings for the new patch YARN-3943.002.patch were pre-existing. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
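Not the patch itself, just a sketch of the two-threshold (hysteresis) behavior proposed in the description, with illustrative numbers:

{code:java}
public class DiskFullHysteresisSketch {
  // Illustrative watermarks: a disk is marked full above the higher value
  // and healthy again only below the lower value, so the state does not
  // oscillate when utilization hovers around a single cutoff.
  private static final float FULL_PERCENT = 90.0f;
  private static final float GOOD_AGAIN_PERCENT = 85.0f;
  private boolean diskFull = false;

  boolean update(float utilizationPercent) {
    if (!diskFull && utilizationPercent > FULL_PERCENT) {
      diskFull = true;
    } else if (diskFull && utilizationPercent < GOOD_AGAIN_PERCENT) {
      diskFull = false;
    }
    return diskFull;
  }
}
{code}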
[jira] [Updated] (YARN-3802) Two RMNodes for the same NodeId are used in RM sometimes after NM is reconnected.
[ https://issues.apache.org/jira/browse/YARN-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3802: - Fix Version/s: (was: 2.8.0) 2.6.2 2.7.2 I committed this to branch-2.7 and branch-2.6 as well. > Two RMNodes for the same NodeId are used in RM sometimes after NM is > reconnected. > - > > Key: YARN-3802 > URL: https://issues.apache.org/jira/browse/YARN-3802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.7.2, 2.6.2 > > Attachments: YARN-3802.000.patch, YARN-3802.001.patch > > > Two RMNodes for the same NodeId are used in RM sometimes after NM is > reconnected. Scheduler and RMContext use different RMNode reference for the > same NodeId sometimes after NM is reconnected, which is not correct. > Scheduler and RMContext should always use same RMNode reference for the same > NodeId. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948914#comment-14948914 ] zhihai xu commented on YARN-4201: - Thanks for the patch [~hex108]! It is a good catch. Should we use {{SchedulerNode#getNodeName}} to get the blacklisted node name? We can add {{getSchedulerNode}} to {{YarnScheduler}}, so we can call {{getSchedulerNode}} to look up the SchedulerNode using NodeId in {{RMAppAttemptImpl}}. > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch > > > For minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. It is because RM just puts host to AMBlacklist > whether scheduler.include-port-in-node-name is set or not. In fact RM should > put "host + port" to AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948931#comment-14948931 ] zhihai xu commented on YARN-4201: - Currently {{getSchedulerNode}} is defined in {{AbstractYarnScheduler}}. {{SchedulerAppUtils.isBlacklisted}} uses {{node.getNodeName()}} to check whether a node is blacklisted, so it would be good to use the same way to get the blacklisted node name. All the configuration and formatting related to the node name would then live only in SchedulerNode.java. > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch > > > For minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. It is because RM just puts host to AMBlacklist > whether scheduler.include-port-in-node-name is set or not. In fact RM should > put "host + port" to AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
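The point being made above, as a tiny sketch: the blacklist entry has to match whatever name the scheduler itself uses, which includes the port only when yarn.scheduler.include-port-in-node-name is true.

{code:java}
public class BlacklistNodeNameSketch {
  static String blacklistKey(String host, int port, boolean includePortInNodeName) {
    // Must mirror SchedulerNode#getNodeName: "host:port" in a minicluster
    // with include-port-in-node-name set, plain "host" otherwise.
    return includePortInNodeName ? host + ":" + port : host;
  }
}
{code}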
[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3194: - Fix Version/s: 2.6.2 I committed this to branch-2.6 as well. > RM should handle NMContainerStatuses sent by NM while registering if NM is > Reconnected node > --- > > Key: YARN-3194 > URL: https://issues.apache.org/jira/browse/YARN-3194 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: NM restart is enabled >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Fix For: 2.7.0, 2.6.2 > > Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch > > > On NM restart ,NM sends all the outstanding NMContainerStatus to RM during > registration. The registration can be treated by RM as New node or > Reconnecting node. RM triggers corresponding event on the basis of node added > or node reconnected state. > # Node added event : Again here 2 scenario's can occur > ## New node is registering with different ip:port – NOT A PROBLEM > ## Old node is re-registering because of RESYNC command from RM when RM > restart – NOT A PROBLEM > # Node reconnected event : > ## Existing node is re-registering i.e RM treat it as reconnecting node when > RM is not restarted > ### NM RESTART NOT Enabled – NOT A PROBLEM > ### NM RESTART is Enabled > Some applications are running on this node – *Problem is here* > Zero applications are running on this node – NOT A PROBLEM > Since NMContainerStatus are not handled, RM never get to know about > completedContainer and never release resource held be containers. RM will not > allocate new containers for pending resource request as long as the > completedContainer event is triggered. This results in applications to wait > indefinitly because of pending containers are not served by RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3896: - Fix Version/s: (was: 2.8.0) 2.6.2 2.7.2 I committed this to branch-2.7 and branch-2.6 as well. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.7.2, 2.6.2 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948995#comment-14948995 ] Junping Du commented on YARN-3223: -- Sorry for coming late on this, as I am just back from a long leave. Is your patch available for review? If so, can you click the "Submit Patch" button to trigger a Jenkins test against your patch? Also, please don't delete old/stale patches, which could cause us to lose track of the full history of patches/discussions. Thx! > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
[ https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neelesh Srinivas Salian updated YARN-3996: -- Attachment: YARN-3996.002.patch Resolved the issues with the Capacity, FIFO and SLS schedulers. I am not sure how to approach the testing. Wrote a basic unit test for this at the moment. Trying to think how to make it more robust. Will update if I think of a sturdier approach. In the meantime, requesting some feedback on version 002 of the patch. Thank you. > YARN-789 (Support for zero capabilities in fairscheduler) is broken after > YARN-3305 > --- > > Key: YARN-3996 > URL: https://issues.apache.org/jira/browse/YARN-3996 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian >Priority: Critical > Attachments: YARN-3996.001.patch, YARN-3996.002.patch, > YARN-3996.prelim.patch > > > RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest > with mininumResource for the incrementResource. This causes normalize to > return zero if minimum is set to zero as per YARN-789 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4236) Metric for aggregated resources allocation per queue
[ https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4236: --- Attachment: YARN-4236.patch > Metric for aggregated resources allocation per queue > > > Key: YARN-4236 > URL: https://issues.apache.org/jira/browse/YARN-4236 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4236.patch, YARN-4236.patch > > > We currently track allocated memory and allocated vcores per queue but we > don't have a good rate metric on how fast we're allocating these things. In > other words, a straight line in allocatedmb could equally be one extreme of > no new containers are being allocated or allocating a bunch of containers > where we free exactly what we allocate each time. Adding a resources > allocated per second per queue would give us a better insight into the rate > of resource churn on a queue. Based on this aggregated resource allocation > per queue we can easily have some tools to measure the rate of resource > allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4237: --- Attachment: YARN-4237-YARN-2928.01.patch Fields would specify whether metrics for the flowruns will be returned or not. For a single flowrun, metrics will be returned. Maybe we can decide whether to send them or not on the basis of fields query param as well. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4237-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949064#comment-14949064 ] Varun Saxena commented on YARN-4238: Yes, it's there as part of the event inside the entity, but it's not being set explicitly via entity.setCreatedTime(). You are correct that this was not being done in ATSv1 either. In ATSv1, though, if the start time was not sent for an entity, the LevelDB implementation of the timeline store would check whether the entity already exists and, if not, parse through all the events and choose the smallest timestamp as the start time. If that was not there, I think the current system time was taken (I will have to check on that one). Regardless, in ATSv2 we are neither setting the created time in the client, nor do we have logic like ATSv1's in the HBase writer, so the end result is that the created time is never updated in the backend. Either way, this has to be handled, either from the publishing side or in the writer. I am not sure why the approach of fetching it from events was taken in ATSv1. As a get followed by a put on every call can be expensive from an HBase perspective, I think the client can send it whenever it wants to. I do not see any issues around sending it from the RM, NM, etc., from where entities are published. Otherwise we will have to check for specific events. I will check whether there are any issues around sending it from the client when I fix this. Now, as for what happens if the client does not send it: this would be an issue if entities have to be returned sorted by created time, or if filtering on the basis of a created-time range has to be done. We can explicitly state to clients that if created time is not reported, we cannot guarantee ordering while fetching multiple entities; this will keep it simple from an implementation viewpoint. If not, maybe we can cache it, check whether the entity has already gone into the backend, and set the created time based on that. But in that case, the issue is what happens if the daemon (hosting the writer) goes down. Maybe we can store this info in a state store. But do we need to do that? > createdTime is not reported while publishing entities to ATSv2 > -- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
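A sketch of the "set it from the publishing side" option discussed above, where the publisher stamps the creation time explicitly instead of relying on the writer to derive it from events (the entity type string is illustrative):

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class CreatedTimeSketch {
  static TimelineEntity newAppEntity(String appId, long createdTime) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_APPLICATION");
    entity.setId(appId);
    // Explicitly stamped by the publisher, e.g. from the timestamp of the
    // application-created event.
    entity.setCreatedTime(createdTime);
    return entity;
  }
}
{code}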
[jira] [Created] (YARN-4241) Typo in yarn-default.xml
Anthony Rojas created YARN-4241: --- Summary: Typo in yarn-default.xml Key: YARN-4241 URL: https://issues.apache.org/jira/browse/YARN-4241 Project: Hadoop YARN Issue Type: Bug Components: documentation, yarn Reporter: Anthony Rojas Assignee: Anthony Rojas Priority: Trivial Typo in description section of yarn-default.xml, under the properties: yarn.nodemanager.disk-health-checker.min-healthy-disks yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb The reference to yarn-nodemanager.local-dirs should be yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4241) Typo in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Rojas updated YARN-4241: Attachment: YARN-4241.patch > Typo in yarn-default.xml > > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, yarn >Reporter: Anthony Rojas >Assignee: Anthony Rojas >Priority: Trivial > Labels: newbie > Attachments: YARN-4241.patch > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949200#comment-14949200 ] Hadoop QA commented on YARN-4237: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 15s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 49s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 40m 54s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765639/YARN-4237-YARN-2928.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 5a3db96 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9380/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9380/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9380/console | This message was automatically generated. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4237-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949223#comment-14949223 ] Brook Zhou commented on YARN-3223: -- Ah okay, sorry about that, will do. It seems to be passing test-patch on my local trunk repo, so I will update with submit patch. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949267#comment-14949267 ] Varun Saxena commented on YARN-4237: Or maybe mock the row key class. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4237-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
[ https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949299#comment-14949299 ] Hadoop QA commented on YARN-3996: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 30s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 6s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 47s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 21s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 18s | The applied patch generated 1 new checkstyle issues (total was 279, now 279). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 56s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 58m 3s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 107m 2s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765634/YARN-3996.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0841940 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9379/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9379/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9379/console | This message was automatically generated. > YARN-789 (Support for zero capabilities in fairscheduler) is broken after > YARN-3305 > --- > > Key: YARN-3996 > URL: https://issues.apache.org/jira/browse/YARN-3996 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian >Priority: Critical > Attachments: YARN-3996.001.patch, YARN-3996.002.patch, > YARN-3996.prelim.patch > > > RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest > with mininumResource for the incrementResource. 
This causes normalize to > return zero if minimum is set to zero as per YARN-789 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4241) Typo in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949314#comment-14949314 ] Hadoop QA commented on YARN-4241: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 21s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 2m 11s | Tests failed in hadoop-yarn-common. | | | | 42m 45s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765653/YARN-4241.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / 0841940 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/whitespace.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9382/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9382/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9382/console | This message was automatically generated. > Typo in yarn-default.xml > > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, yarn >Reporter: Anthony Rojas >Assignee: Anthony Rojas >Priority: Trivial > Labels: newbie > Attachments: YARN-4241.patch > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4236) Metric for aggregated resources allocation per queue
[ https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949322#comment-14949322 ] Hadoop QA commented on YARN-4236: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 34s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 2 new checkstyle issues (total was 52, now 54). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 57m 1s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 97m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765637/YARN-4236.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0841940 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9381/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9381/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9381/console | This message was automatically generated. > Metric for aggregated resources allocation per queue > > > Key: YARN-4236 > URL: https://issues.apache.org/jira/browse/YARN-4236 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4236.patch, YARN-4236.patch > > > We currently track allocated memory and allocated vcores per queue but we > don't have a good rate metric on how fast we're allocating these things. In > other words, a straight line in allocatedmb could equally be one extreme of > no new containers are being allocated or allocating a bunch of containers > where we free exactly what we allocate each time. Adding a resources > allocated per second per queue would give us a better insight into the rate > of resource churn on a queue. 
Based on this aggregated resource allocation > per queue we can easily have some tools to measure the rate of resource > allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
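As a rough illustration of the kind of cumulative per-queue counters the YARN-4236 description above asks for (not the actual patch), the sketch below uses Hadoop's metrics2 library; the metric names and the point at which the scheduler would call incrAllocatedResources are assumptions for illustration only.
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical sketch: ever-increasing allocation counters per queue. Sampling
// the counters periodically and taking the delta gives an allocation rate,
// which a flat AllocatedMB line alone cannot show.
public class AggregatedAllocationMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("QueueAllocation");
  private final MutableCounterLong aggregateMemoryMBAllocated =
      registry.newCounter("AggregateMemoryMBAllocated", "Aggregate MB allocated", 0L);
  private final MutableCounterLong aggregateVcoresAllocated =
      registry.newCounter("AggregateVcoresAllocated", "Aggregate vcores allocated", 0L);

  // Assumed hook: called by the scheduler each time a container is allocated
  // to the queue this object belongs to.
  public void incrAllocatedResources(Resource allocated) {
    aggregateMemoryMBAllocated.incr(allocated.getMemory());
    aggregateVcoresAllocated.incr(allocated.getVirtualCores());
  }
}
{code}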
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949338#comment-14949338 ] Jonathan Eagles commented on YARN-4009: --- [~vvasudev], I'm trying hard to find a balance here. On the one hand I want to support backwards compatibility; on the other hand I want configuration to be simple. I want a way to enable CORS support for only the RM and the timeline server while specifying the configuration only once (not once for common CORS and once for timeline CORS). However, I also want to keep supporting both of the old configuration parameters. Proposals: 1) If timeline CORS is enabled (), the timeline CORS configuration overrides the common CORS configuration when present; otherwise the common configuration is used (a rough sketch of this option follows below). 2) Create a second timeline enabled flag that will only use the new CORS classes, configs and behavior. This will allow the old way using the old configs with the timeline prefix to keep working, but allow users to migrate to the new way to simplify configuration. What do you think? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
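To make proposal 1 above concrete, here is a minimal sketch of timeline-specific CORS settings overriding the common ones when present and falling back otherwise; the property prefixes and the helper class are assumptions for illustration, not the actual YARN-4009 patch.
{code}
import org.apache.hadoop.conf.Configuration;

public final class CorsConfigFallbackSketch {
  // Hypothetical property prefixes, used only for this illustration.
  private static final String COMMON_PREFIX = "hadoop.http.cross-origin.";
  private static final String TIMELINE_PREFIX = "yarn.timeline-service.http-cross-origin.";

  /**
   * Proposal 1 semantics: return the timeline-specific value if it is set,
   * otherwise the common value, otherwise the supplied default.
   */
  public static String getCorsSetting(Configuration conf, String suffix,
      String defaultValue) {
    String timelineValue = conf.get(TIMELINE_PREFIX + suffix);
    if (timelineValue != null) {
      return timelineValue;
    }
    return conf.get(COMMON_PREFIX + suffix, defaultValue);
  }

  private CorsConfigFallbackSketch() {
  }
}
{code}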
[jira] [Updated] (YARN-4241) Typo in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Rojas updated YARN-4241: Attachment: YARN-4241.patch.1 Removed trailing whitespaces on lines 19 and 28. > Typo in yarn-default.xml > > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, yarn >Reporter: Anthony Rojas >Assignee: Anthony Rojas >Priority: Trivial > Labels: newbie > Attachments: YARN-4241.patch, YARN-4241.patch.1 > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4241) Typo in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949407#comment-14949407 ] Hadoop QA commented on YARN-4241: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 0s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:blue}0{color} | pre-patch | 15m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 13s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 37s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | | | 38m 57s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765674/YARN-4241.patch.1 | | Optional Tests | javadoc javac unit | | git revision | trunk / 0841940 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9384/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9384/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9384/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9384/console | This message was automatically generated. > Typo in yarn-default.xml > > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, yarn >Reporter: Anthony Rojas >Assignee: Anthony Rojas >Priority: Trivial > Labels: newbie > Attachments: YARN-4241.patch, YARN-4241.patch.1 > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949483#comment-14949483 ] Hadoop QA commented on YARN-3223: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 9m 13s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 45s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 17s | The applied patch generated 14 new checkstyle issues (total was 180, now 194). | | {color:red}-1{color} | whitespace | 0m 8s | The patch has 22 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 55s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 57m 44s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 107m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764643/YARN-3223-v0.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0841940 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/whitespace.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9383/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9383/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9383/console | This message was automatically generated. 
> Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949512#comment-14949512 ] Jason Lowe commented on YARN-3943: -- +1 lgtm. Committing this. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
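As a sketch of the two-threshold idea described in YARN-3943 above, the disk state only flips when it crosses the threshold that applies to its current direction, which is what prevents the oscillation mentioned in the description; the class and field names are assumptions, not the committed code.
{code}
// Hypothetical hysteresis check: a disk is marked full once utilization rises
// above the disk-full threshold, and is only marked good again once it drops
// below the (lower) disk-not-full threshold.
public class DiskFullnessCheckSketch {
  private final float diskFullPercent;     // e.g. 90.0f
  private final float diskNotFullPercent;  // e.g. 80.0f, must be <= diskFullPercent
  private boolean diskFull;

  public DiskFullnessCheckSketch(float diskFullPercent, float diskNotFullPercent) {
    this.diskFullPercent = diskFullPercent;
    this.diskNotFullPercent = diskNotFullPercent;
  }

  // Returns the disk state after applying the current utilization reading.
  public boolean isFullAfterUpdate(float utilizationPercent) {
    if (!diskFull && utilizationPercent > diskFullPercent) {
      diskFull = true;
    } else if (diskFull && utilizationPercent < diskNotFullPercent) {
      diskFull = false;
    }
    return diskFull;
  }
}
{code}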
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949536#comment-14949536 ] zhihai xu commented on YARN-3943: - Thanks [~jlowe] for the review and committing the patch, greatly appreciated! > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949546#comment-14949546 ] Rich Haase commented on YARN-4207: -- This looks like a pretty trivial change: adding an additional value to the o.a.h.yarn.records.FinalApplicationStatus enum. In a quick search I didn't see anything downstream within Hadoop that would be impacted by such a patch. If no one else is working on this JIRA and the approach I've described is acceptable, I will put together a patch. > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
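A minimal sketch of the enum change described above; the existing constants are the ones FinalApplicationStatus has today, while the added ENDED value and its meaning are only an assumption about what such a patch might introduce.
{code}
// Sketch only: one possible non-judgemental terminal state added to the enum.
public enum FinalApplicationStatus {
  UNDEFINED,  // application has not yet finished
  SUCCEEDED,  // application finished successfully
  FAILED,     // application failed
  KILLED,     // application was terminated by a user or admin
  ENDED       // hypothetical addition: finished without implying success or failure
}
{code}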
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949554#comment-14949554 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-trunk-Commit #8596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8596/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949586#comment-14949586 ] Wangda Tan commented on YARN-4140: -- Thanks for update, [~bibinchundatt], patch looks good, pending Jenkins. > RM container allocation delayed incase of app submitted to Nodelabel partition > -- > > Key: YARN-4140 > URL: https://issues.apache.org/jira/browse/YARN-4140 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, > 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, > 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, > 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, > 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch > > > Trying to run application on Nodelabel partition I found that the > application execution time is delayed by 5 – 10 min for 500 containers . > Total 3 machines 2 machines were in same partition and app submitted to same. > After enabling debug was able to find the below > # From AM the container ask is for OFF-SWITCH > # RM allocating all containers to NODE_LOCAL as shown in logs below. > # So since I was having about 500 containers time taken was about – 6 minutes > to allocate 1st map after AM allocation. > # Tested with about 1K maps using PI job took 17 minutes to allocate next > container after AM allocation > Once 500 container allocation on NODE_LOCAL is done the next container > allocation is done on OFF_SWITCH > {code} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > /default-rack, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: *, Relax > Locality: true, Node Label Expression: 3} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-143, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-117, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > {code} > > {code} > 2015-09-09 14:35:45,467 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:45,831 
DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,469 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,832 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > {code} > {code} > dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> > cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep > "root.b.b1" | wc -l > 500 > {code} > > (Consumes about 6 minutes) > -- This message was sent by A
[jira] [Created] (YARN-4242) add analyze command to explicitly cache file metadata in HBase metastore
Sergey Shelukhin created YARN-4242: -- Summary: add analyze command to explicitly cache file metadata in HBase metastore Key: YARN-4242 URL: https://issues.apache.org/jira/browse/YARN-4242 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin ANALYZE TABLE (spec as usual) CACHE METADATA -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4242) add analyze command to explicitly cache file metadata in HBase metastore
[ https://issues.apache.org/jira/browse/YARN-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved YARN-4242. Resolution: Invalid Wrong project > add analyze command to explicitly cache file metadata in HBase metastore > --- > > Key: YARN-4242 > URL: https://issues.apache.org/jira/browse/YARN-4242 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sergey Shelukhin > > ANALYZE TABLE (spec as usual) CACHE METADATA -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949639#comment-14949639 ] MENG DING commented on YARN-1509: - Had an offline discussion with [~leftnoteasy] and [~bikassaha]. Overall we agreed that we can combine the separate increase/decrease requests into one API in the client: * Combine {{requestContainerResourceIncrease}} and {{requestContainerResourceDecrease}} into one API. For example:
{code}
/**
 * Request a container resource change before calling allocate.
 * Any previous pending resource change request for the same container will be
 * cancelled.
 *
 * @param container The container returned from the last successful resource
 *                  allocation or resource change
 * @param capability The target resource capability of the container
 */
public abstract void requestContainerResourceChange(
    Container container, Resource capability);
{code}
The user must pass in a container object (instead of just a container ID) and the target resource capability. Because the container object contains the existing container Resource, the AMRMClient can compare it against the target resource to figure out whether this is an increase or a decrease request. * There is *NO* need to change the AMRM protocol. * For the CallbackHandler methods, we can also combine {{onContainersResourceDecreased}} and {{onContainersResourceIncreased}} into one API:
{code}
public abstract void onContainersResourceChanged(
    List<Container> containers);
{code}
The user can compare the passed-in containers with the containers they have remembered to determine whether this is an increase or a decrease. Or maybe we can make it even simpler by doing something like the following? Thoughts?
{code}
public abstract void onContainersResourceChanged(
    List<Container> increasedContainers, List<Container> decreasedContainers);
{code}
* We can *deprecate* the existing CallbackHandler interface and use the AbstractCallbackHandler instead. [~bikassaha], [~leftnoteasy], any comments? > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
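To illustrate how an AM might use the combined API proposed in the comment above, a hedged usage sketch follows; requestContainerResourceChange and onContainersResourceChanged are the proposed methods (not an existing AMRMClient API), the ChangeRequestingClient interface is a stand-in for the extended client, and the doubled-memory target is an arbitrary example.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceChangeUsageSketch {
  // Hypothetical stand-in for an AMRMClient extended with the proposed method.
  public interface ChangeRequestingClient {
    void requestContainerResourceChange(Container container, Resource capability);
  }

  public void growContainer(ChangeRequestingClient client, Container container) {
    // Ask to double the container's memory; the client works out internally
    // whether this is an increase or a decrease by comparing the target
    // against the container's current Resource.
    Resource target = Resource.newInstance(
        container.getResource().getMemory() * 2,
        container.getResource().getVirtualCores());
    client.requestContainerResourceChange(container, target);
  }

  // Proposed callback shape: the AM compares the returned containers against
  // the ones it remembers to tell increases from decreases.
  public void onContainersResourceChanged(List<Container> containers) {
    for (Container c : containers) {
      System.out.println("Container " + c.getId() + " now has " + c.getResource());
    }
  }
}
{code}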
[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949644#comment-14949644 ] Sergey Shelukhin commented on YARN-4207: It's unassigned, so I gather no one is working on it. This plan sounds good to me (non-binding :)) > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI
[ https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949658#comment-14949658 ] Wangda Tan commented on YARN-4162: -- [~Naganarasimha], Thanks a lot for updating! Looked at the patch and tried it locally; some minor comments: 1. UserInfo#getResources -> getResourceUsageInfo 2. In CapacitySchedulerPage, can renderQueueCapacityInfo be removed? Is it equivalent to use renderQueueCapacityInfo(ri, lqinfo.get(DEFAULT_PARTITION)) instead? 3. Also, for
{code}
UL ul = html.ul("#pq");
for (CapacitySchedulerQueueInfo info : subQueues) {
  float used;
  float absCap;
  float absMaxCap;
  float absUsedCap;
  //...
{code}
is it possible to use the same PartitionQueueCapacitiesInfo instead of checking whether csqinfo.label == null or not? 4. PartitionResourceUsageInfo.amResource -> amUsed 5. Why is this isExclusiveNodeLabel check needed?
{code}
if (!nodeLabel.equals(NodeLabel.DEFAULT_NODE_LABEL_PARTITION)
    && csqinfo.isExclusiveNodeLabel
{code}
6. Could you update {{}} to {{DEFAULT_PARTITION}}? The {{< ... >}} could be an illegal attribute for some XML parsers, and I'm not sure it is a standard XML property. > Scheduler info in REST, is currently not displaying partition specific queue > information similar to UI > -- > > Key: YARN-4162 > URL: https://issues.apache.org/jira/browse/YARN-4162 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, > YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip > > > When Node Labels are enabled then REST Scheduler Information should also > provide partition specific queue information similar to the existing Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949710#comment-14949710 ] Hudson commented on YARN-3943: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #510 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/510/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949737#comment-14949737 ] Hadoop QA commented on YARN-4140: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 9m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 40s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 21s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 63m 31s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 112m 23s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764796/0014-YARN-4140.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8d22622 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9385/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9385/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9385/console | This message was automatically generated. > RM container allocation delayed incase of app submitted to Nodelabel partition > -- > > Key: YARN-4140 > URL: https://issues.apache.org/jira/browse/YARN-4140 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, > 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, > 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, > 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, > 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch > > > Trying to run application on Nodelabel partition I found that the > application execution time is delayed by 5 – 10 min for 500 containers . > Total 3 machines 2 machines were in same partition and app submitted to same. 
> After enabling debug was able to find the below > # From AM the container ask is for OFF-SWITCH > # RM allocating all containers to NODE_LOCAL as shown in logs below. > # So since I was having about 500 containers time taken was about – 6 minutes > to allocate 1st map after AM allocation. > # Tested with about 1K maps using PI job took 17 minutes to allocate next > container after AM allocation > Once 500 container allocation on NODE_LOCAL is done the next container > allocation is done on OFF_SWITCH > {code} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > /default-rack, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: *, Relax > Locality: true, Node Label Expression: 3} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.Sc
[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949747#comment-14949747 ] Bibin A Chundatt commented on YARN-4140: Hi [~leftnoteasy] Thanks for looking into it.Release audit warning not related to current patch {noformat} /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h Lines that start with ? in the release audit report indicate files that do not have an Apache license header. {noformat} > RM container allocation delayed incase of app submitted to Nodelabel partition > -- > > Key: YARN-4140 > URL: https://issues.apache.org/jira/browse/YARN-4140 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, > 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, > 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, > 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, > 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch > > > Trying to run application on Nodelabel partition I found that the > application execution time is delayed by 5 – 10 min for 500 containers . > Total 3 machines 2 machines were in same partition and app submitted to same. > After enabling debug was able to find the below > # From AM the container ask is for OFF-SWITCH > # RM allocating all containers to NODE_LOCAL as shown in logs below. > # So since I was having about 500 containers time taken was about – 6 minutes > to allocate 1st map after AM allocation. > # Tested with about 1K maps using PI job took 17 minutes to allocate next > container after AM allocation > Once 500 container allocation on NODE_LOCAL is done the next container > allocation is done on OFF_SWITCH > {code} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > /default-rack, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: *, Relax > Locality: true, Node Label Expression: 3} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-143, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-117, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > {code} > > {code} > 2015-09-09 14:35:45,467 DEBUG > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:45,831 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,469 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,832 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContaine
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949751#comment-14949751 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1238 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1238/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949758#comment-14949758 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2445 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2445/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit
Xuan Gong created YARN-4243: --- Summary: Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit Key: YARN-4243 URL: https://issues.apache.org/jira/browse/YARN-4243 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Right now, the RM shuts down if the ZK connection is down when the RM does the initialization. We need to add retry to this part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4243: Attachment: YARN-4243.1.patch > Add retry on establishing Zookeeper connection in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM shuts down if the ZK connection is down when the RM does > the initialization. We need to add retry to this part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949760#comment-14949760 ] Xuan Gong commented on YARN-4243: - Override createConnection() in EmbeddedElectorService to add some retries, and create a YARN configuration for the maxAttempts, because we share code (ActiveStandbyElector) and related configuration with the HDFS ZKFC. > Add retry on establishing Zookeeper connection in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM shuts down if the ZK connection is down when the RM does > the initialization. We need to add retry to this part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
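As an illustration of the retry idea described in the comment above (not the actual YARN-4243 patch), a generic bounded-retry sketch follows; the ZooKeeperFactory abstraction, the maxAttempts/retryIntervalMs parameters, and the fixed backoff are all assumptions.
{code}
import java.io.IOException;
import org.apache.zookeeper.ZooKeeper;

// Sketch: retry establishing a ZooKeeper connection a configurable number of
// times (assumes maxAttempts >= 1) instead of failing on the first attempt.
public class RetryingZkConnectSketch {
  public interface ZooKeeperFactory {  // hypothetical abstraction over createConnection()
    ZooKeeper create() throws IOException;
  }

  public ZooKeeper connectWithRetries(ZooKeeperFactory factory, int maxAttempts,
      long retryIntervalMs) throws IOException, InterruptedException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return factory.create();
      } catch (IOException e) {
        lastFailure = e;  // remember the failure and retry after a pause
        if (attempt < maxAttempts) {
          Thread.sleep(retryIntervalMs);
        }
      }
    }
    throw lastFailure;
  }
}
{code}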
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949770#comment-14949770 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #501 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/501/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4244) BlockPlacementPolicy related logs should contain the details about the filename and blockid
J.Andreina created YARN-4244: Summary: BlockPlacementPolicy related logs should contain the details about the filename and blockid Key: YARN-4244 URL: https://issues.apache.org/jira/browse/YARN-4244 Project: Hadoop YARN Issue Type: Improvement Reporter: J.Andreina Assignee: J.Andreina Currently the user does not get details about which file/block the BlockPlacementPolicy failed to find a replica node for when a large client write operation is going on. For example, consider the failure messages below, which don't include the file/block details and will be difficult to track later.
{noformat}
final String message = "Failed to place enough replicas, still in need of "
    + (totalReplicasExpected - results.size()) + " to reach "
    + totalReplicasExpected + " (unavailableStorages=" + unavailableStorages
    + ", storagePolicy=" + storagePolicy + ", newBlock=" + newBlock + ")";

String msg = "All required storage types are unavailable: "
    + " unavailableStorages=" + unavailableStorages
    + ", storagePolicy=" + storagePolicy.getName();
{noformat}
It is better to provide the file/block information in these logs for better debuggability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
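As a sketch of what the improved message described above could look like, the helper below adds the file and block to the existing placement-failure text; the method and parameter names (src, blockId, placedSoFar) are assumptions for illustration only.
{code}
// Hypothetical sketch: include the file and block in the placement-failure
// message so the failure can later be correlated with a specific write.
public class PlacementFailureMessageSketch {
  static String buildMessage(String src, long blockId, int totalReplicasExpected,
      int placedSoFar, String unavailableStorages, String storagePolicy,
      boolean newBlock) {
    return "Failed to place enough replicas for " + src + " (blk_" + blockId
        + "), still in need of " + (totalReplicasExpected - placedSoFar)
        + " to reach " + totalReplicasExpected
        + " (unavailableStorages=" + unavailableStorages
        + ", storagePolicy=" + storagePolicy
        + ", newBlock=" + newBlock + ")";
  }
}
{code}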
[jira] [Commented] (YARN-4240) Add documentation for delegated-centralized node labels feature
[ https://issues.apache.org/jira/browse/YARN-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949848#comment-14949848 ] Naganarasimha G R commented on YARN-4240: - Hi [~dian.fu], Can you please wait? I am working on the Distributed Node Labels documentation in YARN-4100 and waiting for YARN-2729 to be checked in within a couple of days. Once that's done I can push this doc JIRA, and further on top of it you can update it for "Delegated-Centralized". > Add documentation for delegated-centralized node labels feature > --- > > Key: YARN-4240 > URL: https://issues.apache.org/jira/browse/YARN-4240 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Dian Fu >Assignee: Dian Fu > > As a follow up of YARN-3964, we should add documentation for > delegated-centralized node labels feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949849#comment-14949849 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #474 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/474/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI
[ https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949871#comment-14949871 ] Naganarasimha G R commented on YARN-4162: - Hi [~wangda], Thanks for the review comments and supporting with local testing. bq. Is it possible to use the same PartitionQueueCapacitiesInfo instead of check if csqinfo.label == null or not? Well, I can avoid the if block, but csqinfo.label itself cannot be set to the default partition as it is also used as a flag to determine whether to show the leaf queue in the normal way or the partition way. bq. Why this isExclusiveNodeLabel check is needed? isExclusiveNodeLabel is the check we had earlier in CapacitySchedulerInfo.getQueues, basically to avoid displaying the queues which are not accessible to a given node label partition.
{code}
for (CSQueue queue : parentQueue.getChildQueues()) {
  if (nodeLabel.getIsExclusive()
      && !((AbstractCSQueue) queue).accessibleToPartition(nodeLabel
          .getLabelName())) {
    // Skip displaying the hierarchy for the queues for which the exclusive
    // labels are not accessible
    continue;
  }
{code}
bq. Could you update to DEFAULT_PARTITION? Well, shall I update it in all the places displayed in the UI or only in REST? For the other comments, I will get them updated in the next patch. > Scheduler info in REST, is currently not displaying partition specific queue > information similar to UI > -- > > Key: YARN-4162 > URL: https://issues.apache.org/jira/browse/YARN-4162 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, > YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip > > > When Node Labels are enabled then REST Scheduler Information should also > provide partition specific queue information similar to the existing Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4240) Add documentation for delegated-centralized node labels feature
[ https://issues.apache.org/jira/browse/YARN-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949881#comment-14949881 ] Dian Fu commented on YARN-4240: --- Hi [~Naganarasimha], Yes, of course. I will update documentation for "Delegated-Centralized" on top of YARN-4100 after it is committed. > Add documentation for delegated-centralized node labels feature > --- > > Key: YARN-4240 > URL: https://issues.apache.org/jira/browse/YARN-4240 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Dian Fu >Assignee: Dian Fu > > As a follow up of YARN-3964, we should add documentation for > delegated-centralized node labels feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4235: Component/s: fairscheduler > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
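For readers following the stack trace in this issue: the crash comes from calling get(0) on an empty group list inside the primary-group placement rule. A guard along the following lines avoids the IndexOutOfBoundsException; this is a simplified sketch, not the committed YARN-4235 patch, and the class and method shapes are invented for illustration.
{code}
import java.io.IOException;
import java.util.List;

// Simplified sketch of a defensive primary-group rule; not the actual
// QueuePlacementRule code. The point is the empty-list guard before get(0).
public class PrimaryGroupRuleSketch {
  public String getQueueForApp(String user, List<String> groups) throws IOException {
    if (groups == null || groups.isEmpty()) {
      // Fail with a clear message instead of an IndexOutOfBoundsException
      // that propagates out of the scheduler event handler and kills the RM.
      throw new IOException("No groups returned for user " + user);
    }
    return "root." + groups.get(0);
  }
}
{code}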
[jira] [Resolved] (YARN-3286) Cleanup RMNode#ReconnectNodeTransition
[ https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S resolved YARN-3286. - Resolution: Won't Fix As of now, this JIRA will not be fixed since it changes the existing non-HA behavior. Closing as Won't Fix. > Cleanup RMNode#ReconnectNodeTransition > -- > > Key: YARN-3286 > URL: https://issues.apache.org/jira/browse/YARN-3286 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3286.patch, YARN-3286-test-only.patch > > > RMNode#ReconnectNodeTransition is messy for every ReconnectedEvent. This > part of the code can be cleaned up so that we do not need to remove the node and > add a new node every time. > Supporting the above point, see the discussion on YARN-3222 in the > comments > [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799] > and > [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739] > The cleanup can do the following things: > # It always removes an old node and adds a new node. This is not really > required; instead the old node can be updated with new values. > # RMNode#totalCapability has a stale capability after the NM is reconnected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3753) RM failed to come up with "java.io.IOException: Wait for ZKClient creation timed out"
[ https://issues.apache.org/jira/browse/YARN-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949911#comment-14949911 ] Sumit Nigam commented on YARN-3753: --- I had a question. Do I need to explicitly set some yarn-site parameter to control runWithRetries in such a case? If so, which parameter needs to be set? > RM failed to come up with "java.io.IOException: Wait for ZKClient creation > timed out" > - > > Key: YARN-3753 > URL: https://issues.apache.org/jira/browse/YARN-3753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3753.1.patch, YARN-3753.2.patch, YARN-3753.patch > > > RM failed to come up with the following error while submitting an mapreduce > job. > {code:title=RM log} > 015-05-30 03:40:12,190 ERROR recovery.RMStateStore > (RMStateStore.java:transition(179)) - Error storing app: > application_1432956515242_0006 > java.io.IOException: Wait for ZKClient creation timed out > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1098) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:609) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:160) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-05-30 03:40:12,194 FATAL resourcemanager.ResourceManager > (ResourceManager.java:handle(750)) - Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. 
Cause: > java.io.IOException: Wait for ZKClient creation timed out > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1098) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:609) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:160) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTrans
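Regarding the question above about yarn-site parameters: the ZKRMStateStore retry loop is driven by the ResourceManager's ZooKeeper settings (number of retries, retry interval, and session timeout). The snippet below only illustrates setting them programmatically; the values are examples rather than recommendations, and the property names and defaults should be confirmed against yarn-default.xml for the release in use.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: ZK settings that influence ZKRMStateStore's retry
// behaviour. Values are examples; confirm names/defaults in yarn-default.xml.
public class ZkRetryConfigExample {
  public static YarnConfiguration build() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setInt("yarn.resourcemanager.zk-num-retries", 1000);        // retry count
    conf.setLong("yarn.resourcemanager.zk-retry-interval-ms", 1000); // wait between retries
    conf.setInt("yarn.resourcemanager.zk-timeout-ms", 10000);        // ZK session timeout
    return conf;
  }
}
{code}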
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949912#comment-14949912 ] Hudson commented on YARN-4235: -- FAILURE: Integrated in Hadoop-trunk-Commit #8599 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8599/]) YARN-4235. FairScheduler PrimaryGroup does not handle empty groups (rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-4201: --- Attachment: YARN-4201.002.patch > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch, YARN-4201.002.patch > > > For minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. It is because RM just puts host to AMBlacklist > whether scheduler.include-port-in-node-name is set or not. In fact RM should > put "host + port" to AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing ZooKeeper connection in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949921#comment-14949921 ] Hadoop QA commented on YARN-4243: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 53s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 22s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 1s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 52s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 19m 18s | Tests failed in hadoop-common. | | {color:red}-1{color} | yarn tests | 0m 24s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 62m 59s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 138m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | Timed out tests | org.apache.hadoop.http.TestHttpServerLifecycle | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765723/YARN-4243.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e1bf8b3 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9386/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9386/console | This message was automatically generated. 
> Add retry on establishing ZooKeeper connection in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the ZK connection is down when the RM does > the initialization. We need to add retries for this part. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949923#comment-14949923 ] Jun Gong commented on YARN-4201: Thanks [~zxu] for the review and the very valuable suggestion. The code is cleaner now. I have attached a new patch to address your comment. > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch, YARN-4201.002.patch > > > For minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. It is because RM just puts host to AMBlacklist > whether scheduler.include-port-in-node-name is set or not. In fact RM should > put "host + port" to AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949936#comment-14949936 ] Hudson commented on YARN-3943: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2412 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2412/]) YARN-3943. Use separate threshold configurations for disk-full detection (jlowe: rev 8d226225d030253152494bda32708377ad0f7af7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949941#comment-14949941 ] Devaraj K commented on YARN-3964: - Thanks [~dian.fu] for the updated patch. Latest patch looks good to me. I will commit it tomorrow if there are no further comments/objections. > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, CLI/REST API is provided in Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to start a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run in the YARN admin user. > - This makes it a little complicate to maintain as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in Resource Manager will provide user more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949949#comment-14949949 ] Hudson commented on YARN-4235: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #513 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/513/]) YARN-4235. FairScheduler PrimaryGroup does not handle empty groups (rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949959#comment-14949959 ] Varun Saxena commented on YARN-4238: [~sjlee0], [~djp], thoughts on this ? > createdTime is not reported while publishing entities to ATSv2 > -- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
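As background for the proposal above, setting the creation time amounts to stamping the entity once when the application-created event is published. The sketch below assumes the ATSv2 TimelineEntity record exposes setType/setId/setCreatedTime as shown; the entity type string and method names are assumptions to be verified against the target branch, so treat this as an illustration rather than the eventual patch.
{code}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

// Illustration only: stamp the created time once when publishing the
// application-created event. Method names are assumed from the ATSv2
// records API and should be verified against the target branch.
public class CreatedTimeSketch {
  public static TimelineEntity newAppEntity(String appId, long createdTime) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_APPLICATION"); // illustrative entity type string
    entity.setId(appId);
    entity.setCreatedTime(createdTime); // e.g. the application's submit/start time
    return entity;
  }
}
{code}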
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949986#comment-14949986 ] Hudson commented on YARN-4235: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1240 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1240/]) YARN-4235. FairScheduler PrimaryGroup does not handle empty groups (rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java * hadoop-yarn-project/CHANGES.txt > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949987#comment-14949987 ] Sidharta Seethana commented on YARN-4017: - It seems to me that using a defined value of 4096 should suffice. I'll upload a patch shortly. > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana reassigned YARN-4017: --- Assignee: Sidharta Seethana > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Sidharta Seethana > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-4017: Attachment: YARN-4017.001.patch Uploading a patch with changes to container-executor to remove the use of PATH_MAX. > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Sidharta Seethana > Attachments: YARN-4017.001.patch > > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949991#comment-14949991 ] zhihai xu commented on YARN-4201: - Thanks for the new patch, [~hex108]. I think it would be better to check that {{scheduler.getSchedulerNode(nodeId)}} is not null, to avoid an NPE. If {{scheduler.getSchedulerNode(nodeId)}} returns null, it means the blacklisted node has just been removed from the scheduler, and I think it is fine not to add a removed node to the blacklist. > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch, YARN-4201.002.patch > > > For minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. It is because RM just puts host to AMBlacklist > whether scheduler.include-port-in-node-name is set or not. In fact RM should > put "host + port" to AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
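To make the suggestion above concrete: the blacklist entry can be taken from the scheduler's node name (which is "host:port" when include-port-in-node-name is enabled, plain "host" otherwise), and a node the scheduler no longer knows about is simply skipped. The sketch below uses stand-in interfaces instead of the real RM scheduler types, so it is only an illustration of the null-check, not the YARN-4201 patch.
{code}
// Stand-in types; the real code would use the RM scheduler classes instead.
interface SchedulerNodeView {
  String getNodeName(); // "host:port" when ports are included, else "host"
}

interface SchedulerView {
  SchedulerNodeView getSchedulerNode(String nodeId);
}

final class AmBlacklistSketch {
  // Returns the name to blacklist, or null if the node was just removed
  // from the scheduler (in which case there is nothing to blacklist).
  static String entryToBlacklist(SchedulerView scheduler, String nodeId) {
    SchedulerNodeView node = scheduler.getSchedulerNode(nodeId);
    if (node == null) {
      return null; // avoid the NPE; a removed node need not be blacklisted
    }
    return node.getNodeName();
  }
}
{code}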