[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952188#comment-14952188 ] Hudson commented on YARN-3964: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1247 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1247/]) YARN-3964. Support NodeLabelsProvider at Resource Manager side. (devaraj: rev db9304788187c700647c4d84caeb3b5ad6d868d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsMappingProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NodeLabelsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Fix For: 2.8.0 > > Attachments: YARN-3964 
design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, a CLI/REST API is provided in the Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to run a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run as the YARN admin user. > - This makes it a little complicated to maintain, as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in the Resource Manager will give users more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
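For a sense of what plugs in here, below is a hedged sketch of a custom provider. {{RMNodeLabelsMappingProvider}} and {{RMDelegatedNodeLabelsUpdater}} are classes added by this commit (see the file list above), but the method signature, constructor, and the yarn-default.xml key that selects the provider are assumptions for illustration, not the committed API:
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.NodeLabel;
import org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider;

// Hypothetical provider: RMDelegatedNodeLabelsUpdater periodically asks it for
// node-to-labels mappings, so no external cron job run as the admin is needed.
public class HostnameBasedLabelsProvider extends RMNodeLabelsMappingProvider {

  public HostnameBasedLabelsProvider() {
    super(HostnameBasedLabelsProvider.class.getName());
  }

  // Assumed signature: return the labels for the requested nodes.
  public Map<NodeId, Set<NodeLabel>> getNodeLabels(Set<NodeId> nodes) {
    Map<NodeId, Set<NodeLabel>> mapping = new HashMap<NodeId, Set<NodeLabel>>();
    for (NodeId node : nodes) {
      // Illustration only: label GPU hosts by a naming convention. A real
      // provider might query an inventory service or run a script instead.
      mapping.put(node, node.getHost().startsWith("gpu")
          ? Collections.singleton(NodeLabel.newInstance("GPU"))
          : Collections.<NodeLabel>emptySet());
    }
    return mapping;
  }
}
{code}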
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952189#comment-14952189 ] Hudson commented on YARN-3964: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2456 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2456/]) YARN-3964. Support NodeLabelsProvider at Resource Manager side. (devaraj: rev db9304788187c700647c4d84caeb3b5ad6d868d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NodeLabelsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsMappingProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Fix For: 2.8.0 > > Attachments: 
YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, a CLI/REST API is provided in the Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to run a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run as the YARN admin user. > - This makes it a little complicated to maintain, as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in the Resource Manager will give users more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952196#comment-14952196 ] Brahma Reddy Battula commented on YARN-4250: [~bibinchundatt] Thanks for taking a look into this. When reqCopy is created, the label expression is not copied; that's correct. After YARN-4140, we need to add {{req.getNodeLabelExpression()}} to the {{ResourceRequest}}. Actually, {{TestAMRMClientOnRMRestart}} and {{TestRMContainerAllocator}} are both failing, hence I fixed it like that. Maybe I will raise a separate issue for {{TestRMContainerAllocator}} and add {{req.getNodeLabelExpression()}}. cc [~leftnoteasy] > NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
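To make the failure mode above concrete, here is a hedged sketch, not the exact test code, of how a request copy that omits the label expression produces the NPE, plus the one-line carry-over being discussed:
{code}
// A scheduler that copies a request like this leaves nodeLabelExpression null:
ResourceRequest reqCopy = ResourceRequest.newInstance(req.getPriority(),
    req.getResourceName(), req.getCapability(), req.getNumContainers());

// AppSchedulingInfo#isRequestLabelChanged later calls .equals on the null
// expression and throws the NullPointerException shown in the trace above.

// Fix discussed in this comment: carry the label over explicitly.
reqCopy.setNodeLabelExpression(req.getNodeLabelExpression());
{code}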
[jira] [Updated] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-4250: --- Attachment: YARN-4250-002.patch > NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952212#comment-14952212 ] Hadoop QA commented on YARN-4250: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 6m 13s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:red}-1{color} | release audit | 0m 17s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 49s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. |
| | | 24m 30s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766015/YARN-4250-002.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / db93047 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9402/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9402/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9402/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9402/console |
This message was automatically generated.
> NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952216#comment-14952216 ] YukunTsang commented on YARN-1021: -- Hey Wei Yan, I tried to run this simulator on my YARN cluster, but a problem occurred when running the simulator with the command "bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json --output-dir=output/". The logs I get from the command are as follows: Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398) at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250) at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145) at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126) ... 4 more What should I do to get rid of this problem? P.S. I used the default configuration file. > Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Fix For: 2.3.0 > > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations are also made to improve scheduler performance for > different scenarios and workloads. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm well before we deploy it in a production > cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time- and cost-consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm works for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads on a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with a reasonable amount of confidence, thereby aiding rapid innovation. > The simulator will exercise the real Yarn ResourceManager, removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AM heartbeat events from within the same JVM. > To keep track of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. 
> The simulator will produce real-time metrics while executing, including: > * Resource usage for the whole cluster and each queue, which can be used to > configure the cluster's and each queue's capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs' turnaround time, throughput, fairness, capacity guarantee, > etc.). > * Several key metrics of the scheduler algorithm, such as the time cost of each > scheduler operation (allocate, handle, etc.), which can be used by Hadoop > developers to find hot spots and scalability limits. > The simulator will provide real-time charts showing the behavior of the > scheduler and its performance. > A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
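Not an authoritative answer to the question above, but the shape of that trace points to a null {{Class}} reaching {{ReflectionUtils.newInstance}}, which usually means an AM-simulator class name could not be resolved from configuration. A hedged sketch of the failing path; the property name is an assumption based on the SLS sample {{sls-runner.xml}}:
{code}
// SLSRunner.startAMFromRumenTraces resolves the AM simulator class from
// configuration. If the lookup returns null (for example, sls-runner.xml is
// not on the classpath), ReflectionUtils.newInstance throws NPE inside its
// constructor cache, which is the ConcurrentHashMap.get frame in the trace.
Class<?> amClass = conf.getClass("yarn.sls.am.type.mapreduce", null); // assumed key
Object amSim = ReflectionUtils.newInstance(amClass, conf); // NPE when amClass == null
{code}
Checking that the SLS configuration files shipped under the sample-conf directory are actually on the classpath would be a reasonable first step.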
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952227#comment-14952227 ] Hudson commented on YARN-3964: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #482 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/482/]) YARN-3964. Support NodeLabelsProvider at Resource Manager side. (devaraj: rev db9304788187c700647c4d84caeb3b5ad6d868d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsMappingProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NodeLabelsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Fix For: 2.8.0 > > Attachments: 
YARN-3964 design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, a CLI/REST API is provided in the Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to run a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run as the YARN admin user. > - This makes it a little complicated to maintain, as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in the Resource Manager will give users more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952246#comment-14952246 ] Hudson commented on YARN-3964: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2420 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2420/]) YARN-3964. Support NodeLabelsProvider at Resource Manager side. (devaraj: rev db9304788187c700647c4d84caeb3b5ad6d868d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsMappingProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMDelegatedNodeLabelsUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NodeLabelsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/CHANGES.txt > Support NodeLabelsProvider at Resource Manager side > --- > > Key: YARN-3964 > URL: https://issues.apache.org/jira/browse/YARN-3964 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Dian Fu >Assignee: Dian Fu > Fix For: 2.8.0 > > Attachments: YARN-3964 
design doc.pdf, YARN-3964.002.patch, > YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, > YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, > YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, > YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, > YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, > YARN-3964.1.patch > > > Currently, a CLI/REST API is provided in the Resource Manager to allow users to > specify labels for nodes. For labels which may change over time, users will > have to run a cron job to update the labels. This has the following > limitations: > - The cron job needs to be run as the YARN admin user. > - This makes it a little complicated to maintain, as users will have to make > sure this service/daemon is alive. > Adding a Node Labels Provider in the Resource Manager will give users more > flexibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events
[ https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952328#comment-14952328 ] zhihai xu commented on YARN-4247: - [~adhoot], thanks for working on this issue. Is this issue fixed by YARN-3361? YARN-3361 removed the {{readLock}} from {{RMAppAttemptImpl#getMasterContainer}}. > Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing > events > - > > Key: YARN-4247 > URL: https://issues.apache.org/jira/browse/YARN-4247 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-4247.001.patch, YARN-4247.001.patch > > > We see this deadlock in our testing where events do not get processed and we > see this in the logs before the RM dies of OOM {noformat} 2015-10-08 > 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of > event-queue is 1488000 2015-10-08 04:48:01,918 INFO > org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
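For readers following along, the lock-ordering cycle being referenced can be reproduced with a minimal, self-contained illustration. These are stand-in classes, not the real {{FSAppAttempt}}/{{RMAppAttemptImpl}} code:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in for RMAppAttemptImpl.
class AttemptLike {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private volatile Object masterContainer;

  // Pre-YARN-3361 style: the getter takes the readLock.
  Object getMasterContainer() {
    rwLock.readLock().lock();
    try { return masterContainer; } finally { rwLock.readLock().unlock(); }
  }

  // Event handling takes the writeLock, then calls back into the scheduler.
  void handle(SchedulerLike scheduler) {
    rwLock.writeLock().lock();
    try { scheduler.update(this); } finally { rwLock.writeLock().unlock(); }
  }
}

// Stand-in for the synchronized FairScheduler/FSAppAttempt side.
class SchedulerLike {
  synchronized void allocate(AttemptLike attempt) {
    attempt.getMasterContainer(); // holds this monitor, now needs the readLock
  }
  synchronized void update(AttemptLike attempt) { /* needs this monitor */ }
}

// Thread 1: scheduler.allocate(attempt) holds the scheduler monitor and
//           blocks on the readLock.
// Thread 2: attempt.handle(scheduler) holds the writeLock and blocks on the
//           scheduler monitor: a classic ABBA deadlock. Removing the readLock
//           from the getter (as YARN-3361 did) breaks the cycle.
{code}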
[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3446: Attachment: YARN-3446.003.patch > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > - > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headroom takes blacklisted nodes into account. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes the > available resources of blacklisted nodes). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
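For what the fix aims at, here is a minimal sketch, assuming a hypothetical {{nodesForHost}} lookup and the pre-2.8 {{getAvailableResource}} accessor; the actual change lives in the FairScheduler's {{FSAppAttempt}} and may differ:
{code}
// Subtract capacity sitting on blacklisted nodes before reporting headroom,
// so the MRAppMaster's reducer-preemption math does not count resources this
// application can never actually be allocated.
Resource blacklistedAvailable = Resource.newInstance(0, 0);
for (String host : appBlacklist) {
  for (SchedulerNode node : nodesForHost(host)) { // hypothetical lookup
    Resources.addTo(blacklistedAvailable, node.getAvailableResource());
  }
}
Resource headroom = Resources.componentwiseMax(
    Resources.none(),
    Resources.subtract(queueAvailableResource, blacklistedAvailable));
{code}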
[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3446: Attachment: (was: YARN-3446.003.patch) > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > - > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headroom takes blacklisted nodes into account. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes the > available resources of blacklisted nodes). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3216: -- Attachment: 0005-YARN-3216.patch Hi [~leftnoteasy], thank you very much for the comments. Updating the patch to address them. bq. Instead of adding AM-used-resource to parentQueue, I think we may only need to calculate AM-used-resource on LeafQueue and user Yes, I understood your point. I have now kept the changes only in LeafQueue. bq. If you agree, could you merge the am-limit computation logic of default partition and specific partition? Yes. It will be better to have a default am-limit per queue when a label-am-limit is not present. I agree with your point on ease of configuration and avoiding extra *if* checks. One more point: as per YARN-3265, you introduced {{queueResourceLimitsInfo.getQueueCurrentLimit()}} instead of {{queueHeadroomInfo.getQueueMaxCap()}}, and this is used in the old {{getAMResourceLimit}} to get the max-capacity per queue. Now {{queueResourceLimitsInfo.getQueueCurrentLimit()}} is common per queue, and a queue may have 2 or 3 accessible labels. So I feel I may not always be able to use this total value as in {{getAMResourceLimit}}. Hence I think I need to calculate max-capacity based on the label percentage per queue. How do you feel? > Max-AM-Resource-Percentage should respect node labels > - > > Key: YARN-3216 > URL: https://issues.apache.org/jira/browse/YARN-3216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, > 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch > > > Currently, max-am-resource-percentage considers default_partition only. When > a queue can access multiple partitions, we should be able to compute > max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
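To make the per-partition am-limit computation being discussed concrete, here is a rough sketch. The method names follow the capacity-scheduler utilities, but treat the exact calls and variables as illustrative assumptions rather than the patch itself:
{code}
// Compute an AM resource limit per accessible partition instead of reading a
// single queue-wide value from queueResourceLimitsInfo.getQueueCurrentLimit().
for (String partition : queue.getAccessibleNodeLabels()) {
  // Total resource registered under this partition/label.
  Resource partitionTotal =
      labelManager.getResourceByLabel(partition, clusterResource);
  // Max-capacity derived from the label percentage configured for this queue.
  Resource queuePartitionLimit = Resources.multiply(partitionTotal,
      queueCapacities.getAbsoluteMaximumCapacity(partition));
  // AM share of that limit, rounded up to a multiple of the minimum allocation.
  Resource amLimitForPartition = Resources.multiplyAndNormalizeUp(
      resourceCalculator, queuePartitionLimit, maxAMResourcePercent,
      minimumAllocation);
  // A new application's AM is then activated only if the AM resources already
  // used in this partition stay under amLimitForPartition.
}
{code}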
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952347#comment-14952347 ] Wangda Tan commented on YARN-4250: -- Thanks for looking at this, [~bibinchundatt]/[~brahmareddy]. I think we need to fix the AppSchedulingInfo side and handle a null node-label-expression in {{isRequestLabelChanged}}. I thought ApplicationMasterService normalizes the expression, but that cannot prevent a customized scheduler from copying and modifying a resource request before inserting it into AppSchedulingInfo. [~bibinchundatt] you are right :). Comments on the patch: - Update {{isRequestLabelChanged}} - Revert the changes to {{TestAMRMClientOnRMRestart}} Thoughts? > NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952349#comment-14952349 ] Sunil G commented on YARN-4250: --- Yes [~leftnoteasy]. I also looked at this part, and +1 for the approach suggested by Wangda. We need the change in AppSchedulingInfo. > NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952376#comment-14952376 ] Hadoop QA commented on YARN-3446: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 57s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 20s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 26s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 61m 11s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 102m 41s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766024/YARN-3446.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / db93047 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9403/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9403/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9403/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9403/console |
This message was automatically generated. > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > - > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headroom takes blacklisted nodes into account. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes the > available resources of blacklisted nodes). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952383#comment-14952383 ] Hadoop QA commented on YARN-3216: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 41s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 10s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 7 new checkstyle issues (total was 191, now 177). |
| {color:red}-1{color} | whitespace | 0m 13s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 57m 9s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 96m 40s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766026/0005-YARN-3216.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / db93047 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9404/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9404/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9404/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9404/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9404/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9404/console |
This message was automatically generated. > Max-AM-Resource-Percentage should respect node labels > - > > Key: YARN-3216 > URL: https://issues.apache.org/jira/browse/YARN-3216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, > 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch > > > Currently, max-am-resource-percentage considers default_partition only. When > a queue can access multiple partitions, we should be able to compute > max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX
[ https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952505#comment-14952505 ] Varun Vasudev commented on YARN-4017: - +1. I'll commit this tomorrow if no one objects. > container-executor overuses PATH_MAX > > > Key: YARN-4017 > URL: https://issues.apache.org/jira/browse/YARN-4017 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.8.0 >Reporter: Allen Wittenauer >Assignee: Sidharta Seethana > Attachments: YARN-4017.001.patch > > > Lots of places in container-executor are now using PATH_MAX, which is simply > too small on a lot of platforms. We should use a larger buffer size and be > done with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952541#comment-14952541 ] Brahma Reddy Battula commented on YARN-4250: [~leftnoteasy] and [~sunilg], thanks for your comments. I too agree with the change in {{AppSchedulingInfo#isRequestLabelChanged}}. Can you please look at the first patch (i.e., YARN-4250.patch)? > NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events
[ https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952545#comment-14952545 ] Anubhav Dhoot commented on YARN-4247: - Yup. I had tested without that change. Resolving this as not needed. > Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing > events > - > > Key: YARN-4247 > URL: https://issues.apache.org/jira/browse/YARN-4247 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-4247.001.patch, YARN-4247.001.patch > > > We see this deadlock in our testing where events do not get processed and we > see this in the logs before the RM dies of OOM {noformat} 2015-10-08 > 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of > event-queue is 1488000 2015-10-08 04:48:01,918 INFO > org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged
[ https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952626#comment-14952626 ] Bibin A Chundatt commented on YARN-4250: Hi [~brahmareddy] The check done in YARN-4250.patch is incomplete. The check should be like this:
{code}
+return ((null != requestOneLabelExp) && !(requestOneLabelExp
+    .equals(requestTwoLabelExp)))
+    || ((null == requestOneLabelExp) && (null != requestTwoLabelExp));
{code}
> NPE in AppSchedulingInfo#isRequestLabelChanged > -- > > Key: YARN-4250 > URL: https://issues.apache.org/jira/browse/YARN-4250 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: YARN-4250-002.patch, YARN-4250.patch > > > *Trace* > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
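For what it's worth, the same null-safe comparison can be written more compactly with {{java.util.Objects}} (a sketch for comparison, not a patch proposal):
{code}
import java.util.Objects;

// True only when the two label expressions differ, with null handled as a
// legal "unset" value on either side (null vs. null => not changed).
boolean labelChanged = !Objects.equals(requestOneLabelExp, requestTwoLabelExp);
{code}
This is logically equivalent to the expanded check above and avoids the easy-to-miss null branches.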
[jira] [Created] (YARN-4252) Log container-executor invocation details when exit code is non-zero
Sidharta Seethana created YARN-4252: --- Summary: Log container-executor invocation details when exit code is non-zero Key: YARN-4252 URL: https://issues.apache.org/jira/browse/YARN-4252 Project: Hadoop YARN Issue Type: Improvement Reporter: Sidharta Seethana Assignee: Sidharta Seethana Priority: Minor It would be useful for debugging/troubleshooting purposes to know the invocation parameters for container-executor (used in LinuxContainerExecutor) if there is a failure. These invocation parameters should be logged in the NM logs at WARN/ERROR level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
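A minimal sketch of the proposed logging, with illustrative names rather than the eventual patch:
{code}
import java.util.Arrays;

// In LinuxContainerExecutor, after the container-executor process returns:
if (exitCode != 0) {
  // Log the full invocation so failures can be reproduced by hand.
  LOG.warn("container-executor exited with code " + exitCode
      + "; invocation: " + Arrays.toString(command));
}
{code}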
[jira] [Updated] (YARN-4252) Log container-executor invocation details when exit code is non-zero
[ https://issues.apache.org/jira/browse/YARN-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-4252: Component/s: nodemanager > Log container-executor invocation details when exit code is non-zero > > > Key: YARN-4252 > URL: https://issues.apache.org/jira/browse/YARN-4252 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Minor > > It would be useful for debugging/troubleshooting purposes to know the > invocation parameters for container-executor (used in LinuxContainerExecutor) > if there is a failure. These invocation parameters should be logged in the > NM logs at WARN/ERROR level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4253) Standardize on using PrivilegedOperationExecutor for all invocations of container-executor in LinuxContainerExecutor
Sidharta Seethana created YARN-4253: --- Summary: Standardize on using PrivilegedOperationExecutor for all invocations of container-executor in LinuxContainerExecutor Key: YARN-4253 URL: https://issues.apache.org/jira/browse/YARN-4253 Project: Hadoop YARN Issue Type: Improvement Reporter: Sidharta Seethana Assignee: Sidharta Seethana YARN-3443 introduced PrivilegedOperationExecutor and PrivilegedOperation(s) which are meant to wrap invocations to the container-executor binary. However, not all invocations of container-executor in LinuxContainerExecutor use the PrivilegedOperationExecutor. We should change all such invocations to use PrivilegedOperationExecutor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
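Roughly, the standardized call path would look like the following. The operation type and exact signatures here are assumptions extrapolated from YARN-3443, not a confirmed API:
{code}
// Wrap a container-executor invocation as a PrivilegedOperation instead of
// building and exec-ing the command line by hand in LinuxContainerExecutor.
PrivilegedOperation op = new PrivilegedOperation(
    PrivilegedOperation.OperationType.LAUNCH_CONTAINER, (String) null);
op.appendArgs(user, appId, containerId, containerWorkDir);

PrivilegedOperationExecutor executor =
    PrivilegedOperationExecutor.getInstance(conf);
// grabOutput=true returns the process output for logging/diagnostics.
String output = executor.executePrivilegedOperation(op, true);
{code}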
[jira] [Created] (YARN-4254) ApplicationAttempt stuck forever due to UnknownHostException
Bibin A Chundatt created YARN-4254: -- Summary: ApplicationAttempt stuck forever due to UnknownHostException Key: YARN-4254 URL: https://issues.apache.org/jira/browse/YARN-4254 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Scenario === 1. RM HA and 5 NMs are available in the cluster and working fine. 2. Add one more NM to the same cluster, but the RM's /etc/hosts is not updated. 3. Submit an application to the same cluster. If the AM gets allocated to the newly added NM, the *application attempt will get stuck forever*. The user will not get to know why this happened. Impact 1. RM logs get overloaded with exceptions. 2. The application gets stuck forever. Handling suggestion: YARN-261 allows failing an application attempt. If we fail the stuck attempt, the next attempt could get assigned to another NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)