[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023880#comment-15023880 ] Bibin A Chundatt commented on YARN-4304: Hi [~sunilg] # Could you also check the memory total when container reservation is done for NM. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
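For reference, the per-partition AM limit that the UI and metrics would need to display boils down to simple arithmetic on the partition's own resource rather than the queue's cluster-wide share. Below is a minimal sketch of that calculation, under the assumption that a per-partition max-am-resource percentage is available; the class and method names are hypothetical and this is not the actual CapacityScheduler code.
{code}
// Minimal sketch (not the actual CapacityScheduler code): the per-partition AM
// limit that the Scheduler UI / ClusterMetrics page would have to show.
// All names here are hypothetical.
public final class PartitionAmLimitSketch {

  /**
   * @param partitionMemoryMb total memory of the node partition (label)
   * @param maxAmResourcePercent per-partition max AM resource percentage, 0.0 - 1.0
   * @return memory (MB) that application masters may consume in this partition
   */
  static long maxAmMemoryForPartition(long partitionMemoryMb,
                                      float maxAmResourcePercent) {
    return (long) (partitionMemoryMb * maxAmResourcePercent);
  }

  public static void main(String[] args) {
    // e.g. a 100 GB partition with a 10% AM limit -> 10 GB available for AMs
    System.out.println(maxAmMemoryForPartition(102400, 0.1f)); // prints 10240
  }
}
{code}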
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023870#comment-15023870 ] Tsuyoshi Ozawa commented on YARN-4348: -- {quote} Archiving artifacts [description-setter] Description set: YARN-4348 Recording test results ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Email was triggered for: Failure - Any Sending email for trigger: Failure - Any An attempt to send an e-mail to empty list of recipients, ignored. Finished: FAILURE {quote} Hmm, Jenkins looks to be unhealthy. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
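To illustrate the proposal in the description, the sketch below waits for the ZooKeeper resync using zkResyncWaitTime as the bound instead of zkSessionTimeout, so a sync that completes after the session timeout is still observed. It is a self-contained illustration only; the class and field names are hypothetical and this is not the actual ZKRMStateStore.syncInternal code.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the idea in the description, not the real ZKRMStateStore:
// bound the wait for resync completion by zkResyncWaitTime rather than
// zkSessionTimeout. All names are hypothetical.
class ResyncWaitSketch {
  private final CountDownLatch syncDone = new CountDownLatch(1);
  private final long zkResyncWaitTimeMs;

  ResyncWaitSketch(long zkResyncWaitTimeMs) {
    this.zkResyncWaitTimeMs = zkResyncWaitTimeMs;
  }

  // Called from the ZooKeeper watcher/callback once the sync completes.
  void onSyncCompleted() {
    syncDone.countDown();
  }

  // Returns true if the resync finished within zkResyncWaitTime.
  boolean waitForResync() throws InterruptedException {
    return syncDone.await(zkResyncWaitTimeMs, TimeUnit.MILLISECONDS);
  }
}
{code}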
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023819#comment-15023819 ] Naganarasimha G R commented on YARN-3623: - Hi [~vinodkv], [~sjlee0], [~gtCarrera] & [~xgong], if the above solution is fine, then we can take the following steps: # make this jira a sub-jira of 1.5 and introduce the timeline version # as part of YARN-4183, create *"yarn.timeline-service.client.require-delegation-token"* so that it removes the dependency on *"yarn.timeline-service.enabled"* on the client side for getting tokens # YARN-4356 or a *new jira* can handle modifications and updates of the timeline version in ATSv2 # raise a new jira to handle REST interface support for getting the supported ATS version and config from the server for ATS 1.5 # raise a new jira to handle version checking for the ATSv2 interface methods in TimelineClient, and also fetching of the version > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Naganarasimha G R > Attachments: YARN-3623-2015-11-19.1.patch > > > So far RM, MR AM, DA AM added/changed new configs to enable the feature to > write the timeline data to the v2 server. It's good to have a YARN > timeline-service.version config, like timeline-service.enable, to indicate the > version of the running timeline service for the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have its own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
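A rough sketch of how a framework could branch on the single proposed version key, instead of carrying its own v1/v2 switches, is shown below. The key name yarn.timeline-service.version follows the proposal in this JIRA and is an assumption here, not a confirmed constant.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: branch on the proposed single version key rather than
// per-framework v1/v2 flags. The key name is assumed from the proposal in
// this JIRA, not a confirmed YarnConfiguration constant.
class TimelineVersionSketch {
  static final String TIMELINE_SERVICE_VERSION = "yarn.timeline-service.version";

  static boolean isTimelineV2(Configuration conf) {
    // Default to v1 behaviour when the key is absent.
    float version = conf.getFloat(TIMELINE_SERVICE_VERSION, 1.0f);
    return version >= 2.0f;
  }
}
{code}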
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023799#comment-15023799 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2572 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2572/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
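The bookkeeping the fix has to get right can be summarized in a few lines: when an NM reconnects with a different capability, the cluster total must drop the old capability and add the new one exactly once, otherwise the totals drift. The sketch below is a generic illustration, not the actual CapacityScheduler or FifoScheduler change.
{code}
// Generic illustration of the cluster-resource accounting involved on NM
// reconnect (not the actual scheduler code): remove the stale capability and
// add the current one exactly once.
class ClusterResourceSketch {
  private long clusterMemoryMb;
  private int clusterVcores;

  synchronized void onNodeReconnected(long oldMemMb, int oldVcores,
                                      long newMemMb, int newVcores) {
    // Drop the node's previously registered capability...
    clusterMemoryMb -= oldMemMb;
    clusterVcores -= oldVcores;
    // ...and account for the reconnected node's current capability.
    clusterMemoryMb += newMemMb;
    clusterVcores += newVcores;
  }

  synchronized long getClusterMemoryMb() { return clusterMemoryMb; }
  synchronized int getClusterVcores() { return clusterVcores; }
}
{code}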
[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2
[ https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023786#comment-15023786 ] Naganarasimha G R commented on YARN-4343: - Hi [~sjlee0], I think we need to have the *"yarn-2928-1st-milestone"* label for this too. YARNClientImpl tries to fetch applications (attempts and container details) from ATS for the CLI if they are not present in the RM, and AppReportFetcher tries to get the application report from ATS for the web service. So without this I feel it is a break in functionality, and I think we need to support it for ATS v2 as well. Coming to the discussion about the approach, I had an offline discussion with [~varun_saxena], and he said that as per the [discussion|https://issues.apache.org/jira/browse/YARN-3047?focusedCommentId=14368563&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368563] with [~zjshen] in YARN-3047, RPC need not be supported; it can be handled on the client side for ATSv2 (have a timeline client get the timeline entities and convert them into report objects). {quote} For some legacy problem. AHS exposes RPC interface. However, IMHO, we don't need to create the RPC interface again in v2 as we're building the new system from the ground. What we can do is to wrap over the REST APIs in the java client, and provide YARN CLI commands. {quote} Maybe I can have some kind of factory, instantiate the client based on the configuration, and try to do it that way, but I am a little doubtful about the dependencies part. Let me give it a try. > Need to support Application History Server on ATSV2 > --- > > Key: YARN-4343 > URL: https://issues.apache.org/jira/browse/YARN-4343 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > AHS is used by the CLI and Webproxy (REST); if the application-related > information is not found in the RM, it tries to fetch it from AHS and show -- This message was sent by Atlassian JIRA (v6.3.4#6332)
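A minimal sketch of the "factory based on the configuration" idea mentioned in the comment follows: pick an RPC-backed history client for ATS v1 or a REST-backed one for ATS v2, depending on the configured version. All class names below are hypothetical, and the version key is an assumption rather than a confirmed constant.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of a configuration-driven client factory. ApplicationHistoryClient,
// RpcHistoryClient and RestHistoryClient are hypothetical names standing in
// for whatever the real implementation would provide.
class HistoryClientFactorySketch {

  interface ApplicationHistoryClient { /* getApplicationReport(...), etc. */ }

  static class RpcHistoryClient implements ApplicationHistoryClient { }   // ATS v1 path (existing RPC)
  static class RestHistoryClient implements ApplicationHistoryClient { }  // ATS v2 path (wrap reader REST APIs)

  static ApplicationHistoryClient create(Configuration conf) {
    // Version key assumed from the related YARN-3623 proposal.
    float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
    return version >= 2.0f ? new RestHistoryClient() : new RpcHistoryClient();
  }
}
{code}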
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Attachment: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt [~varun_saxena], thank you for the fix. The fix itself looks good me. I got another error though it's rare to happen: {quote} Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.093 sec <<< FAILURE! org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) are different! Wanted: eventHandler.handle( ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) Actual invocation has different arguments: eventHandler.handle( EventType: APPLICATION_INITED ); -> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) {quote} Attaching a log for the failure. Could you take a look? > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! 
Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
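The Mockito failures above are verifications that matched a different, asynchronously dispatched handle() invocation (for example the APPLICATION_INITED event). One common way to make such a verification robust, sketched below under the assumption of Mockito 2+ and with hypothetical Event/EventHandler stand-ins, is to match only events of the type the test actually cares about.
{code}
import static org.mockito.ArgumentMatchers.argThat;
import static org.mockito.Mockito.atLeastOnce;
import static org.mockito.Mockito.verify;

// Generic illustration (not the actual TestResourceLocalizationService code)
// of verifying only the event type of interest, so unrelated events dispatched
// by the AsyncDispatcher do not trip the verification. Assumes Mockito 2+.
class EventVerificationSketch {
  interface Event { String getType(); }
  interface EventHandler { void handle(Event e); }

  // mockHandler is expected to be a Mockito mock of EventHandler.
  static void verifyResourceFailedEvent(EventHandler mockHandler) {
    verify(mockHandler, atLeastOnce())
        .handle(argThat(e -> "RESOURCE_FAILED".equals(e.getType())));
  }
}
{code}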
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023728#comment-15023728 ] Naganarasimha G R commented on YARN-4350: - Hi [~sjlee0], I have rebased YARN-3127; could you have a look at it? Also, YARN-4372 seems like it might take a little more time (at least I was not able to crack it), so if it is very important we can take the approach I mentioned in YARN-2859 (a fixed port obtained from ServerSocketUtil) and continue. Once YARN-4372 is done, we can revert back to an ephemeral port (the YARN-2859 solution). Thoughts? > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These tests fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
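A plain-JDK sketch of the interim "fixed port obtained from a server socket" approach discussed above: ask the OS for a currently free port up front and use it as the test's fixed port. This is only an illustration, not the actual TestDistributedShell or ServerSocketUtil code.
{code}
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: grab a free port before starting the mini cluster and pass it to
// the timeline service config, instead of relying on ephemeral-port (port 0)
// support. A small race window remains between closing the probe socket and
// binding the real service.
final class FreePortSketch {
  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort(); // free at this moment
    }
  }
}
{code}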
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023722#comment-15023722 ] Hudson commented on YARN-4349: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1442 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1442/]) YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: rev 8676a118a12165ae5a8b80a2a4596c133471ebc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > Support Caller
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023671#comment-15023671 ] Hudson commented on YARN-4349: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2650 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2650/]) YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: rev 8676a118a12165ae5a8b80a2a4596c133471ebc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > Supp
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023652#comment-15023652 ] Hudson commented on YARN-4349: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #709 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/709/]) YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: rev 8676a118a12165ae5a8b80a2a4596c133471ebc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.jav
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023646#comment-15023646 ] Sangjin Lee commented on YARN-3862: --- I had a chance to go over the latest patch in a little more detail. I think this is now closer to being ready. I do have some comments and suggestions, some major and others minor. (TimelineFilterUtils.java) - createHBaseColQualPrefixFilter(): this is still trying to compute the column prefix by hand. The main point of introducing getColumnPrefixBytes() on ColumnPrefix was to avoid doing this for confs and metrics. Can we rework the signatures of createHBaseFilterList() so that we can rely on ColumnPrefix.getColumnPrefixBytes()? Ideally all computations of qualifier bytes should go through ColumnPrefix.getColumnPrefixBytes(). (TestHBaseTimelineReaderImpl.java) - I'm not too sure about the name; for other tests we basically combined the reader and writer tests. Thoughts on how to make this best fit into the existing tests? (GenericEntityReader.java) - l.139: nit: typo: releated -> related - I keep confusing configFilters and confs. The names are so similar that I have to go check the implementations to distinguish them (configFilters filtering rows we want to return, and confs filters contents of the matching rows). Could there be a better way to name them so that their meanings are clearer? I don't have a great idea at the moment, and you might want to think about better names... - On a related note, this is probably outside the scope of this JIRA, but I see that the configFilter and metricFilter are applied on the client-side. Probably on a separate JIRA, we should see if we can do this on the HBase side. This is just a reminder. - l.156: Why do we need to check if configFilters == null? Is it because if configFilters are specified we implicitly assume we want the config columns returned in the content? Is that a valid assumption? (TimelineReader.java) - Related to one of the points above, at least we should add javadoc that clearly explains confs and metrics and how they are different from configFilters and metricFilters. That will help us a great deal in maintaining this. (FlowRunColumnPrefix.java) - As a result of YARN-4053 being committed, getColumnPrefixBytes(String) already exists. It should be removed from this patch. (TestHBaseStorageFlowRun.java) - testWriteFlowRunMetricsPrefix() and testWriteFlowRunsMetricFields() are failing possibly due to changes in YARN-4053. > Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. 
> We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
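To make the first review point concrete, the sketch below builds the HBase qualifier filters from a ColumnPrefix abstraction instead of recomputing the prefix bytes by hand. It assumes the HBase 1.x client API; the ColumnPrefix interface shown is a stand-in, not the real timeline-service signature.
{code}
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;

// Sketch of the review suggestion: derive qualifier prefixes through the
// ColumnPrefix abstraction rather than assembling bytes by hand. The
// ColumnPrefix interface below is a simplified stand-in.
class ColumnPrefixFilterSketch {
  interface ColumnPrefix {
    byte[] getColumnPrefixBytes(String qualifier);
  }

  static FilterList createPrefixFilters(ColumnPrefix prefix, String... qualifiers) {
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    for (String qualifier : qualifiers) {
      Filter f = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
          new BinaryPrefixComparator(prefix.getColumnPrefixBytes(qualifier)));
      list.addFilter(f);
    }
    return list;
  }
}
{code}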
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023581#comment-15023581 ] Hudson commented on YARN-4349: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #719 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/719/]) YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: rev 8676a118a12165ae5a8b80a2a4596c133471ebc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java > Supp
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023549#comment-15023549 ] Hudson commented on YARN-4349: -- FAILURE: Integrated in Hadoop-trunk-Commit #8868 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8868/]) YARN-4349. Support CallerContext in YARN. Contributed by Wangda Tan (jianhe: rev 8676a118a12165ae5a8b80a2a4596c133471ebc1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java > Support Ca
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023527#comment-15023527 ] Sangjin Lee commented on YARN-3862: --- Whether we make TimelineFilter part of the object model or not, we'll still need to come up with a way to support filter queries on the URLs, no? While we're at it, today there are no reads done through the TimelineClient API, correct? Today there are only the REST-based queries. Of course this doesn't mean we won't support more programmatic reads via TimelineClient (and RPC?) in the future, and also there may be value in making TimelineFilter part of the common API. I just wanted to understand whether we need to make that call as part of this JIRA. Did I understand this correctly, or did I miss something important? > Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023487#comment-15023487 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #633 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/633/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/CHANGES.txt > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain forever after the container has been terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023481#comment-15023481 ] lachisis commented on YARN-4382: Thanks for your reply, Jun Gong. I think it is a good idea to use "release_agent" to clear the empty container hierarchies. But I am afraid the "release_agent" option may not suit all cgroup versions. I just tested the "release_agent" option; maybe I made some mistake, but it does not work for me at the moment. > Container hierarchy in cgroup may remain forever after the container has been > terminated > > > Key: YARN-4382 > URL: https://issues.apache.org/jira/browse/YARN-4382 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.2 >Reporter: lachisis >Assignee: Jun Gong > > If we use LinuxContainerExecutor to execute the containers, this problem > may happen. > In the common case, when a container runs, a corresponding hierarchy will be > created in the cgroup dir, and when the container terminates, the hierarchy will > be deleted within some seconds (this time can be configured by > yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms). > In the code, I find that CgroupsLCEResource sends a signal to kill the container > process asynchronously, and at the same time it will try to delete the > container hierarchy within the configured "delete-delay-ms" time. > But if killing the container process takes longer than the > "delete-delay-ms" time, the container hierarchy will remain forever. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
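The timing window described in this issue can be illustrated with a few lines of plain Java: deletion of the per-container cgroup directory is retried only until delete-delay-ms elapses, so processes that outlive that window leave the hierarchy behind. This is a sketch, not the actual NodeManager cgroups handler code.
{code}
import java.io.File;

// Plain-Java illustration of the deletion window: rmdir of the container's
// cgroup directory succeeds only once no tasks remain in it, and the retry
// loop gives up after delete-delay-ms.
final class CgroupDeleteSketch {
  static boolean deleteWithinDelay(File cgroupDir, long deleteDelayMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + deleteDelayMs;
    while (System.currentTimeMillis() < deadline) {
      if (cgroupDir.delete()) {
        return true; // cgroup was empty and has been removed
      }
      Thread.sleep(20);
    }
    return false; // container processes outlived the window; hierarchy remains
  }
}
{code}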
[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations
[ https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023435#comment-15023435 ] Hadoop QA commented on YARN-4248: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 23s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 2, now 2). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} Patch generated 22 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 40, now 62). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 18s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s {color} | {color:red} Patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 162m 26s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_85 Failed junit tests |
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023415#comment-15023415 ] Lin Yiqun commented on YARN-4381: - Thanks [~djp]! > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found an issue with nodemanager metrics: > {{NodeManagerMetrics#containersLaunched}} does not actually count the number of > successfully launched containers, because sometimes the launch fails after receiving > a kill command or when container localization fails. This leads to a failed > container, but the counter is still increased in the code below whenever the > container is started, whether it later succeeds or fails. > {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
>     new ContainerImpl(getConfig(), this.dispatcher, context.getNMStateStore(),
>         launchContext, credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
>     containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>       "ContainerManagerImpl", "Container already running on this node!",
>       applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>       + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
>     // Create the application
>     Application application =
>         new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
>     if (null == context.getApplications().putIfAbsent(applicationID, application)) {
>       LOG.info("Creating a new application reference for app " + applicationID);
>       LogAggregationContext logAggregationContext =
>           containerTokenIdentifier.getLogAggregationContext();
>       Map<ApplicationAccessType, String> appAcls =
>           container.getLaunchContext().getApplicationACLs();
>       context.getNMStateStore().storeApplication(applicationID,
>           buildAppProto(applicationID, user, credentials, appAcls,
>               logAggregationContext));
>       dispatcher.getEventHandler().handle(
>           new ApplicationInitEvent(applicationID, appAcls,
>               logAggregationContext));
>     }
>     this.context.getNMStateStore().storeContainer(containerId, request);
>     dispatcher.getEventHandler().handle(
>         new ApplicationContainerInitEvent(container));
>     this.context.getContainerTokenSecretManager().startContainerSuccessful(
>         containerTokenIdentifier);
>     NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>         "ContainerManageImpl", applicationID, containerId);
>     // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>     // launch. A finished Application will not launch containers.
>     metrics.launchedContainer();
>     metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
>     throw new YarnException(
>         "Container start failed as the NodeManager is " +
>         "in the process of shutting down");
>   }
> {code} > In addition, we lack a localizationFailed metric for containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
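Following the description, here is a hedged sketch of the accounting change being argued for: count a launched container only when the launch actually succeeds, and track localization failures separately. It uses plain counters with hypothetical method names rather than the real NodeManagerMetrics API.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java sketch (not the real NodeManagerMetrics class): increment the
// "launched" counter only when the launch actually succeeds, and track
// localization failures with a separate counter. Method names are hypothetical.
final class ContainerMetricsSketch {
  private final AtomicLong containersLaunched = new AtomicLong();
  private final AtomicLong containersLocalizationFailed = new AtomicLong();

  // Call from the point where the container transitions to RUNNING,
  // not from startContainers(), which may still fail afterwards.
  void launchedContainer() {
    containersLaunched.incrementAndGet();
  }

  // Call when resource localization for a container fails.
  void localizationFailedContainer() {
    containersLocalizationFailed.incrementAndGet();
  }

  long getContainersLaunched() { return containersLaunched.get(); }
  long getContainersLocalizationFailed() { return containersLocalizationFailed.get(); }
}
{code}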
[jira] [Commented] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"
[ https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023410#comment-15023410 ] Hadoop QA commented on YARN-4334: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 45s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 32s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 39s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 36s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s {color} | {color:red} Patch generated 6 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 608, now 611). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 39s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 2 new FindBugs issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 10s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s {color} | {color:green} hadoop
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023375#comment-15023375 ] Hudson commented on YARN-4344: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #717 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/717/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
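The accounting problem fixed here boils down to applying the capability delta when a NodeManager reconnects, rather than leaving the cluster total at its stale value. A simplified, self-contained sketch of that bookkeeping follows; it models only memory in MB and is not the actual CapacityScheduler/FifoScheduler change.

{code}
// Simplified model of the reconnect bookkeeping; real schedulers track full
// Resource objects per SchedulerNode.
public class ClusterResourceTracker {
  private long clusterMemoryMb;

  public synchronized void nodeAdded(long nodeMemoryMb) {
    clusterMemoryMb += nodeMemoryMb;
  }

  // On reconnect, adjust by the difference between the old and new capability
  // instead of assuming the capability is unchanged.
  public synchronized void nodeReconnected(long oldMemoryMb, long newMemoryMb) {
    clusterMemoryMb += newMemoryMb - oldMemoryMb;
  }

  public synchronized long getClusterMemoryMb() {
    return clusterMemoryMb;
  }
}
{code}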
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023219#comment-15023219 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2647 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2647/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3878: -- Fix Version/s: 2.6.3 +1. Committed it to branch-2.6. Thanks [~varun_saxena]! > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, > YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, > YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, > YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) > "main" prio=10 tid=0x7fb9800
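The hang described above is an unbounded wait on a "drained" condition that can never become true once the event-handling thread has died. The generic sketch below shows the shape of a bounded drain-on-stop; it is an illustration of the idea only, not the AsyncDispatcher code or the YARN-3878 patch.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Drain a queue on stop, but never wait past a deadline, so a dead consumer
// thread cannot hang shutdown forever.
public class BoundedDrainOnStop {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  public void submit(Runnable event) {
    if (!stopped) {
      queue.offer(event);
    }
  }

  public void stop(long maxWaitMillis) throws InterruptedException {
    stopped = true; // refuse new events, then wait for the queue to empty
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (!queue.isEmpty() && System.nanoTime() < deadline) {
      Thread.sleep(10);
    }
  }
}
{code}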
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023190#comment-15023190 ] Hadoop QA commented on YARN-4358: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 21 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 64, now 85). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 53 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 2 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 44s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85 with JDK v1.7.0_85 generated 15 new issues (was 2, now 17). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 135m 56s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoo
[jira] [Commented] (YARN-4372) Cannot enable system-metrics-publisher inside MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023169#comment-15023169 ] Vinod Kumar Vavilapalli commented on YARN-4372: --- bq. Even after the patch TestDistributedShell.testDSShellWithoutDomain is failing (test case passes but the in the console logs there were logs for unreachable timlineserver for each smp events). You are right, *sigh*, this is the same bug we ran into at YARN-3087: Guice not letting us run two UI services at the same time. This used to work because Timeline Service started last before this patch. Need to think more, not sure how we can fix this. > Cannot enable system-metrics-publisher inside MiniYARNCluster > - > > Key: YARN-4372 > URL: https://issues.apache.org/jira/browse/YARN-4372 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-4372-20151119.1.txt > > > [~Naganarasimha] found this at YARN-2859, see [this > comment|https://issues.apache.org/jira/browse/YARN-2859?focusedCommentId=15005746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005746]. > The way daemons are started inside MiniYARNCluster, RM is not setup correctly > to send information to TimelineService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3925: - Fix Version/s: 2.6.3 I pulled this into 2.6.3 as well. > ContainerLogsUtils#getContainerLogFile fails to read container log files from > full disks. > - > > Key: YARN-3925 > URL: https://issues.apache.org/jira/browse/YARN-3925 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-3925.000.patch, YARN-3925.001.patch > > > ContainerLogsUtils#getContainerLogFile fails to read files from full disks. > {{getContainerLogFile}} depends on > {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but > {{LocalDirsHandlerService#getLogPathToRead}} calls > {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses > configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not > include full disks in {{LocalDirsHandlerService#checkDirs}}: > {code} > Configuration conf = getConfig(); > List localDirs = getLocalDirs(); > conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, > localDirs.toArray(new String[localDirs.size()])); > List logDirs = getLogDirs(); > conf.setStrings(YarnConfiguration.NM_LOG_DIRS, > logDirs.toArray(new String[logDirs.size()])); > {code} > ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and > ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
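The root cause here is that the directory list used for reading logs is the same list that gets pruned when a disk fills up. The hypothetical helper below shows the intended behaviour, resolving a log path against both healthy and full log directories; the names are illustrative and this is not the actual YARN-3925 change.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Full disks cannot accept new writes, but existing log files on them can
// still be read, so include them when resolving a log path for reading.
public class ContainerLogLocator {
  public static File findLogFile(List<String> goodLogDirs,
                                 List<String> fullLogDirs,
                                 String relativeLogPath) {
    List<String> searchDirs = new ArrayList<>(goodLogDirs);
    searchDirs.addAll(fullLogDirs);
    for (String dir : searchDirs) {
      File candidate = new File(dir, relativeLogPath);
      if (candidate.isFile()) {
        return candidate;
      }
    }
    return null; // not found on any disk
  }
}
{code}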
[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements
[ https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023143#comment-15023143 ] Carlo Curino commented on YARN-4360: Rebasing after YARN-3454 got committed. A point of discussion: the configuration could either choose "GreedyReservationAgent" and set the allocation direction, or have another top-level class (e.g., "LeftGreedyReservationAgent") that invokes the same "internals" but is configured for left-to-right allocation. One less config param, one more class... thoughts? > Improve GreedyReservationAgent to support "early" allocations, and > performance improvements > > > Key: YARN-4360 > URL: https://issues.apache.org/jira/browse/YARN-4360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4360.2.patch, YARN-4360.patch > > > The GreedyReservationAgent allocates "as late as possible". Per various > conversations, it seems useful to have a mirror behavior that allocates as > early as possible. Also in the process we leverage improvements from > YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which > significantly speeds up allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
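For the design question above, the trade-off is between one agent class plus a direction flag and a thin subclass per direction. A sketch of the flag-based option is below; the class, enum, and method names are hypothetical and not the YARN-4360 API.

{code}
// Hypothetical flag-based variant: one greedy agent parameterized by the
// direction in which candidate start times are tried.
public class DirectionalGreedyAgent {
  public enum AllocationDirection { EARLIEST_FIRST, LATEST_FIRST }

  private final AllocationDirection direction;

  public DirectionalGreedyAgent(AllocationDirection direction) {
    this.direction = direction;
  }

  // In the real agent a feasibility check runs at each candidate time; this
  // only shows which end of the window each direction starts from.
  long firstCandidateStart(long earliest, long deadline, long stageDuration) {
    return direction == AllocationDirection.EARLIEST_FIRST
        ? earliest
        : deadline - stageDuration;
  }
}
{code}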
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023142#comment-15023142 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #706 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/706/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements
[ https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-4360: --- Attachment: YARN-4360.2.patch > Improve GreedyReservationAgent to support "early" allocations, and > performance improvements > > > Key: YARN-4360 > URL: https://issues.apache.org/jira/browse/YARN-4360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4360.2.patch, YARN-4360.patch > > > The GreedyReservationAgent allocates "as late as possible". Per various > conversations, it seems useful to have a mirror behavior that allocates as > early as possible. Also in the process we leverage improvements from > YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which > significantly speeds up allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023127#comment-15023127 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1439 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1439/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023097#comment-15023097 ] Hadoop QA commented on YARN-4380: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 59s {color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 22s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 51s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 11s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12773897/YARN-4380.01.patch | | JIRA Issue | YARN-4380 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 983711f1122c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d36b6e0 | | mvninstall | http
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023084#comment-15023084 ] Varun Saxena commented on YARN-3840: Added an image to show sorted app ids' with the script in the patch > Resource Manager web ui issue when sorting application by id (with > application having id > ) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Varun Saxena > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, > YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, > YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > . > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
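The underlying problem is that application IDs of the form application_<clusterTimestamp>_<sequence> sort lexicographically, so sequence 10000 lands before 9999. The actual fix is a client-side sorter in the web UI; purely as an illustration of ordering by the numeric parts, here is a small Java comparator (not the patch itself, and it assumes well-formed IDs).

{code}
import java.util.Comparator;

// Order application IDs by cluster timestamp, then by numeric sequence number,
// instead of comparing them as plain strings.
public class AppIdComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("_");
    String[] pb = b.split("_");
    int byCluster = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
    if (byCluster != 0) {
      return byCluster;
    }
    return Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
  }
}
{code}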
[jira] [Updated] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3840: --- Attachment: RMApps_Sorted.png > Resource Manager web ui issue when sorting application by id (with > application having id > ) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Varun Saxena > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, > YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, > YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > . > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023041#comment-15023041 ] Varun Saxena commented on YARN-4380: Check for localizer runner thread to finish should be enough for the test to pass. But to avoid InterruptedException in logs(due to race), have added check for localization to begin as well before initiating localizer heartbeats in test. > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
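The pattern described in the comment, waiting for an asynchronous condition instead of racing with it, is commonly written in Hadoop tests with {{GenericTestUtils.waitFor}}. A sketch is below; the predicate is a placeholder, not the actual TestResourceLocalizationService code.

{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Poll every 100 ms and give up after 10 s, for conditions such as
// "localization has started" or "the LocalizerRunner thread has finished".
public class WaitForConditionExample {
  static void waitUntil(final Supplier<Boolean> condition) throws Exception {
    GenericTestUtils.waitFor(condition, 100, 10000);
  }
}
{code}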
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4380: --- Attachment: YARN-4380.01.patch > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022963#comment-15022963 ] Hudson commented on YARN-4344: -- FAILURE: Integrated in Hadoop-trunk-Commit #8864 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8864/]) YARN-4344. NMs reconnecting with changed capabilities can lead to wrong (jlowe: rev d36b6e045f317c94e97cb41a163aa974d161a404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"
[ https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4334: --- Attachment: YARN-4334.4.patch The .4 patch fixes some checkstyle and whitespace issues. The TestWebApp failure is tracked by YARN-4379 and is not related to my change. The TestAMAuthorization and TestClientRMTokens failures are not caused by my patch either. [~jlowe], please help review the latest patch, thanks! > Ability to avoid ResourceManager recovery if state store is "too old" > - > > Key: YARN-4334 > URL: https://issues.apache.org/jira/browse/YARN-4334 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4334.2.patch, YARN-4334.3.patch, YARN-4334.4.patch, > YARN-4334.patch, YARN-4334.wip.2.patch, YARN-4334.wip.3.patch, > YARN-4334.wip.4.patch, YARN-4334.wip.patch > > > There are times when a ResourceManager has been down long enough that > ApplicationMasters and potentially external client-side monitoring mechanisms > have given up completely. If the ResourceManager starts back up and tries to > recover we can get into situations where the RM launches new application > attempts for the AMs that gave up, but then the client _also_ launches > another instance of the app because it assumed everything was dead. > It would be nice if the RM could be optionally configured to avoid trying to > recover if the state store was "too old." The RM would come up without any > applications recovered, but we would avoid a double-submission situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
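The feature under review reduces to a single age check on the state store before recovery is attempted. A self-contained sketch of that check follows; how the timestamp and the maximum age are configured and wired is an assumption here, not the YARN-4334 patch.

{code}
// Skip recovery when the state store has not been updated for longer than a
// configured maximum age; a non-positive max age disables the check.
public class RecoveryAgeGate {
  public static boolean shouldSkipRecovery(long storeLastModifiedMillis,
                                           long nowMillis,
                                           long maxAgeMillis) {
    if (maxAgeMillis <= 0) {
      return false;
    }
    return (nowMillis - storeLastModifiedMillis) > maxAgeMillis;
  }
}
{code}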
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022933#comment-15022933 ] Jason Lowe commented on YARN-4344: -- +1 for branch-2.6 patch as well, committing this. > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022885#comment-15022885 ] Sangjin Lee commented on YARN-3762: --- This should be a good candidate for branch-2.6 (2.6.3). [~kasha], what do you think? > FairScheduler: CME on FSParentQueue#getQueueUserAclInfo > --- > > Key: YARN-3762 > URL: https://issues.apache.org/jira/browse/YARN-3762 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.8.0 > > Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch > > > In our testing, we ran into the following ConcurrentModificationException: > {noformat} > halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 > 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, > queueName=root.testyarnpool3, queueCurrentCapacity=0.0, > queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 > 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
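The ConcurrentModificationException here comes from iterating the child-queue list while the scheduler mutates it. A generic illustration of the usual remedy, guarding the list with a read/write lock and iterating over a snapshot, is below; it is not the actual FSParentQueue fix.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Writers take the write lock to mutate; readers copy under the read lock and
// iterate the copy, so they never observe a concurrently modified list.
public class ChildListSnapshot<T> {
  private final List<T> children = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public void add(T child) {
    lock.writeLock().lock();
    try {
      children.add(child);
    } finally {
      lock.writeLock().unlock();
    }
  }

  public List<T> snapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(children);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}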
[jira] [Updated] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4386: -- Priority: Minor (was: Major) > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry entry:rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022866#comment-15022866 ] Kuhu Shukla commented on YARN-4386: --- Yes I agree. Missed correlating the DECOMMISSIONED state transition to this check. Changing the Priority to Minor. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry entry:rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-4358: --- Attachment: YARN-4358.2.patch > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022842#comment-15022842 ] Carlo Curino commented on YARN-4358: (1)/(2) done. (3) As discussed above... agreed to circle back later. (4) Was not addressed, as the two fields are long and the diff might exceed an int. (5) I implemented the requested changes and refactored the PlanView interface and InMemoryPlan a little further, getting rid of a few methods that were no longer used, since we are switching to the more efficient RLE-centric requests to the plan. > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
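The core idea in the description, letting the agent see an easily representable constraint such as a maximum instantaneous allocation, can be reduced to a small headroom calculation. The sketch below uses simplified units and names that are assumptions, not the YARN-4358 interfaces.

{code}
// The capacity an agent may place at a given time is the plan's free capacity,
// further capped by the policy's instantaneous maximum for the user.
public final class SharingPolicyHeadroom {
  public static long usableMemoryMb(long planFreeMb,
                                    long policyMaxInstantaneousMb,
                                    long alreadyAcceptedMb) {
    long policyHeadroom = Math.max(0L, policyMaxInstantaneousMb - alreadyAcceptedMb);
    return Math.min(planFreeMb, policyHeadroom);
  }
}
{code}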
[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022840#comment-15022840 ] Hadoop QA commented on YARN-4384: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 26s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 37s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 114m 7s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_85 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | \\ \\ || Subsystem || Report
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022818#comment-15022818 ] Sunil G commented on YARN-4386: --- Hi [~kshukla], As I see it, we can RECOMMISSION only those nodes which are in the DECOMMISSIONING state. Such nodes are present in {{getRMNodes}}, which is correct. Also, if you look at {{RMNodeImpl}}, no RECOMMISSION transition is defined from the DECOMMISSIONED state. Hence, even if that code is hit, it will throw an invalid-state-transition exception. So looping only over {{rmContext.getRMNodes()}} looks fine to me; however, I also feel we do not need the extra if check for DECOMMISSIONED. cc/ [~djp] > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry entry:rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
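A minimal, self-contained sketch of the state-machine argument in the comment above, assuming a simplified model (the enum, map and method names below are illustrative, not the real RMNodeImpl API): only DECOMMISSIONING registers a RECOMMISSION transition, so attempting it from DECOMMISSIONED fails.
{code}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Simplified stand-in for the RMNodeImpl transition table discussed above;
// names are illustrative, not the actual YARN API.
public class RecommissionSketch {

  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  // Only DECOMMISSIONING registers a RECOMMISSION transition, mirroring the
  // observation that no such transition exists from DECOMMISSIONED.
  static final Map<NodeState, EnumSet<NodeState>> RECOMMISSION_TARGETS =
      new EnumMap<>(NodeState.class);
  static {
    RECOMMISSION_TARGETS.put(NodeState.DECOMMISSIONING, EnumSet.of(NodeState.RUNNING));
  }

  static NodeState recommission(NodeState current) {
    EnumSet<NodeState> targets = RECOMMISSION_TARGETS.get(current);
    if (targets == null) {
      // Comparable to the invalid-state-transition exception the comment mentions.
      throw new IllegalStateException("RECOMMISSION not valid from " + current);
    }
    return targets.iterator().next();
  }

  public static void main(String[] args) {
    System.out.println(recommission(NodeState.DECOMMISSIONING)); // RUNNING
    try {
      recommission(NodeState.DECOMMISSIONED);
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
{code}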
[jira] [Commented] (YARN-4365) FileSystemNodeLabelStore should check for root dir existence on startup
[ https://issues.apache.org/jira/browse/YARN-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022816#comment-15022816 ] Kuhu Shukla commented on YARN-4365: --- Test failure is irreproducible locally and is unrelated to the patch as far as I can see. Findbugs warnings are coming from {{org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl}}, which are not related to this patch. > FileSystemNodeLabelStore should check for root dir existence on startup > --- > > Key: YARN-4365 > URL: https://issues.apache.org/jira/browse/YARN-4365 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Kuhu Shukla > Attachments: YARN-4365-1.patch > > > If the namenode is in safe mode for some reason then FileSystemNodeLabelStore > will prevent the RM from starting since it unconditionally tries to create > the root directory for the label store. If the root directory already exists > and no labels are changing then we shouldn't prevent the RM from starting > even if the namenode is in safe mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
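A hedged sketch of the shape of fix the YARN-4365 summary above describes, not the actual patch (class and method names are illustrative): only create the label-store root directory when it is missing, so an already-initialized store does not require a write to a namenode that may be in safe mode.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only; ensureRootDir is a hypothetical helper, not the
// FileSystemNodeLabelStore code.
public class LabelStoreStartupSketch {

  static void ensureRootDir(FileSystem fs, Path rootDir) throws IOException {
    if (fs.exists(rootDir)) {
      // Root dir already present: nothing to create, startup can proceed even
      // if the namenode currently rejects writes (e.g. safe mode).
      return;
    }
    if (!fs.mkdirs(rootDir)) {
      throw new IOException("Could not create node label store root " + rootDir);
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Local file system keeps the example self-contained.
    FileSystem fs = FileSystem.getLocal(conf);
    ensureRootDir(fs, new Path(System.getProperty("java.io.tmpdir"), "node-labels"));
  }
}
{code}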
[jira] [Commented] (YARN-4204) ConcurrentModificationException in FairSchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022808#comment-15022808 ] Sangjin Lee commented on YARN-4204: --- This looks like a great candidate for branch-2.6. [~adhoot]? > ConcurrentModificationException in FairSchedulerQueueInfo > - > > Key: YARN-4204 > URL: https://issues.apache.org/jira/browse/YARN-4204 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-4204.001.patch, YARN-4204.002.patch > > > Saw this exception which caused RM to go down > {noformat} > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerQueueInfo.(FairSchedulerQueueInfo.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerInfo.(FairSchedulerInfo.java:46) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:229) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:589) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:552) > at > org.apache.hadoop.yarn.server.security.http.RMAuthe
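To make the failure mode in the stack trace above concrete, here is a self-contained illustration (not the FairSchedulerQueueInfo code itself): iterating a live, unmodifiable-wrapped list while it is mutated throws ConcurrentModificationException, whereas iterating a copy taken under a scheduler-style lock does not.
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Toy model of a queue's child list; names are illustrative.
public class CmeSketch {

  private final List<String> childQueues = new ArrayList<>();
  private final Object schedulerLock = new Object();

  Collection<String> liveView() {
    // An unmodifiable wrapper still reflects (and iterates) the live list.
    return Collections.unmodifiableList(childQueues);
  }

  Collection<String> snapshot() {
    synchronized (schedulerLock) {
      // Copying while holding the lock gives the web service a stable view.
      return new ArrayList<>(childQueues);
    }
  }

  void addQueue(String name) {
    synchronized (schedulerLock) {
      childQueues.add(name);
    }
  }

  public static void main(String[] args) {
    CmeSketch s = new CmeSketch();
    s.addQueue("root.a");
    s.addQueue("root.b");
    try {
      for (String q : s.liveView()) {   // iterating the live view
        s.addQueue(q + ".child");       // structural modification mid-iteration
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("Live view failed as expected: " + e);
    }
    for (String q : s.snapshot()) {     // the snapshot is safe to iterate
      s.addQueue(q + ".child");
    }
    System.out.println("Snapshot iteration completed");
  }
}
{code}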
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022796#comment-15022796 ] Sangjin Lee commented on YARN-2975: --- I think we should backport this to branch-2.6. This is a very important follow-up fix to YARN-2910. [~kasha]? > FSLeafQueue app lists are accessed without required locks > - > > Key: YARN-2975 > URL: https://issues.apache.org/jira/browse/YARN-2975 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Fix For: 2.7.0 > > Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch > > > YARN-2910 adds explicit locked access to runnable and non-runnable apps in > FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed > without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
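A minimal sketch of the locking discipline the comment above alludes to, under the assumption of a simplified queue model (names are illustrative, not the FSLeafQueue API): all reads and writes of the app list go through one lock, and getters hand out copies rather than the live list.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative names only, not the FSLeafQueue API.
public class GuardedAppListSketch {

  private final List<String> runnableApps = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addApp(String appId) {
    lock.writeLock().lock();
    try {
      runnableApps.add(appId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Hand out a copy rather than the live list, so callers cannot bypass the lock.
  List<String> getRunnableAppsSnapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(runnableApps);
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedAppListSketch queue = new GuardedAppListSketch();
    queue.addApp("application_1448000000000_0001");
    System.out.println(queue.getRunnableAppsSnapshot());
  }
}
{code}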
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022782#comment-15022782 ] Varun Saxena commented on YARN-4380: Thanks [~ozawa]. InterruptedException indicates that there is a race. Because LocalizerRunner#interrupt has been called while credential file is being written. > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
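A small, self-contained illustration of the kind of race described in the comment above; the mechanics here are assumed for the sketch and this is not the NodeManager code. Interrupting a thread that is writing a file through an interruptible channel aborts the write with ClosedByInterruptException, which is the timing-dependent behaviour an interrupt arriving mid-credential-write would produce.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptedWriteSketch {

  public static void main(String[] args) throws Exception {
    Path creds = Files.createTempFile("credentials", ".tmp");
    Thread writer = new Thread(() -> {
      try (FileChannel ch = FileChannel.open(creds, StandardOpenOption.WRITE)) {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        for (int i = 0; i < 1_000_000; i++) {   // keep writing until interrupted
          buf.clear();
          ch.write(buf);
        }
      } catch (ClosedByInterruptException e) {
        System.out.println("Write aborted by interrupt: " + e);
      } catch (IOException e) {
        System.out.println("I/O error: " + e);
      }
    });
    writer.start();
    Thread.sleep(10);        // let the writer get going
    writer.interrupt();      // simulate an interrupt arriving mid-write
    writer.join();
    Files.deleteIfExists(creds);
  }
}
{code}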
[jira] [Created] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
Kuhu Shukla created YARN-4386: - Summary: refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes Key: YARN-4386 URL: https://issues.apache.org/jira/browse/YARN-4386 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Kuhu Shukla Assignee: Kuhu Shukla In refreshNodesGracefully(), during recommissioning, the entryset from getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is used for checking 'decommissioned' nodes which are present in getInactiveRMNodes() map alone. {code} for (Entry entry:rmContext.getRMNodes().entrySet()) { . // Recommissioning the nodes if (entry.getValue().getState() == NodeState.DECOMMISSIONING || entry.getValue().getState() == NodeState.DECOMMISSIONED) { this.rmContext.getDispatcher().getEventHandler() .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
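A toy model of the two node maps the description above refers to (the maps below are illustrative stand-ins for rmContext.getRMNodes() and getInactiveRMNodes(), not the real types): DECOMMISSIONED nodes live only in the inactive map, so the DECOMMISSIONED branch inside a loop over the active map can never match.
{code}
import java.util.HashMap;
import java.util.Map;

public class NodeMapsSketch {

  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  public static void main(String[] args) {
    Map<String, NodeState> activeNodes = new HashMap<>();
    Map<String, NodeState> inactiveNodes = new HashMap<>();
    activeNodes.put("node1:8041", NodeState.RUNNING);
    activeNodes.put("node2:8041", NodeState.DECOMMISSIONING);
    inactiveNodes.put("node3:8041", NodeState.DECOMMISSIONED);

    // Mirrors the loop in the description: only active nodes are visited.
    for (Map.Entry<String, NodeState> entry : activeNodes.entrySet()) {
      if (entry.getValue() == NodeState.DECOMMISSIONING
          || entry.getValue() == NodeState.DECOMMISSIONED) {
        System.out.println("Would recommission " + entry.getKey()
            + " (state " + entry.getValue() + ")");
      }
    }
    // node3 is never considered, because it is only in the inactive map.
    System.out.println("Inactive nodes never visited: " + inactiveNodes.keySet());
  }
}
{code}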
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022715#comment-15022715 ] Karthik Kambatla commented on YARN-3980: +1. Will commit this later today. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, > YARN-3980-v8.patch, YARN-3980-v9.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022698#comment-15022698 ] Carlo Curino commented on YARN-4358: [~asuresh] thanks for the comments. I agree with them. In particular regarding (3), we are currently slightly abusing the use of RLESparseResourceAllocation to efficiently track time-varying quantities (which are not memory/core resources). Once YARN-3926 lands, this can be made to look much cleaner, as we will be able to define new logical resources. I will address your comments and upload a new version soon. > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented, some (e.g., max-instantaneous resources) > are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
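For readers unfamiliar with the class being discussed, here is a simplified model of run-length-encoded, time-varying values in the spirit of RLESparseResourceAllocation; it is a sketch under that assumption, not the YARN class, and it only stores a single long per interval to keep the example small.
{code}
import java.util.Map;
import java.util.TreeMap;

public class RleStepFunctionSketch {

  // Each entry marks the time at which the value changes; the value holds
  // until the next key. Missing ranges are implicitly zero.
  private final TreeMap<Long, Long> steps = new TreeMap<>();

  void setValue(long start, long end, long value) {
    long after = valueAt(end);          // value that resumes after the interval
    steps.subMap(start, true, end, false).clear();
    steps.put(start, value);
    steps.put(end, after);
  }

  long valueAt(long time) {
    Map.Entry<Long, Long> e = steps.floorEntry(time);
    return e == null ? 0L : e.getValue();
  }

  public static void main(String[] args) {
    RleStepFunctionSketch alloc = new RleStepFunctionSketch();
    alloc.setValue(10, 20, 4);   // e.g. 4 "units" reserved between t=10 and t=20
    alloc.setValue(15, 25, 6);
    System.out.println(alloc.valueAt(12));  // 4
    System.out.println(alloc.valueAt(18));  // 6
    System.out.println(alloc.valueAt(30));  // 0
  }
}
{code}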
[jira] [Updated] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4385: - Attachment: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt Attaching a log when it fails. > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tsuyoshi Ozawa > Attachments: > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4385: - Component/s: test > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tsuyoshi Ozawa > Attachments: > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3454) Add efficient merge operation to RLESparseResourceAllocation
[ https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022688#comment-15022688 ] Carlo Curino commented on YARN-3454: [~asuresh] Thank you so much for the thoughtful review and commit. > Add efficient merge operation to RLESparseResourceAllocation > > > Key: YARN-3454 > URL: https://issues.apache.org/jira/browse/YARN-3454 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0, 2.7.1, 2.6.2 >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-3454.1.patch, YARN-3454.2.patch, YARN-3454.3.patch, > YARN-3454.4.patch, YARN-3454.5.patch, YARN-3454.patch > > > The RLESparseResourceAllocation.removeInterval(...) method handles exact-match > interval removals well, but does not handle partial overlaps correctly. > In the context of this fix, we also introduced static methods to "merge" two > RLESparseResourceAllocation objects, while applying an operator in the process > (add/subtract/min/max/subtractTestPositive) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
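A hedged sketch of the "merge two allocations under an operator" idea the description mentions (add/subtract/min/max); the TreeMap<Long, Long> step-function encoding used here (change time mapped to value, zero where absent) is a simplification, not the actual RLESparseResourceAllocation representation.
{code}
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.LongBinaryOperator;

public class RleMergeSketch {

  static TreeMap<Long, Long> merge(TreeMap<Long, Long> a, TreeMap<Long, Long> b,
                                   LongBinaryOperator op) {
    TreeMap<Long, Long> out = new TreeMap<>();
    // Sweep over the union of change points of both step functions.
    TreeSet<Long> changePoints = new TreeSet<>();
    changePoints.addAll(a.keySet());
    changePoints.addAll(b.keySet());
    Long prev = null;
    for (long t : changePoints) {
      long merged = op.applyAsLong(valueAt(a, t), valueAt(b, t));
      if (prev == null || merged != prev) {   // keep the encoding run-length style
        out.put(t, merged);
        prev = merged;
      }
    }
    return out;
  }

  static long valueAt(TreeMap<Long, Long> f, long t) {
    Map.Entry<Long, Long> e = f.floorEntry(t);
    return e == null ? 0L : e.getValue();
  }

  public static void main(String[] args) {
    TreeMap<Long, Long> a = new TreeMap<>();
    a.put(0L, 4L); a.put(10L, 0L);    // 4 on [0,10), 0 afterwards
    TreeMap<Long, Long> b = new TreeMap<>();
    b.put(5L, 2L); b.put(15L, 0L);    // 2 on [5,15), 0 afterwards
    System.out.println(merge(a, b, Long::sum));   // {0=4, 5=6, 10=2, 15=0}
    System.out.println(merge(a, b, Math::max));   // {0=4, 10=2, 15=0}
  }
}
{code}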
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Attachment: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt [~varun_saxena] attaching a log when the test fails. I use this simple script to reproduce some intermittent failures https://github.com/oza/failchecker > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022660#comment-15022660 ] Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:26 PM: On my local log: {quote} Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 437.156 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testDSShellWithCustomLogPropertyFile(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 115.558 sec <<< ERROR! java.lang.Exception: test timed out after 9 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:734) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:715) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithCustomLogPropertyFile(TestDistributedShell.java:502) {quote} was (Author: ozawa): >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 
[ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Fa
[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022660#comment-15022660 ] Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:25 PM: >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 12 tests failed. 
FAILED: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs Error Message: java.io.IOException: ResourceManager failed to start. Final state is STOPPED Stack Trace: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331) at org.apache.hadoop.yarn.server.MiniYARNCluster.access$500(MiniYARNCluster.java:99) at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(
[jira] [Moved] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa moved HADOOP-12591 to YARN-4385: --- Key: YARN-4385 (was: HADOOP-12591) Project: Hadoop YARN (was: Hadoop Common) > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022660#comment-15022660 ] Tsuyoshi Ozawa commented on YARN-4385: -- >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 11262 lines...] TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 12 tests failed. 
FAILED: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs Error Message: java.io.IOException: ResourceManager failed to start. Final state is STOPPED Stack Trace: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331) at org.apache.hadoop.yarn.server
[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022644#comment-15022644 ] Naganarasimha G R commented on YARN-3127: - YARN-4306 and YARN-4318 have already been raised for the test failures. > Avoid timeline events during RM recovery or restart > --- > > Key: YARN-3127 > URL: https://issues.apache.org/jira/browse/YARN-3127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.6.0, 2.7.1 > Environment: RM HA with ATS >Reporter: Bibin A Chundatt >Assignee: Naganarasimha G R >Priority: Critical > Attachments: AppTransition.png, YARN-3127.20150213-1.patch, > YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, > YARN-3127.20151123-1.patch > > > 1. Start the RM with HA and ATS configured and run some YARN applications > 2. Once the applications have finished successfully, start the timeline server > 3. Now fail over HA from active to standby > 4. Access timeline server URL :/applicationhistory > // Note: earlier an exception was thrown when accessed. > Incomplete information is shown in the ATS web UI, i.e. attempt, container and > other information is not displayed. > Also, even if the timeline server is started with the RM, on RM restart/recovery > ATS events for the applications already existing in ATS are resent, which is > not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
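A minimal sketch, with hypothetical names rather than the RM code, of the behaviour the issue description asks for: publish timeline events for newly submitted applications, but skip applications that are merely replayed from the state store during recovery.
{code}
public class RecoveryAwarePublisherSketch {

  static final class AppEvent {
    final String appId;
    final boolean recovered;   // true when replayed from the RM state store
    AppEvent(String appId, boolean recovered) {
      this.appId = appId;
      this.recovered = recovered;
    }
  }

  static void maybePublish(AppEvent event) {
    if (event.recovered) {
      // ATS already has this application's history; re-sending would duplicate it.
      return;
    }
    System.out.println("Publishing timeline event for " + event.appId);
  }

  public static void main(String[] args) {
    maybePublish(new AppEvent("application_1448000000000_0001", false)); // published
    maybePublish(new AppEvent("application_1448000000000_0002", true));  // skipped
  }
}
{code}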
[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022621#comment-15022621 ] Naganarasimha G R commented on YARN-3127: - Seems like the test failures are unrelated to the fix and the checkstyle warning is not valid. > Avoid timeline events during RM recovery or restart > --- > > Key: YARN-3127 > URL: https://issues.apache.org/jira/browse/YARN-3127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.6.0, 2.7.1 > Environment: RM HA with ATS >Reporter: Bibin A Chundatt >Assignee: Naganarasimha G R >Priority: Critical > Attachments: AppTransition.png, YARN-3127.20150213-1.patch, > YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, > YARN-3127.20151123-1.patch > > > 1. Start the RM with HA and ATS configured and run some YARN applications > 2. Once the applications have finished successfully, start the timeline server > 3. Now fail over HA from active to standby > 4. Access timeline server URL :/applicationhistory > // Note: earlier an exception was thrown when accessed. > Incomplete information is shown in the ATS web UI, i.e. attempt, container and > other information is not displayed. > Also, even if the timeline server is started with the RM, on RM restart/recovery > ATS events for the applications already existing in ATS are resent, which is > not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022606#comment-15022606 ] Eric Payne commented on YARN-4225: -- bq. someone might call has when they should be calling get. Maybe a name like "isPreemptionDisabledValid" or something would be more clear
In order to remove the need for two methods, another alternative would be to have {{QueueInfoPBImpl#getPreemptionDisabled}} return a {{Boolean}} rather than a native type, and then have it return null if it internally determines that the field is not there. So, in {{QueueCLI#printQueueInfo}}, the code would look something like this:
{code}
Boolean preemptStatus = queueInfo.getPreemptionDisabled();
if (preemptStatus != null) {
  writer.print("\tPreemption : ");
  writer.println(preemptStatus ? "disabled" : "enabled");
}
{code}
In general, what is the Hadoop policy when a newer client talks to an older server and the protobuf output is different from what is expected? Should we expose some form of the {{has}} method, or should we overload the {{get}} method as I described here? I would appreciate any additional feedback from the community in general ([~vinodkv], do you have any thoughts?) > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
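A self-contained sketch of the nullable-getter alternative described above, using a stand-in class rather than the real QueueInfoPBImpl: the getter returns a Boolean that is null when an older server never sent the field, and the CLI simply skips the line in that case.
{code}
import java.io.PrintWriter;

public class PreemptionStatusSketch {

  // Stand-in for the real QueueInfo record; null means "field not present".
  static final class QueueInfoSketch {
    private final Boolean preemptionDisabled;
    QueueInfoSketch(Boolean preemptionDisabled) {
      this.preemptionDisabled = preemptionDisabled;
    }
    Boolean getPreemptionDisabled() {
      return preemptionDisabled;
    }
  }

  static void printQueueInfo(QueueInfoSketch queueInfo, PrintWriter writer) {
    Boolean preemptStatus = queueInfo.getPreemptionDisabled();
    if (preemptStatus != null) {                // older servers: skip the line
      writer.print("\tPreemption : ");
      writer.println(preemptStatus ? "disabled" : "enabled");
    }
    writer.flush();
  }

  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out);
    printQueueInfo(new QueueInfoSketch(Boolean.TRUE), out);  // prints "disabled"
    printQueueInfo(new QueueInfoSketch(null), out);          // prints nothing
  }
}
{code}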
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022590#comment-15022590 ] Varun Saxena commented on YARN-4380: Thanks for reporting this [~ozawa]. I tried running this test several times on branch-2 but could not simulate the failure. If you are able to simulate, it will be helpful if you can share the logs. > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-4380: -- Assignee: Varun Saxena > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: YARN-4348-branch-2.7.003.patch > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
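A simplified sketch, with hypothetical names rather than the ZKRMStateStore code, of the change the summary describes: the wait for the asynchronous ZooKeeper sync callback is bounded by a dedicated resync-wait time instead of the session timeout, so a sync that completes within the resync window is no longer reported as failed.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SyncWaitSketch {

  static boolean waitForSync(CountDownLatch syncDone, long zkResyncWaitTimeMs)
      throws InterruptedException {
    // Bound the wait by the resync window, which is the quantity that actually
    // governs how long a successful sync may take to report back.
    return syncDone.await(zkResyncWaitTimeMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch syncDone = new CountDownLatch(1);
    // Simulate the sync callback arriving after 200 ms.
    new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      syncDone.countDown();
    }).start();
    System.out.println("sync completed in time: " + waitForSync(syncDone, 1000));
  }
}
{code}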
[jira] [Updated] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4384: - Attachment: YARN-4384.patch Upload a patch to fix this. > updateNodeResource CLI should not accept negative values for resource > - > > Key: YARN-4384 > URL: https://issues.apache.org/jira/browse/YARN-4384 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: YARN-4384.patch > > > updateNodeResource CLI should not accept negative values for MemSize and > vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022480#comment-15022480 ] Jason Lowe commented on YARN-4225: -- Yes, I was thinking the client would refrain from reporting on a field it knew wasn't provided. However I think having a getPreemptionDisabled and hasPreemptionDisabled methods exposed outside the protobuf is very confusing -- someone might call has when they should be calling get. Maybe a name like "isPreemptionDisabledValid" or something would be more clear. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022459#comment-15022459 ] Junping Du commented on YARN-4384: -- Thanks for reporting this, [~ssreenivasan]! I agree that we should check the values to make sure an admin/user won't set invalid values for memory and vCores unintentionally. Will upload a patch to fix it. > updateNodeResource CLI should not accept negative values for resource > - > > Key: YARN-4384 > URL: https://issues.apache.org/jira/browse/YARN-4384 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du > Fix For: 2.8.0 > > > updateNodeResource CLI should not accept negative values for MemSize and > vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
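A minimal sketch of the kind of argument validation proposed above; the class and method names are illustrative, not the actual RMAdminCLI code: negative memory or vCores values are rejected before any request is sent to the ResourceManager.
{code}
public class UpdateNodeResourceArgsSketch {

  static void validate(long memSizeMb, int vCores) {
    if (memSizeMb < 0 || vCores < 0) {
      throw new IllegalArgumentException(
          "Memory and vCores must be non-negative, got memSize=" + memSizeMb
              + " vCores=" + vCores);
    }
  }

  public static void main(String[] args) {
    validate(4096, 8);            // accepted
    try {
      validate(-1024, 4);         // rejected
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
{code}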
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022458#comment-15022458 ] Jason Lowe commented on YARN-4380: -- [~varun_saxena] could you take a look? > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-4384: Assignee: Junping Du > updateNodeResource CLI should not accept negative values for resource > - > > Key: YARN-4384 > URL: https://issues.apache.org/jira/browse/YARN-4384 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du > Fix For: 2.8.0 > > > updateNodeResource CLI should not accept negative values for MemSize and > vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
Sushmitha Sreenivasan created YARN-4384: --- Summary: updateNodeResource CLI should not accept negative values for resource Key: YARN-4384 URL: https://issues.apache.org/jira/browse/YARN-4384 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.8.0 Reporter: Sushmitha Sreenivasan Fix For: 2.8.0 updateNodeResource CLI should not accept negative values for MemSize and vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022310#comment-15022310 ] Hadoop QA commented on YARN-3127: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 43s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 2, now 2). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 147, now 148). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 15s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 138m 10s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resour
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022176#comment-15022176 ] Hadoop QA commented on YARN-3946: -
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 8m 59s | trunk passed |
| +1 | compile | 0m 35s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 37s | trunk passed with JDK v1.7.0_85 |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 44s | trunk passed |
| +1 | mvneclipse | 0m 17s | trunk passed |
| +1 | findbugs | 1m 25s | trunk passed |
| +1 | javadoc | 0m 28s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 31s | trunk passed with JDK v1.7.0_85 |
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 34s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 34s | the patch passed |
| +1 | compile | 0m 36s | the patch passed with JDK v1.7.0_85 |
| +1 | javac | 0m 36s | the patch passed |
| -1 | checkstyle | 0m 14s | Patch generated 15 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 655, now 666). |
| +1 | mvnsite | 0m 42s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 33s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.7.0_85 |
| -1 | unit | 80m 55s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 81m 51s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. |
| +1 | asflicense | 0m 25s | Patch does not generate ASF License warnings. |
| | | 183m 52s | |
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimitsByPartition |
| JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
| JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022161#comment-15022161 ] Chang Li commented on YARN-4132: TestWebApp is tracked by YARN-4379, not related to my change. [~jlowe], please help review the updated patch, thanks! > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, > YARN-4132.5.patch, YARN-4132.6.2.patch, YARN-4132.6.patch, YARN-4132.7.patch, > YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
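To make the YARN-4132 idea concrete, here is a minimal sketch of configuring an NM-specific retry policy separately from the client-side RM connection settings. The yarn.resourcemanager.connect.* keys are the existing client-side settings; the yarn.nodemanager.resourcemanager.connect.* keys are hypothetical names used only for illustration and may not match the final patch.
{code}
// Minimal sketch only: separate NM retry settings from client retry settings.
// The *.nodemanager.* keys below are hypothetical illustration names.
import org.apache.hadoop.conf.Configuration;

public class NmRmRetryConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Existing client-side behaviour: give up after max-wait, retrying at
    // the configured interval.
    conf.setLong("yarn.resourcemanager.connect.max-wait.ms", 900000L);
    conf.setLong("yarn.resourcemanager.connect.retry-interval.ms", 30000L);

    // Hypothetical NM-specific overrides: a negative max-wait could mean
    // "retry forever", so a NodeManager never gives up on its RM.
    conf.setLong("yarn.nodemanager.resourcemanager.connect.max-wait.ms", -1L);
    conf.setLong("yarn.nodemanager.resourcemanager.connect.retry-interval.ms", 10000L);

    // NM-specific key wins if present, otherwise fall back to the client value.
    long nmMaxWait = conf.getLong(
        "yarn.nodemanager.resourcemanager.connect.max-wait.ms",
        conf.getLong("yarn.resourcemanager.connect.max-wait.ms", 900000L));
    System.out.println("NM would wait (ms, -1 = forever): " + nmMaxWait);
  }
}
{code}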
[jira] [Commented] (YARN-4298) Fix findbugs warnings in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022089#comment-15022089 ] Sunil G commented on YARN-4298: --- Thanks [~varun_saxena]! Even after that fix, the same warnings are shown. I will take another look now and verify it locally. > Fix findbugs warnings in hadoop-yarn-common > --- > > Key: YARN-4298 > URL: https://issues.apache.org/jira/browse/YARN-4298 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Sunil G >Priority: Minor > Attachments: 0001-YARN-4298.patch, 0002-YARN-4298.patch > > > {noformat} > classname='org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl'> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder; > locked 95% of time' lineNumber='390'/> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto; > locked 94% of time' lineNumber='390'/> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto; > locked 94% of time' lineNumber='390'/> > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
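For context on what this findbugs category means, the following self-contained toy (not the real AllocateResponsePBImpl code) reproduces the "Inconsistent synchronization ... locked 9x% of time" pattern and the usual fix: every access to the shared field goes through the same lock.
{code}
// Toy example of the MT_CORRECTNESS "inconsistent synchronization" warning.
// Class and field names are illustrative only.
public class InconsistentSyncSketch {
  private StringBuilder builder = new StringBuilder();

  // Guarded write: most accesses look like this, so findbugs assumes the
  // field is meant to be protected by the "this" monitor.
  public synchronized void append(String s) {
    builder.append(s);
  }

  // BUG (what findbugs reports): an unguarded read of the same field.
  // public int length() { return builder.length(); }

  // FIX: keep the access pattern consistent by synchronizing here too.
  public synchronized int length() {
    return builder.length();
  }

  public static void main(String[] args) {
    InconsistentSyncSketch s = new InconsistentSyncSketch();
    s.append("hello");
    System.out.println(s.length()); // 5
  }
}
{code}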
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022087#comment-15022087 ] Junping Du commented on YARN-4381: -- [~linyiqun], thank you for contributing the patch to the YARN project. I have just added you to the YARN contributor list and assigned this JIRA to you. > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > Map appAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4381: - Assignee: Lin Yiqun > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > Map appAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
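As a rough illustration of the metric change discussed in YARN-4381, a counter class could separate "really launched" from "failed before launch". The class and method names below are hypothetical; this is not the real NodeManagerMetrics API.
{code}
// Hedged sketch only: count launches at the launch step, and count
// localization failures separately, instead of bumping one counter in
// startContainer() regardless of outcome.
import java.util.concurrent.atomic.AtomicLong;

public class SimpleNmMetrics {
  private final AtomicLong containersLaunched = new AtomicLong();
  private final AtomicLong containersLocalizationFailed = new AtomicLong();

  // Called from the launch transition, not from startContainer(), so a
  // container that dies during localization (kill command, localization
  // failure) is never counted as "launched".
  public void launchedContainer() {
    containersLaunched.incrementAndGet();
  }

  // New counter for the missing case described in the issue.
  public void localizationFailedContainer() {
    containersLocalizationFailed.incrementAndGet();
  }

  public long getContainersLaunched() { return containersLaunched.get(); }
  public long getContainersLocalizationFailed() { return containersLocalizationFailed.get(); }

  public static void main(String[] args) {
    SimpleNmMetrics m = new SimpleNmMetrics();
    m.localizationFailedContainer();   // container killed during localization
    m.launchedContainer();             // container that really launched
    System.out.println(m.getContainersLaunched() + " launched, "
        + m.getContainersLocalizationFailed() + " failed localization");
  }
}
{code}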
[jira] [Updated] (YARN-3127) Avoid timeline events during RM recovery or restart
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3127: Attachment: YARN-3127.20151123-1.patch Hi [~sjlee0], [~rohithsharma] & [~xgong], I have rebased the patch; could you please take a look at it? Based on this, we can get YARN-4350 corrected. > Avoid timeline events during RM recovery or restart > --- > > Key: YARN-3127 > URL: https://issues.apache.org/jira/browse/YARN-3127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.6.0, 2.7.1 > Environment: RM HA with ATS >Reporter: Bibin A Chundatt >Assignee: Naganarasimha G R >Priority: Critical > Attachments: AppTransition.png, YARN-3127.20150213-1.patch, > YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch, > YARN-3127.20151123-1.patch > > > 1.Start RM with HA and ATS configured and run some yarn applications > 2.Once applications are finished sucessfully start timeline server > 3.Now failover HA form active to standby > 4.Access timeline server URL :/applicationhistory > //Note Earlier exception was thrown when accessed. > Incomplete information is shown in the ATS web UI. i.e. attempt container and > other information is not displayed. > Also even if timeline server is started with RM, and on RM restart/ recovery > ATS events for the applications already existing in ATS are resent which is > not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
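A hypothetical sketch of the suppression idea in YARN-3127: events replayed while the RM recovers its state store are not re-published to the timeline server, so ATS only receives each event once. The publisher and sink types below are illustrative, not the real SystemMetricsPublisher API.
{code}
// Hedged sketch only; assumes a "recovered" flag derived from the app's
// recovery state while the RM replays its state store on restart/failover.
public class RecoveryAwarePublisherSketch {
  interface TimelineSink { void put(String entityId, String eventType); }

  private final TimelineSink sink;

  public RecoveryAwarePublisherSketch(TimelineSink sink) {
    this.sink = sink;
  }

  public void appStateTransition(String appId, String event, boolean recovered) {
    if (recovered) {
      // The event already reached ATS before the restart; do not resend it.
      return;
    }
    sink.put(appId, event);
  }

  public static void main(String[] args) {
    RecoveryAwarePublisherSketch p = new RecoveryAwarePublisherSketch(
        (id, ev) -> System.out.println("publish " + id + " " + ev));
    p.appStateTransition("application_1_0001", "APP_CREATED", false); // published
    p.appStateTransition("application_1_0001", "APP_CREATED", true);  // replayed on recovery, skipped
  }
}
{code}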
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022069#comment-15022069 ] Junping Du commented on YARN-4131: -- Cool. I will keep this JIRA open until we are sure nothing else needs to be added. Thanks! > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tools to kill container in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022059#comment-15022059 ] Jun Gong commented on YARN-4382: [~lachisis] Thanks for reporting the issue. Please feel free to reassign it to yourself if you start or want to work on it. > Container hierarchy in cgroup may remain for ever after the container have be > terminated > > > Key: YARN-4382 > URL: https://issues.apache.org/jira/browse/YARN-4382 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.2 >Reporter: lachisis >Assignee: Jun Gong > > If we use LinuxContainerExecutor to executor the containers, this question > may happens. > In the common case, when a container run, a corresponding hierarchy will be > created in cgroup dir. And when the container terminate, the hierarchy will > be delete in some seconds(this time can be configured by > yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms). > In the code, I find that, CgroupsLCEResource send a signal to kill container > process asynchronously, and in the same time, it will try to delete the > container hierarchy in configured "delete-delay-ms" times. > But if the container process be killed for seconds which large than > "delete-delay-ms" time, the container hierarchy will remain for ever. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong reassigned YARN-4382: -- Assignee: Jun Gong > Container hierarchy in cgroup may remain for ever after the container have be > terminated > > > Key: YARN-4382 > URL: https://issues.apache.org/jira/browse/YARN-4382 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.2 >Reporter: lachisis >Assignee: Jun Gong > > If we use LinuxContainerExecutor to executor the containers, this question > may happens. > In the common case, when a container run, a corresponding hierarchy will be > created in cgroup dir. And when the container terminate, the hierarchy will > be delete in some seconds(this time can be configured by > yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms). > In the code, I find that, CgroupsLCEResource send a signal to kill container > process asynchronously, and in the same time, it will try to delete the > container hierarchy in configured "delete-delay-ms" times. > But if the container process be killed for seconds which large than > "delete-delay-ms" time, the container hierarchy will remain for ever. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
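One possible direction for the YARN-4382 fix, sketched with hypothetical helper names (this is not the actual CgroupsLCEResourcesHandler code): instead of racing a fixed delete-delay-ms window against the asynchronous kill, poll the cgroup's tasks file until it is empty (all processes gone) or a larger upper bound expires, and only then remove the directory.
{code}
// Hedged sketch only; paths and timeouts are illustrative.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupCleanupSketch {

  // Returns true if the cgroup directory was removed.
  public static boolean deleteWhenEmpty(Path cgroupDir, long timeoutMs, long pollMs)
      throws IOException, InterruptedException {
    Path tasks = cgroupDir.resolve("tasks");
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      // An empty tasks file means every process in the cgroup has exited.
      // On cgroupfs, rmdir then succeeds even though control files are listed.
      if (!Files.exists(tasks)
          || Files.readAllLines(tasks, StandardCharsets.UTF_8).isEmpty()) {
        return Files.deleteIfExists(cgroupDir);
      }
      Thread.sleep(pollMs);
    }
    // Caller can log and leave the directory for a periodic sweeper.
    return false;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Paths.get("/sys/fs/cgroup/cpu/hadoop-yarn/container_example");
    System.out.println("deleted = " + deleteWhenEmpty(dir, 60000L, 200L));
  }
}
{code}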
[jira] [Created] (YARN-4383) TeraGen Application allows same output directory for multiple jobs
tongshiquan created YARN-4383: - Summary: TeraGen Application allows same output directory for multiple jobs Key: YARN-4383 URL: https://issues.apache.org/jira/browse/YARN-4383 Project: Hadoop YARN Issue Type: Bug Reporter: tongshiquan When TeraGen is run multiple times with the same output directory, it should normally validate the directory and fail. In some cases, however, it continues and later causes exceptions that make the job fail. I think the reason is that {{org.apache.hadoop.examples.terasort.TeraOutputFormat.checkOutputSpecs(TeraOutputFormat.java)}} has an issue: it permits the output directory to already exist if it has only one child and that child is PARTITION_FILENAME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
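A hedged sketch of the stricter check suggested in YARN-4383 (not the real TeraOutputFormat code): reject the job up front whenever the output directory already exists, with no special case for the partition file. The class and method names below are illustrative.
{code}
// Sketch only: fail fast on an existing output directory.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileAlreadyExistsException;

public class StrictOutputCheckSketch {

  public static void checkOutputDir(Configuration conf, Path outputDir)
      throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    if (fs.exists(outputDir)) {
      // No special case for PARTITION_FILENAME: a second run with the same
      // directory is rejected up front rather than failing mid-job.
      throw new FileAlreadyExistsException("Output directory " + outputDir
          + " already exists");
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    checkOutputDir(conf, new Path("file:///tmp/teragen-out"));
    System.out.println("output directory is free to use");
  }
}
{code}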
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022035#comment-15022035 ] Naganarasimha G R commented on YARN-3946: - Hi [~wangda], sorry for the delay. As per our offline discussion, we concluded:
# We should only record AM-launch-related events with this patch, so we don't need to record the recover/running state. (I think the am-launch diagnostic can be cleared once the AM container is allocated.)
# Recording the event time is good, but I think we should put it in a separate JIRA; we may need some refactoring of the existing diagnostic part.
I have taken care of the first point and now carry AM launch diagnostic messages until a container is assigned to the AM. For the second point, as it was a simple modification, I have captured it in this JIRA itself; please check it. Another difference from the previous patch: as I mentioned earlier, in some cases the reason why the node is not assigned was getting overwritten by the following modification in LeafQueue:
{code}
@@ -904,7 +919,9 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
       // Done
       return assignment;
-    } else if (!assignment.getSkipped()) {
+    } else if (assignment.getSkipped()) {
+      application.updateNodeDiagnostics(node);
+    } else {
{code}
Hence, in this patch I have handled it by storing this diagnostic message temporarily and clearing it once the message is created. I have also attached some images related to the patch. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
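A small hypothetical sketch of the "store temporarily, then clear" handling described in the YARN-3946 comment above; the class and method names below do not match the actual patch.
{code}
// Sketch only: remember why a node was skipped for the AM container, fold it
// into the next diagnostic message, then clear it so it cannot leak into a
// later, unrelated message.
public class AmDiagnosticsSketch {
  private volatile String pendingNodeDiagnostic = "";

  // Called when a node is skipped for the AM container; remember why.
  public void updateNodeDiagnostics(String nodeId, String reason) {
    pendingNodeDiagnostic = "Node " + nodeId + " skipped: " + reason;
  }

  // Called when the AM-launch diagnostic message is built.
  public String buildAndClear(String baseMessage) {
    String msg = baseMessage + " " + pendingNodeDiagnostic;
    pendingNodeDiagnostic = "";
    return msg;
  }

  public static void main(String[] args) {
    AmDiagnosticsSketch d = new AmDiagnosticsSketch();
    d.updateNodeDiagnostics("node-1:45454", "not enough memory for AM container");
    System.out.println(d.buildAndClear("Application is waiting for AM container to be allocated."));
  }
}
{code}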
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022013#comment-15022013 ] Steve Loughran commented on YARN-4131: -- I think we are OK... someone needs to write the chaos monkey and see what, if anything, is still missing. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tools to kill container in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021988#comment-15021988 ] Junping Du commented on YARN-4131: -- I think YARN-1897 covers most of this JIRA's work. [~ste...@apache.org], do you see any remaining gap for providing a chaos monkey for YARN? By the way, [~adhoot], sorry for replying late; I was on a long vacation just after your comments above and missed them when I came back. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tools to kill container in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3946: Attachment: YARN-3946.v1.003.patch YARN-3946.v1.003.Images.zip > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4344: Attachment: YARN-4344-branch-2.6.001.patch Uploaded a version for branch-2.6 > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4344-branch-2.6.001.patch, YARN-4344.001.patch, > YARN-4344.002.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
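A hedged sketch of the accounting idea behind the YARN-4344 fix (not the actual scheduler code): when a node reconnects with a different capability, the old capability has to be subtracted from the cluster total before the new one is added, otherwise repeated reconnects skew the total.
{code}
// Sketch only: keep the cluster total consistent across NM reconnects.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ReconnectAccountingSketch {
  private Resource clusterResource = Resources.createResource(0, 0);

  public synchronized void nodeReconnected(Resource oldCapability, Resource newCapability) {
    // Drop the capability the RM previously accounted for, then add the
    // capability the node just reported.
    Resources.subtractFrom(clusterResource, oldCapability);
    Resources.addTo(clusterResource, newCapability);
  }

  public synchronized Resource getClusterResource() {
    return clusterResource;
  }

  public static void main(String[] args) {
    ReconnectAccountingSketch acc = new ReconnectAccountingSketch();
    // Node registers with 8GB/8 vcores, later reconnects with 4GB/4 vcores.
    acc.nodeReconnected(Resources.createResource(0, 0), Resources.createResource(8192, 8));
    acc.nodeReconnected(Resources.createResource(8192, 8), Resources.createResource(4096, 4));
    System.out.println("cluster total = " + acc.getClusterResource());
  }
}
{code}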