[jira] [Created] (YARN-7668) Remove unused variables from ContainerLocalizer
Ray Chiang created YARN-7668:
--------------------------------

Summary: Remove unused variables from ContainerLocalizer
Key: YARN-7668
URL: https://issues.apache.org/jira/browse/YARN-7668
Project: Hadoop YARN
Issue Type: Task
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial

While figuring out something else, I found two class constants in ContainerLocalizer that look like they aren't being used anymore.

{noformat}
public static final String OUTPUTDIR = "output";
public static final String WORKDIR = "work";
{noformat}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7521) Add some missing @VisibleForTesting annotations
Ray Chiang created YARN-7521:
--------------------------------

Summary: Add some missing @VisibleForTesting annotations
Key: YARN-7521
URL: https://issues.apache.org/jira/browse/YARN-7521
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial

While reviewing some other code, I ran into a few places where the @VisibleForTesting annotation should be placed.
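For context, Guava's @VisibleForTesting is documentation-only: it marks a member whose visibility was widened solely so tests can reach it, and changes nothing at runtime. A minimal sketch, with the annotation defined locally so there is no Guava dependency (the class and field names are illustrative, not from the YARN code under review):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class VisibleForTestingSketch {

    // Local stand-in for com.google.common.annotations.VisibleForTesting,
    // so this sketch compiles without Guava on the classpath.
    @Retention(RetentionPolicy.SOURCE)
    @interface VisibleForTesting {}

    private int capacity = 100;

    // Would normally be private; widened to package-private so unit tests
    // in the same package can inspect state directly. The annotation
    // records that the wider visibility exists only for tests.
    @VisibleForTesting
    int getCapacity() {
        return capacity;
    }

    public static void main(String[] args) {
        // No runtime effect; the annotation only documents intent.
        System.out.println(new VisibleForTestingSketch().getCapacity());
    }
}
```

Static-analysis tools (and reviewers) can then flag production code that calls a @VisibleForTesting member, which is why the annotation is worth adding even though it is inert.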
[jira] [Resolved] (YARN-6142) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang resolved YARN-6142.
------------------------------
Resolution: Information Provided
Fix Version/s: 3.0.0

Protobuf and JACC analysis done. Will continue rolling upgrade reviews at HDFS-11096.

> Support rolling upgrade between 2.x and 3.x
> -------------------------------------------
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
> Issue Type: Task
> Components: rolling upgrade
> Affects Versions: 3.0.0-alpha2
> Reporter: Andrew Wang
> Assignee: Ray Chiang
> Priority: Blocker
> Fix For: 3.0.0
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's JACC report for binary and source incompatibilities
> * run the [PB differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405] that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are automated and something we can run upstream.
[jira] [Created] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes
Ray Chiang created YARN-7322:
--------------------------------

Summary: Remove annotations from org.apache.hadoop.yarn.server classes
Key: YARN-7322
URL: https://issues.apache.org/jira/browse/YARN-7322
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor

The main hadoop pom.xml has this section in the javadoc plugin:

{noformat}
org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
{noformat}

Since the package org.apache.hadoop.yarn.server is ignored, the various @ annotations should be removed from those classes.
[jira] [Created] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
Ray Chiang created YARN-7219:
--------------------------------

Summary: Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
Key: YARN-7219
URL: https://issues.apache.org/jira/browse/YARN-7219
Project: Hadoop YARN
Issue Type: Sub-task
Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Priority: Critical

For yarn_service_protos.proto, we have the following code in (branch-2.8.0, branch-2.8, branch-2):

{noformat}
message AllocateRequestProto {
  repeated ResourceRequestProto ask = 1;
  repeated ContainerIdProto release = 2;
  optional ResourceBlacklistRequestProto blacklist_request = 3;
  optional int32 response_id = 4;
  optional float progress = 5;
  repeated ContainerResourceIncreaseRequestProto increase_request = 6;
  repeated UpdateContainerRequestProto update_requests = 7;
}
{noformat}

For yarn_service_protos.proto, we have the following code in (trunk):

{noformat}
message AllocateRequestProto {
  repeated ResourceRequestProto ask = 1;
  repeated ContainerIdProto release = 2;
  optional ResourceBlacklistRequestProto blacklist_request = 3;
  optional int32 response_id = 4;
  optional float progress = 5;
  repeated UpdateContainerRequestProto update_requests = 6;
}
{noformat}

Notes:
* YARN-3866 was the original JIRA for container resizing.
* YARN-5221 is what introduced the incompatible change.
* In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by "Addendum patch to YARN-3866: fix incompatible API change."
* There was a similar API fix done in YARN-6071.
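Why renumbering a field is wire-incompatible: protobuf encodes every field with a varint key derived from its field number, so moving update_requests from 7 to 6 changes the bytes on the wire. A minimal sketch of the key computation (no protobuf dependency; the class name is ours):

```java
public class ProtoTagSketch {

    // Protocol Buffers encodes each field with a varint key:
    //   key = (field_number << 3) | wire_type
    // Embedded/repeated messages use wire type 2 (length-delimited).
    static int key(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // branch-2: update_requests = 7; trunk: update_requests = 6.
        System.out.println("field 7 key: " + key(7, 2)); // 58
        System.out.println("field 6 key: " + key(6, 2)); // 50
        // Different keys on the wire: a branch-2 reader decodes trunk's
        // update_requests under field 6, the number branch-2 reserved for
        // increase_request, so the two message layouts are not interchangeable.
    }
}
```

This is why the safe fix for removing a field is to leave its number unused (or mark it reserved) rather than renumbering the fields that follow it.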
[jira] [Created] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml
Ray Chiang created YARN-6868:
--------------------------------

Summary: Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml
Key: YARN-6868
URL: https://issues.apache.org/jira/browse/YARN-6868
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Assignee: Ray Chiang

The scope tag
{noformat}
test
{noformat}
is missing from a few entries in the pom.xml for hadoop-yarn-server-resourcemanager.
[jira] [Created] (YARN-6798) NM startup failure with old state store due to version mismatch
Ray Chiang created YARN-6798:
--------------------------------

Summary: NM startup failure with old state store due to version mismatch
Key: YARN-6798
URL: https://issues.apache.org/jira/browse/YARN-6798
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Ray Chiang

YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.

YARN-6127 bumped the version for the NM to 3.0:

{noformat}
private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);
{noformat}

YARN-5049 bumped the version for the NM to 2.0:

{noformat}
private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);
{noformat}

During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to alpha4.

{noformat}
2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	... 5 more
2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
************************************************************/
{noformat}
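The failure mode in the trace follows from the state store version convention: a loaded state is compatible only when its major version matches the current one, so a 3.0 NM refuses to load a 2.0 store. A simplified sketch of that check, assuming a stripped-down Version stand-in rather than the actual org.apache.hadoop.yarn.server.records.Version class:

```java
import java.io.IOException;

public class StateStoreVersionSketch {

    // Simplified stand-in for YARN's Version record; only major/minor matter here.
    static class Version {
        final int major;
        final int minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }
    }

    static final Version CURRENT_VERSION_INFO = new Version(3, 0);

    // Mirrors the state store convention: minor bumps are tolerated,
    // a major mismatch aborts service init (and thus NM startup).
    static void checkVersion(Version loaded) throws IOException {
        if (loaded.major != CURRENT_VERSION_INFO.major) {
            throw new IOException("Incompatible version for NM state: expecting NM state version "
                + CURRENT_VERSION_INFO.major + "." + CURRENT_VERSION_INFO.minor
                + ", but loading version " + loaded.major + "." + loaded.minor);
        }
    }

    public static void main(String[] args) throws IOException {
        checkVersion(new Version(3, 1)); // minor bump: recovery proceeds
        try {
            checkVersion(new Version(2, 0)); // major mismatch: startup fails
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why bumping the NM store's major version (YARN-5049, then YARN-6127) breaks in-place upgrades unless a migration path for the old store is shipped alongside the bump.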
[jira] [Created] (YARN-6717) [Umbrella] API related cleanup for Hadoop 3
Ray Chiang created YARN-6717:
--------------------------------

Summary: [Umbrella] API related cleanup for Hadoop 3
Key: YARN-6717
URL: https://issues.apache.org/jira/browse/YARN-6717
Project: Hadoop YARN
Issue Type: Task
Reporter: Ray Chiang
Assignee: Ray Chiang

Creating this umbrella JIRA for tracking various API related issues that need to be properly tracked, adjusted, or documented before the Hadoop 3 release.
[jira] [Created] (YARN-6273) TestAMRMClient#testAllocationWithBlacklist fails intermittently
Ray Chiang created YARN-6273:
--------------------------------

Summary: TestAMRMClient#testAllocationWithBlacklist fails intermittently
Key: YARN-6273
URL: https://issues.apache.org/jira/browse/YARN-6273
Project: Hadoop YARN
Issue Type: Test
Components: yarn
Affects Versions: 3.0.0-alpha2
Reporter: Ray Chiang

I'm seeing this unit test fail in trunk:

{noformat}
testAllocationWithBlacklist(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)  Time elapsed: 0.738 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAllocationWithBlacklist(TestAMRMClient.java:721)
{noformat}
[jira] [Created] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
Ray Chiang created YARN-6272:
--------------------------------

Summary: TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
Key: YARN-6272
URL: https://issues.apache.org/jira/browse/YARN-6272
Project: Hadoop YARN
Issue Type: Test
Components: yarn
Affects Versions: 3.0.0-alpha3
Reporter: Ray Chiang

I'm seeing this unit test fail fairly often in trunk:

{noformat}
testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)  Time elapsed: 5.113 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)
{noformat}
[jira] [Resolved] (YARN-5230) Document FairScheduler's allowPreemptionFrom flag
[ https://issues.apache.org/jira/browse/YARN-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang resolved YARN-5230.
------------------------------
Resolution: Duplicate

> Document FairScheduler's allowPreemptionFrom flag
> -------------------------------------------------
>
> Key: YARN-5230
> URL: https://issues.apache.org/jira/browse/YARN-5230
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, fairscheduler
> Affects Versions: 2.9.0
> Reporter: Grant Sohn
> Assignee: Karthik Kambatla
> Priority: Minor
>
> Feature added in https://issues.apache.org/jira/browse/YARN-4462 is not documented in the Hadoop: Fair Scheduler.
[jira] [Resolved] (YARN-5644) Define exit code for allowing NodeManager health script to mar
[ https://issues.apache.org/jira/browse/YARN-5644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang resolved YARN-5644.
------------------------------
Resolution: Duplicate

> Define exit code for allowing NodeManager health script to mar
> --------------------------------------------------------------
>
> Key: YARN-5644
> URL: https://issues.apache.org/jira/browse/YARN-5644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Ray Chiang
> Assignee: Yufei Gu
> Labels: supportability
>
> Done as an alternate design to YARN-5567. Define a specific exit code for the health checker script (property yarn.nodemanager.health-checker.script.path) that allows the node to be blacklisted.
> As discussed in the latter part of YARN-5567, the current design requirements are:
> # Ignore all exit codes from the script
> ## _except_ the newly defined error code which will mark the NodeManager as UNHEALTHY
> ## This allows any syntax or functional errors in the script to be ignored
> # Upon failure (or multiple recorded failures):
> ## Store the status in the metrics2 state on the NodeManager
> ## Allow the RM to blacklist the NM or allow the jobs to drain
[jira] [Created] (YARN-5644) Define exit code for allowing NodeManager health script to mar
Ray Chiang created YARN-5644:
--------------------------------

Summary: Define exit code for allowing NodeManager health script to mar
Key: YARN-5644
URL: https://issues.apache.org/jira/browse/YARN-5644
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 3.0.0-alpha2
Reporter: Ray Chiang
Assignee: Yufei Gu

Done as an alternate design to YARN-5567. Define a specific exit code for the health checker script (property yarn.nodemanager.health-checker.script.path) that allows the node to be blacklisted.

As discussed in the latter part of YARN-5567, the current design requirements are:
# Ignore all exit codes from the script
## _except_ the newly defined error code which will mark the NodeManager as UNHEALTHY
## This allows any syntax or functional errors in the script to be ignored
# Upon failure (or multiple recorded failures):
## Store the status in the metrics2 state on the NodeManager
## Allow the RM to blacklist the NM or allow the jobs to drain
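The design requirements above boil down to a small exit-code contract. A minimal sketch, where the reserved exit code value (34) is purely hypothetical and not part of any YARN configuration:

```java
public class HealthScriptExitCodeSketch {

    // Hypothetical reserved exit code; the JIRA proposes designating one
    // specific code instead of treating every non-zero exit as unhealthy.
    static final int UNHEALTHY_EXIT_CODE = 34;

    // Only the reserved code marks the node UNHEALTHY. A script that dies
    // with a syntax or runtime error (commonly exit 1) is simply ignored,
    // satisfying requirement 1 of the design above.
    static boolean isHealthy(int scriptExitCode) {
        return scriptExitCode != UNHEALTHY_EXIT_CODE;
    }

    public static void main(String[] args) {
        System.out.println(isHealthy(0));   // healthy
        System.out.println(isHealthy(1));   // script bug: ignored, still healthy
        System.out.println(isHealthy(34));  // reserved code: node is UNHEALTHY
    }
}
```

The appeal of this contract is that a broken health script fails safe: only a deliberate exit with the reserved code can take a node out of service.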
[jira] [Resolved] (YARN-5549) AMLauncher#createAMContainerLaunchContext() should not log the command to be launched indiscriminately
[ https://issues.apache.org/jira/browse/YARN-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang resolved YARN-5549.
------------------------------
Resolution: Fixed
Fix Version/s: (was: 3.0.0-alpha2)
               2.8.0

Pushed to branch-2 and branch-2.8.

> AMLauncher#createAMContainerLaunchContext() should not log the command to be launched indiscriminately
> ------------------------------------------------------------------------------------------------------
>
> Key: YARN-5549
> URL: https://issues.apache.org/jira/browse/YARN-5549
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.2
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-5549.001.patch, YARN-5549.002.patch, YARN-5549.003.patch, YARN-5549.004.patch, YARN-5549.005.patch, YARN-5549.006.patch, YARN-5549.branch-2.001.patch
>
> The command could contain sensitive information, such as keystore passwords or AWS credentials or other. Instead of logging it as INFO, we should log it as DEBUG and include a property to disable logging it at all. Logging it to a different logger would also be viable and may create a smaller administrative footprint.
[jira] [Resolved] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang resolved YARN-5567.
------------------------------
Resolution: Fixed
Hadoop Flags: Incompatible change,Reviewed (was: Reviewed)
Release Note: Prior to this fix, the NodeManager would ignore any non-zero exit code for any script in the yarn.nodemanager.health-checker.script.path property. With this change, any syntax error in the health checking script will get flagged as an error in the same fashion (likely exit code 1) as the script detecting a health issue. (was: Prior to this fix, the NodeManager will ignore any non-zero exit code for any script in the yarn.nodemanager.health-checker.script.path property.)

Thanks [~andrew.wang] for the info. Thanks to [~wilfreds] for bringing up the issue and thanks again to [~yufeigu] and [~Naganarasimha] for your comments.

Reverted from branch-2.8 and branch-2. Marked as incompatible.

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --------------------------------------------------------------------------
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(true, "", now);
>   break;
> {code}
> should be
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(false, "", now);
>   break;
> {code}
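The one-character fix quoted above can be seen in context with a stripped-down sketch of the status mapping. This is an assumption-laden simplification: the real NodeHealthScriptRunner has more exit statuses and passes a timestamp, which is dropped here.

```java
public class NodeHealthStatusSketch {

    // Simplified subset of the runner's exit statuses.
    enum HealthCheckerExitStatus { SUCCESS, TIMED_OUT, FAILED_WITH_EXIT_CODE, FAILED }

    private boolean healthy;
    private String report = "";

    // After the YARN-5567 fix, every failure mode, including a non-zero
    // script exit code, reports the node as unhealthy.
    void reportHealthStatus(HealthCheckerExitStatus status) {
        switch (status) {
            case SUCCESS:
                setHealthStatus(true, "");
                break;
            case TIMED_OUT:
                setHealthStatus(false, "health script timed out");
                break;
            case FAILED_WITH_EXIT_CODE:
                setHealthStatus(false, "");  // was incorrectly 'true' before the fix
                break;
            case FAILED:
                setHealthStatus(false, "health script failed");
                break;
        }
    }

    void setHealthStatus(boolean healthy, String report) {
        this.healthy = healthy;
        this.report = report;
    }

    boolean isHealthy() { return healthy; }

    String getReport() { return report; }
}
```

Note the incompatibility called out in the release note: once FAILED_WITH_EXIT_CODE maps to unhealthy, a buggy script that merely exits 1 now takes the node out of service, which is exactly what the follow-up YARN-5644 design tries to soften.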
[jira] [Created] (YARN-5595) Update documentation and Javadoc to match change to NodeHealthScriptRunner#reportHealthStatus
Ray Chiang created YARN-5595:
--------------------------------

Summary: Update documentation and Javadoc to match change to NodeHealthScriptRunner#reportHealthStatus
Key: YARN-5595
URL: https://issues.apache.org/jira/browse/YARN-5595
Project: Hadoop YARN
Issue Type: Bug
Reporter: Ray Chiang
[jira] [Created] (YARN-5564) Fix typo in .RM_SCHEDULER_RESERVATION_THRESHOLD_INCREMENT_MULTIPLE
Ray Chiang created YARN-5564:
--------------------------------

Summary: Fix typo in .RM_SCHEDULER_RESERVATION_THRESHOLD_INCREMENT_MULTIPLE
Key: YARN-5564
URL: https://issues.apache.org/jira/browse/YARN-5564
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial

The variable RM_SCHEDULER_RESERVATION_THRESHOLD_INCERMENT_MULTIPLE has a typo in the "INCREMENT" part.
[jira] [Created] (YARN-5529) Create new DiskValidator class with metrics
Ray Chiang created YARN-5529:
--------------------------------

Summary: Create new DiskValidator class with metrics
Key: YARN-5529
URL: https://issues.apache.org/jira/browse/YARN-5529
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Ray Chiang
Assignee: Yufei Gu

With really large clusters, the basic DiskValidator isn't sufficient for some of the less common types of disk failures. Look at a new DiskValidator that could do one or more of the following:
- Add new tests to find more problems
- Add new metrics to at least characterize problems that we haven't predicted
[jira] [Created] (YARN-5495) Clean up imports in CapacityScheduler
Ray Chiang created YARN-5495:
--------------------------------

Summary: Clean up imports in CapacityScheduler
Key: YARN-5495
URL: https://issues.apache.org/jira/browse/YARN-5495
Project: Hadoop YARN
Issue Type: Task
Components: capacityscheduler
Affects Versions: 3.0.0-alpha2
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial

YARN-4091 swapped a bunch of org.apache.hadoop.yarn.server.resourcemanager.scheduler imports with the wildcard version. Assuming things haven't changed in the Style Guide, we disallow wildcards in imports.
[jira] [Created] (YARN-5399) Add configuration to remember ad-hoc queues upon configuration reload
Ray Chiang created YARN-5399:
--------------------------------

Summary: Add configuration to remember ad-hoc queues upon configuration reload
Key: YARN-5399
URL: https://issues.apache.org/jira/browse/YARN-5399
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Ray Chiang

By default, FairScheduler detects and loads a changed configuration file. When that load happens, ad-hoc queues are not re-created. This can cause issues with those ad-hoc queues that still have jobs running.
[jira] [Created] (YARN-5398) Standardize whitespace trimming and splitting in FairScheduler code
Ray Chiang created YARN-5398:
--------------------------------

Summary: Standardize whitespace trimming and splitting in FairScheduler code
Key: YARN-5398
URL: https://issues.apache.org/jira/browse/YARN-5398
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Ray Chiang
Assignee: Wilfred Spiegelenburg

There is more trimming and splitting of whitespace (e.g. in queue names) in the FairScheduler code that needs to be standardised to use utility methods.
[jira] [Created] (YARN-5397) [Umbrella] Usability improvements in FairScheduler
Ray Chiang created YARN-5397:
--------------------------------

Summary: [Umbrella] Usability improvements in FairScheduler
Key: YARN-5397
URL: https://issues.apache.org/jira/browse/YARN-5397
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Ray Chiang
Assignee: Ray Chiang

Tracking a bunch of FairScheduler fixes. This includes, but is not necessarily limited to:
* Usability fixes
* Smaller improvements and features
* De-duplicating code

For preemption related fixes, use YARN-4752.
[jira] [Created] (YARN-5285) Refactor common code in initScheduler across schedulers
Ray Chiang created YARN-5285:
--------------------------------

Summary: Refactor common code in initScheduler across schedulers
Key: YARN-5285
URL: https://issues.apache.org/jira/browse/YARN-5285
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Ray Chiang
Assignee: Ray Chiang

The initScheduler() methods in CapacityScheduler and FairScheduler have some common code. Move the common code into AbstractYarnScheduler.
[jira] [Created] (YARN-5284) Refactor handle across schedulers
Ray Chiang created YARN-5284:
--------------------------------

Summary: Refactor handle across schedulers
Key: YARN-5284
URL: https://issues.apache.org/jira/browse/YARN-5284
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Ray Chiang
Assignee: Ray Chiang

The handle() methods in CapacityScheduler and FairScheduler have a lot of common code. With a bit of rearranging, it's possible to move all of handle() into AbstractYarnScheduler.
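The refactoring proposed in these sub-tasks is essentially the template method pattern: common event dispatch moves into the abstract base class, and scheduler-specific behavior stays behind protected hooks. A minimal sketch under that assumption (class and hook names are illustrative, not the actual AbstractYarnScheduler API):

```java
public class SchedulerRefactorSketch {

    enum EventType { NODE_ADDED, NODE_REMOVED }

    // Common handle() dispatch lives in the abstract base; each concrete
    // scheduler only supplies the per-event behavior.
    static abstract class AbstractScheduler {
        final StringBuilder log = new StringBuilder();

        void handle(EventType event) {
            switch (event) {
                case NODE_ADDED:
                    addNode();
                    break;
                case NODE_REMOVED:
                    removeNode();
                    break;
            }
        }

        protected abstract void addNode();
        protected abstract void removeNode();
    }

    static class FairLikeScheduler extends AbstractScheduler {
        protected void addNode()    { log.append("fair:add;"); }
        protected void removeNode() { log.append("fair:remove;"); }
    }

    public static void main(String[] args) {
        AbstractScheduler s = new FairLikeScheduler();
        s.handle(EventType.NODE_ADDED);
        s.handle(EventType.NODE_REMOVED);
        System.out.println(s.log); // fair:add;fair:remove;
    }
}
```

The payoff is that a dispatch bug or a new event type is fixed or added once in the base class instead of once per scheduler.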
[jira] [Created] (YARN-5283) Refactor container assignment into AbstractYarnScheduler#assignContainers
Ray Chiang created YARN-5283:
--------------------------------

Summary: Refactor container assignment into AbstractYarnScheduler#assignContainers
Key: YARN-5283
URL: https://issues.apache.org/jira/browse/YARN-5283
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Ray Chiang
Assignee: Ray Chiang

CapacityScheduler#allocateContainersToNode() and FairScheduler#attemptScheduling() have some common code that can be refactored into a common abstract method like AbstractYarnScheduler#assignContainers().
[jira] [Created] (YARN-5282) Fix typos in CapacityScheduler documentation
Ray Chiang created YARN-5282:
--------------------------------

Summary: Fix typos in CapacityScheduler documentation
Key: YARN-5282
URL: https://issues.apache.org/jira/browse/YARN-5282
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial

Found some minor typos while reading the CapacityScheduler documentation.
[jira] [Created] (YARN-5137) Make DiskChecker pluggable
Ray Chiang created YARN-5137:
--------------------------------

Summary: Make DiskChecker pluggable
Key: YARN-5137
URL: https://issues.apache.org/jira/browse/YARN-5137
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Ray Chiang
[jira] [Created] (YARN-5129) Update FairScheduler to use SchedulerHealth
Ray Chiang created YARN-5129:
--------------------------------

Summary: Update FairScheduler to use SchedulerHealth
Key: YARN-5129
URL: https://issues.apache.org/jira/browse/YARN-5129
Project: Hadoop YARN
Issue Type: Task
Components: fairscheduler
Reporter: Ray Chiang

After YARN-5047, the SchedulerHealth information added in YARN-3293 is visible to FairScheduler. Add this information to the metrics and WebUI for FairScheduler.
[jira] [Created] (YARN-5128) Investigate potential race condition in the scheduler nodeUpdate() method
Ray Chiang created YARN-5128:
--------------------------------

Summary: Investigate potential race condition in the scheduler nodeUpdate() method
Key: YARN-5128
URL: https://issues.apache.org/jira/browse/YARN-5128
Project: Hadoop YARN
Issue Type: Task
Reporter: Ray Chiang

This section of code exists in the various schedulers in the method nodeUpdate():

{code}
// If the node is decommissioning, send an update to have the total
// resource equal to the used resource, so no available resource to
// schedule.
// TODO: Fix possible race-condition when request comes in before
// update is propagated
if (nm.getState() == NodeState.DECOMMISSIONING) {
  this.rmContext
      .getDispatcher()
      .getEventHandler()
      .handle(
          new RMNodeResourceUpdateEvent(nm.getNodeID(), ResourceOption
              .newInstance(getSchedulerNode(nm.getNodeID())
                  .getAllocatedResource(), 0)));
}
{code}

Investigate the TODO section.
[jira] [Created] (YARN-5078) [Umbrella] NodeManager health checker improvements
Ray Chiang created YARN-5078:
--------------------------------

Summary: [Umbrella] NodeManager health checker improvements
Key: YARN-5078
URL: https://issues.apache.org/jira/browse/YARN-5078
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Ray Chiang
Assignee: Ray Chiang

There have been a bunch of NodeManager health checker improvement requests in the past. Right now, I expect that initially a bunch of base functionality just needs to be added. The most obvious parts are:
- Finding appropriate measurements of health
- Storing measurements as metrics. This should allow easy comparison of good nodes and bad nodes, and should eventually lead to threshold blacklisting/whitelisting.
- Adding metrics to the NodeManager UI

After this basic functionality is added, we can start considering some enhanced form of NodeManager health status conditions.
[jira] [Created] (YARN-5047) Refactor nodeUpdate() from FairScheduler and CapacityScheduler
Ray Chiang created YARN-5047:
--------------------------------

Summary: Refactor nodeUpdate() from FairScheduler and CapacityScheduler
Key: YARN-5047
URL: https://issues.apache.org/jira/browse/YARN-5047
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler, scheduler
Affects Versions: 3.0.0
Reporter: Ray Chiang
Assignee: Ray Chiang

FairScheduler#nodeUpdate() and CapacityScheduler#nodeUpdate() have a lot of commonality in their code. See about refactoring the common parts into AbstractYARNScheduler.
[jira] [Created] (YARN-5046) [Umbrella] Refactor scheduler code
Ray Chiang created YARN-5046:
--------------------------------

Summary: [Umbrella] Refactor scheduler code
Key: YARN-5046
URL: https://issues.apache.org/jira/browse/YARN-5046
Project: Hadoop YARN
Issue Type: Task
Components: capacity scheduler, fairscheduler, resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Ray Chiang
Assignee: Ray Chiang

At this point in time, there are several places where code common to the schedulers can be moved from one or more of the schedulers into AbstractYARNScheduler or a related interface. Creating this umbrella JIRA to track this refactoring. In general, it is preferable to create a subtask JIRA on a per-method basis.
[jira] [Created] (YARN-4911) Bad placement policy in FairScheduler causes the RM to crash
Ray Chiang created YARN-4911:
--------------------------------

Summary: Bad placement policy in FairScheduler causes the RM to crash
Key: YARN-4911
URL: https://issues.apache.org/jira/browse/YARN-4911
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Reporter: Ray Chiang
Assignee: Ray Chiang

When you have a fair-scheduler.xml with a queue placement rule that refers to the queue okay1, and the queue okay1 doesn't exist, the following exception occurs in the RM:

{noformat}
2016-04-01 16:56:33,383 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ADDED to the scheduler
java.lang.IllegalStateException: Should have applied a rule before reaching here
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:173)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:728)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:634)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1224)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

which causes the RM to crash.
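The crash pattern is generic to rule-chain placement: the policy walks its rules, and if the chain ends without a terminal rule (one that always applies), the walk falls off the end and throws on the event dispatcher thread. A minimal sketch of that structure, with illustrative names rather than FairScheduler's actual API:

```java
import java.util.List;

public class PlacementPolicySketch {

    interface Rule {
        // Returns a queue name, or null if this rule does not apply.
        String assign(String requestedQueue);
    }

    // A rule that only applies when the requested queue already exists.
    static Rule existingQueueRule(List<String> existingQueues) {
        return q -> existingQueues.contains(q) ? q : null;
    }

    // A terminal rule that always applies.
    static Rule defaultRule() {
        return q -> "root.default";
    }

    static String assignAppToQueue(List<Rule> rules, String requested) {
        for (Rule r : rules) {
            String queue = r.assign(requested);
            if (queue != null) {
                return queue;
            }
        }
        // The state YARN-4911 hits: no rule applied. Safer designs either
        // validate at load time that the last rule is terminal, or reject
        // just this app instead of letting the exception kill the RM's
        // scheduler event dispatcher.
        throw new IllegalStateException("Should have applied a rule before reaching here");
    }

    public static void main(String[] args) {
        List<Rule> safePolicy =
            List.of(existingQueueRule(List.of("root.ok")), defaultRule());
        // "okay1" does not exist, but the terminal rule catches it.
        System.out.println(assignAppToQueue(safePolicy, "okay1"));
    }
}
```

Either guard would have contained the misconfiguration to a single rejected application rather than an RM crash.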
[jira] [Created] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler
Ray Chiang created YARN-4704: Summary: TestResourceManager#testResourceAllocation() fails when using FairScheduler Key: YARN-4704 URL: https://issues.apache.org/jira/browse/YARN-4704 Project: Hadoop YARN Issue Type: Test Components: fairscheduler, test Affects Versions: 2.7.2 Reporter: Ray Chiang Assignee: Ray Chiang When using FairScheduler, TestResourceManager#testResourceAllocation() fails with the following error:
java.lang.IllegalStateException: Trying to stop a non-running task: 1 of application application_1455833410011_0001
        at org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
        at org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
        at org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4579) Allow container directory permissions to be configurable
Ray Chiang created YARN-4579: Summary: Allow container directory permissions to be configurable Key: YARN-4579 URL: https://issues.apache.org/jira/browse/YARN-4579 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 2.8.0 Reporter: Ray Chiang Assignee: Ray Chiang By default, container directory permissions are hardcoded to this member in DefaultContainerExecutor: static final short LOGDIR_PERM = (short)0710; There are some cases where less restrictive permissions are desired. Make this configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
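A minimal sketch of the proposed change: read the permission from configuration, falling back to the hardcoded 0710 default. The property name `yarn.nodemanager.container-log-dir.permissions` is invented for illustration (the real key would be decided in the JIRA), and a plain `Map` stands in for Hadoop's `Configuration`.

```java
import java.util.Map;

// Sketch: make DefaultContainerExecutor's hardcoded LOGDIR_PERM configurable.
// The property name below is hypothetical; Map stands in for Configuration#get.
public class LogDirPerm {
    public static final short DEFAULT_LOGDIR_PERM = (short) 0710;

    public static short logDirPerm(Map<String, String> conf) {
        String raw = conf.get("yarn.nodemanager.container-log-dir.permissions");
        if (raw == null) {
            return DEFAULT_LOGDIR_PERM;  // keep the old hardcoded default
        }
        return (short) Integer.parseInt(raw, 8);  // parse as octal, e.g. "750"
    }

    public static void main(String[] args) {
        System.out.println(Integer.toOctalString(logDirPerm(Map.of())));
    }
}
```

Parsing as octal mirrors how Unix mode strings are normally written in config files.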
[jira] [Created] (YARN-4569) Remove incorrect documentation on maxResources in FairScheduler
Ray Chiang created YARN-4569: Summary: Remove incorrect documentation on maxResources in FairScheduler Key: YARN-4569 URL: https://issues.apache.org/jira/browse/YARN-4569 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang The maxResources property states: {panel} For the single-resource fairness policy, the vcores value is ignored. {panel} This is not correct and should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4568) Fix message when NodeManager runs into errors initializing the recovery directory
Ray Chiang created YARN-4568: Summary: Fix message when NodeManager runs into errors initializing the recovery directory Key: YARN-4568 URL: https://issues.apache.org/jira/browse/YARN-4568 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ray Chiang Assignee: Ray Chiang Attachments: YARN-4568.001.patch When the NodeManager tries to initialize the recovery directory, the method NativeIO#chmod() can throw one of several Errno style exceptions. This propagates up to the top without any try/catch statement. It would be nice to have a cleaner error message in this situation (plus the original exception) to give users an idea about what part of the system has gone wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4560) Make
Ray Chiang created YARN-4560: Summary: Make Key: YARN-4560 URL: https://issues.apache.org/jira/browse/YARN-4560 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial If the YARN properties below are poorly configured:
{code}
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
{code}
The error message that shows up in the RM is:
{panel}
2016-01-07 14:47:03,711 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid resource scheduler memory allocation configuration, yarn.scheduler.minimum-allocation-mb=-1, yarn.scheduler.maximum-allocation-mb=-3, min should equal greater than 0, max should be no smaller than min.
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.validateConf(FairScheduler.java:215)
{panel}
While it's technically correct, it's not very user friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG
Ray Chiang created YARN-4541: Summary: Change log message in LocalizedResource#handle() to DEBUG Key: YARN-4541 URL: https://issues.apache.org/jira/browse/YARN-4541 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.8.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor This section of code can fill up a log fairly quickly.
{code}
if (oldState != newState) {
  LOG.info("Resource " + resourcePath +
      (localPath != null ? "(->" + localPath + ")": "") +
      " transitioned from " + oldState +
      " to " + newState);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
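The change described above could look like the following sketch: demote the message to DEBUG and guard it so the string concatenation only runs when debug logging is on. The `Log` interface here is a stub standing in for the logger LocalizedResource actually uses; this is not the actual patch.

```java
// Sketch of demoting a chatty INFO log to a guarded DEBUG log.
// The Log interface is a stand-in for the real logging API.
public class ResourceLogDemo {
    public interface Log {
        boolean isDebugEnabled();
        void debug(String msg);
    }

    // Build the same message the original code logged at INFO level.
    public static String transitionMessage(String resourcePath, String localPath,
                                           String oldState, String newState) {
        return "Resource " + resourcePath
            + (localPath != null ? "(->" + localPath + ")" : "")
            + " transitioned from " + oldState + " to " + newState;
    }

    public static void logTransition(Log log, String resourcePath, String localPath,
                                     String oldState, String newState) {
        // Guard first: skip the concatenation entirely unless DEBUG is enabled.
        if (!oldState.equals(newState) && log.isDebugEnabled()) {
            log.debug(transitionMessage(resourcePath, localPath, oldState, newState));
        }
    }
}
```

The `isDebugEnabled()` guard matters because the message is built via string concatenation, which otherwise runs on every transition even when the log line is discarded.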
[jira] [Resolved] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart
[ https://issues.apache.org/jira/browse/YARN-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang resolved YARN-4406. -- Resolution: Duplicate > RM Web UI continues to show decommissioned nodes even after RM restart > -- > > Key: YARN-4406 > URL: https://issues.apache.org/jira/browse/YARN-4406 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ray Chiang >Priority: Minor > > If you start up a cluster, decommission a NodeManager, and restart the RM, > the decommissioned node list will still show a positive number (1 in the case > of 1 node) and if you click on the list, it will be empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4406) RM Web UI continues to show decommissioned nodes even after RM restart
Ray Chiang created YARN-4406: Summary: RM Web UI continues to show decommissioned nodes even after RM restart Key: YARN-4406 URL: https://issues.apache.org/jira/browse/YARN-4406 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Ray Chiang Priority: Minor If you start up a cluster, decommission a NodeManager, and restart the RM, the decommissioned node list will still show a positive number (1 in the case of 1 node) and if you click on the list, it will be empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3912) Fix typos in hadoop-yarn-project module
Ray Chiang created YARN-3912: Summary: Fix typos in hadoop-yarn-project module Key: YARN-3912 URL: https://issues.apache.org/jira/browse/YARN-3912 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.7.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Fix a bunch of typos in comments, strings, variable names, and method names in the hadoop-yarn-project module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
Ray Chiang created YARN-3823: Summary: Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3825) Add automatic search of default Configuration variables to TestConfigurationFieldsBase
Ray Chiang created YARN-3825: Summary: Add automatic search of default Configuration variables to TestConfigurationFieldsBase Key: YARN-3825 URL: https://issues.apache.org/jira/browse/YARN-3825 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Add functionality that, given a Configuration variable FOO, at least checks the xml file value against DEFAULT_FOO. Without waivers and a mapping for exceptions, this can probably never be a test method that generates actual errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3824) Fix two minor nits in member variable properties of YarnConfiguration
Ray Chiang created YARN-3824: Summary: Fix two minor nits in member variable properties of YarnConfiguration Key: YARN-3824 URL: https://issues.apache.org/jira/browse/YARN-3824 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Attachments: YARN-3824.001.patch Two nitpicks that could be cleaned up easily: - DEFAULT_YARN_INTERMEDIATE_DATA_ENCRYPTION is defined as a java.lang.Boolean instead of a boolean primitive - DEFAULT_RM_PROXY_USER_PRIVILEGES_ENABLED is missing the final keyword -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3182) Cleanup switch statement in ApplicationMasterLauncher#handle()
Ray Chiang created YARN-3182: Summary: Cleanup switch statement in ApplicationMasterLauncher#handle() Key: YARN-3182 URL: https://issues.apache.org/jira/browse/YARN-3182 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Attachments: YARN-3182.001.patch The last case in the switch statement relies on the break coming from the default case instead of having its own break. It's a bit dangerous for any future code modifications in this section. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
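The hazard described above can be illustrated with a small sketch (not the actual ApplicationMasterLauncher code): give the final case its own `break` so that inserting a new case between it and `default` can never silently change behavior.

```java
// Illustrative sketch of the cleanup: the last case gets an explicit break
// rather than relying on falling into the default case's break.
public class SwitchDemo {
    public enum EventType { LAUNCH, CLEANUP }

    public static String handle(EventType type) {
        String action;
        switch (type) {
            case LAUNCH:
                action = "launch";
                break;
            case CLEANUP:
                action = "cleanup";
                break;  // explicit break; no reliance on the default case below
            default:
                action = "ignored";
                break;
        }
        return action;
    }
}
```

With the explicit break, adding a new case after CLEANUP later cannot accidentally introduce fall-through.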
[jira] [Created] (YARN-3179) Update use of Iterator to Iterable
Ray Chiang created YARN-3179: Summary: Update use of Iterator to Iterable Key: YARN-3179 URL: https://issues.apache.org/jira/browse/YARN-3179 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Found these using the IntelliJ Findbugs-IDEA plugin, which uses findbugs3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
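The kind of cleanup this JIRA describes can be sketched as follows (generic example, not the actual flagged code): when the underlying collection is already `Iterable`, the enhanced for loop replaces hand-driven `Iterator` calls.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of the findbugs3-style cleanup: replace a manual Iterator loop
// with an enhanced for loop over the Iterable.
public class IterDemo {
    // Before: driving the Iterator by hand
    public static int sumWithIterator(List<Integer> xs) {
        int sum = 0;
        for (Iterator<Integer> it = xs.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    // After: the List is Iterable, so use for-each
    public static int sumWithIterable(List<Integer> xs) {
        int sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return sum;
    }
}
```

Both loops compile to equivalent iteration; the second is shorter and removes the chance of misusing `next()`/`hasNext()`.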
[jira] [Resolved] (YARN-3107) Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error
[ https://issues.apache.org/jira/browse/YARN-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang resolved YARN-3107. -- Resolution: Duplicate Code changes were merged into YARN-3069 instead. Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error --- Key: YARN-3107 URL: https://issues.apache.org/jira/browse/YARN-3107 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Attachments: YARN-3107.001.patch TestYarnConfigurationFields currently makes sure each property in yarn-default.xml is documented in one of the YARN configuration Java classes. The reverse check can be turned on once each YARN property is: A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3107) Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error
Ray Chiang created YARN-3107: Summary: Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error Key: YARN-3107 URL: https://issues.apache.org/jira/browse/YARN-3107 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang TestYarnConfigurationFields currently makes sure each property in yarn-default.xml is documented in one of the YARN configuration Java classes. The reverse check can be turned on once each YARN property is: A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3105) Add generic state transition metrics to existing framework
Ray Chiang created YARN-3105: Summary: Add generic state transition metrics to existing framework Key: YARN-3105 URL: https://issues.apache.org/jira/browse/YARN-3105 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang This issue came up in the discussion of adding a container metric in YARN-2868 (and related YARN-2802). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3069) Document missing properties in yarn-default.xml
Ray Chiang created YARN-3069: Summary: Document missing properties in yarn-default.xml Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Reporter: Ray Chiang Assignee: Ray Chiang The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome.
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
security.applicationhistory.protocol.acl
yarn.app.container.log.backups
yarn.app.container.log.dir
yarn.app.container.log.filesize
yarn.client.app-submission.poll-interval
yarn.client.application-client-protocol.poll-timeout-ms
yarn.is.minicluster
yarn.log.server.url
yarn.minicluster.control-resource-monitoring
yarn.minicluster.fixed.ports
yarn.minicluster.use-rpc
yarn.node-labels.fs-store.retry-policy-spec
yarn.node-labels.fs-store.root-dir
yarn.node-labels.manager-class
yarn.nodemanager.container-executor.os.sched.priority.adjustment
yarn.nodemanager.container-monitor.process-tree.class
yarn.nodemanager.disk-health-checker.enable
yarn.nodemanager.docker-container-executor.image-name
yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.log.deletion-threads-count
yarn.nodemanager.user-home-dir
yarn.nodemanager.webapp.https.address
yarn.nodemanager.webapp.spnego-keytab-file
yarn.nodemanager.webapp.spnego-principal
yarn.nodemanager.windows-secure-container-executor.group
yarn.resourcemanager.configuration.file-system-based-store
yarn.resourcemanager.delegation-token-renewer.thread-count
yarn.resourcemanager.delegation.key.update-interval
yarn.resourcemanager.delegation.token.max-lifetime
yarn.resourcemanager.delegation.token.renew-interval
yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
yarn.resourcemanager.metrics.runtime.buckets
yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
yarn.resourcemanager.reservation-system.class
yarn.resourcemanager.reservation-system.enable
yarn.resourcemanager.reservation-system.plan.follower
yarn.resourcemanager.reservation-system.planfollower.time-step
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
yarn.resourcemanager.webapp.spnego-keytab-file
yarn.resourcemanager.webapp.spnego-principal
yarn.scheduler.include-port-in-node-name
yarn.timeline-service.delegation.key.update-interval
yarn.timeline-service.delegation.token.max-lifetime
yarn.timeline-service.delegation.token.renew-interval
yarn.timeline-service.generic-application-history.enabled
yarn.timeline-service.generic-application-history.fs-history-store.compression-type
yarn.timeline-service.generic-application-history.fs-history-store.uri
yarn.timeline-service.generic-application-history.store-class
yarn.timeline-service.http-cross-origin.enabled
yarn.tracking.url.generator
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml
Ray Chiang created YARN-2957: Summary: Create unit test to automatically compare YarnConfiguration and yarn-default.xml Key: YARN-2957 URL: https://issues.apache.org/jira/browse/YARN-2957 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Create a unit test that will automatically compare the fields in YarnConfiguration and yarn-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
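The comparison this JIRA asks for can be sketched with reflection: collect every `public static final String` property constant declared in the configuration class and diff that set against the property names parsed from the XML. `FakeConf` below is a stand-in for YarnConfiguration, and real code would parse yarn-default.xml rather than take a ready-made set; this is a sketch of the idea, not the eventual test.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

// Sketch: reflect over a configuration class and diff its String constants
// against the property names found in the XML defaults file.
public class ConfigFieldsCheck {
    // Stand-in for YarnConfiguration, for illustration only.
    public static class FakeConf {
        public static final String NM_LOG_DIRS = "yarn.nodemanager.log-dirs";
        public static final String RM_HOSTNAME = "yarn.resourcemanager.hostname";
    }

    public static Set<String> propertiesDeclaredIn(Class<?> confClass) {
        Set<String> props = new HashSet<>();
        for (Field f : confClass.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) && Modifier.isFinal(m)
                && f.getType() == String.class) {
                try {
                    String value = (String) f.get(null);
                    if (value.startsWith("yarn.")) {
                        props.add(value);  // looks like a property name
                    }
                } catch (IllegalAccessException e) {
                    // skip fields we cannot read
                }
            }
        }
        return props;
    }

    // Properties declared in the class but absent from the XML file.
    public static Set<String> missingFromXml(Set<String> declared, Set<String> inXml) {
        Set<String> missing = new HashSet<>(declared);
        missing.removeAll(inXml);
        return missing;
    }
}
```

A real test would also run the diff in the other direction (XML properties with no matching constant) and fail on any non-empty result, modulo an exception list.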
[jira] [Created] (YARN-2927) YARN InMemorySCMStore properties need fixing
Ray Chiang created YARN-2927: Summary: YARN InMemorySCMStore properties need fixing Key: YARN-2927 URL: https://issues.apache.org/jira/browse/YARN-2927 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang I see these properties in the yarn-default.xml file:
yarn.sharedcache.store.in-memory.check-period-mins
yarn.sharedcache.store.in-memory.initial-delay-mins
yarn.sharedcache.store.in-memory.staleness-period-mins
YarnConfiguration looks like it's missing some properties:
public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
Just to be clear, there are properties that exist in yarn-default.xml that are effectively misspelled in the *Java* file, not the .xml file. This is similar to YARN-2461 and MAPREDUCE-6087. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
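The effect of the reported prefix bug can be shown in a few lines: with the definition as reported, the staleness key resolves without the "store." segment, so it can never match the name documented in yarn-default.xml. This is an illustration of the concatenation, not the actual YarnConfiguration source.

```java
// Sketch showing why the prefix choice matters: the buggy definition skips
// the "store." segment, producing a key that yarn-default.xml never defines.
public class PrefixDemo {
    public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
    public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";

    // As reported (buggy): builds directly on the shared-cache prefix
    public static final String BUGGY_IN_MEMORY_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
    // As proposed: builds on the store prefix
    public static final String FIXED_IN_MEMORY_PREFIX = SCM_STORE_PREFIX + "in-memory.";

    public static String stalenessKey(String inMemoryPrefix) {
        return inMemoryPrefix + "staleness-period-mins";
    }
}
```

Only the fixed prefix yields `yarn.sharedcache.store.in-memory.staleness-period-mins`, the name that appears in yarn-default.xml.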
[jira] [Created] (YARN-2868) Add metric for initial container launch time
Ray Chiang created YARN-2868: Summary: Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2610) Hamlet doesn't close table tags
Ray Chiang created YARN-2610: Summary: Hamlet doesn't close table tags Key: YARN-2610 URL: https://issues.apache.org/jira/browse/YARN-2610 Project: Hadoop YARN Issue Type: Bug Reporter: Ray Chiang Assignee: Ray Chiang Revisiting a subset of MAPREDUCE-2993. The th, td, thead, tfoot, tr tags are not configured to close properly in Hamlet. While this is allowed in HTML 4.01, missing closing table tags tends to wreak havoc with a lot of HTML processors (although not usually browsers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2460) Remove obsolete entries from yarn-default.xml
Ray Chiang created YARN-2460: Summary: Remove obsolete entries from yarn-default.xml Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Priority: Minor The following properties are defined in yarn-default.xml, but do not exist in YarnConfiguration.
mapreduce.job.hdfs-servers
mapreduce.job.jar
yarn.ipc.exception.factory.class
yarn.ipc.serializer.type
yarn.nodemanager.aux-services.mapreduce_shuffle.class
yarn.nodemanager.hostname
yarn.nodemanager.resourcemanager.connect.retry_interval.secs
yarn.nodemanager.resourcemanager.connect.wait.secs
yarn.resourcemanager.amliveliness-monitor.interval-ms
yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
yarn.resourcemanager.container.liveness-monitor.interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms
yarn.timeline-service.hostname
yarn.timeline-service.http-authentication.simple.anonymous.allowed
yarn.timeline-service.http-authentication.type
Presumably, the mapreduce.* properties are okay. Similarly, the yarn.timeline-service.* properties are for the future TimelineService. However, the rest are likely fully deprecated. Submitting bug for comment/feedback about which other properties should be kept in yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2461) Fix PROCFS_USE_SMAPS_BASED_RSS_ENABLED property in YarnConfiguration
Ray Chiang created YARN-2461: Summary: Fix PROCFS_USE_SMAPS_BASED_RSS_ENABLED property in YarnConfiguration Key: YARN-2461 URL: https://issues.apache.org/jira/browse/YARN-2461 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Attachments: YARN-2461-01.patch The property PROCFS_USE_SMAPS_BASED_RSS_ENABLED has an extra period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2450) Fix typos in log messages
Ray Chiang created YARN-2450: Summary: Fix typos in log messages Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Priority: Trivial There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2275) When log aggregation not enabled, message should point to NM HTTP port, not IPC port
[ https://issues.apache.org/jira/browse/YARN-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang resolved YARN-2275. -- Resolution: Won't Fix Unable to fix this using a single Configuration property. A patch that worked around this by using two properties was considered not acceptable. Closing this bug as won't fix. When log aggregation not enabled, message should point to NM HTTP port, not IPC port - Key: YARN-2275 URL: https://issues.apache.org/jira/browse/YARN-2275 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Ray Chiang Labels: usability Attachments: MAPREDUCE5185-01.patch When I try to get a container's logs in the JHS without log aggregation enabled, I get a message that looks like this: Aggregation is not enabled. Try the nodemanager at sandy-ThinkPad-T530:33224 This could be a lot more helpful by actually pointing to the URL that would show the container logs on the NM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
Ray Chiang created YARN-2284: Summary: Find missing config options in YarnConfiguration and yarn-default.xml Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
Ray Chiang created YARN-2201: Summary: TestRMWebServicesAppsModification dependent on yarn-default.xml Key: YARN-2201 URL: https://issues.apache.org/jira/browse/YARN-2201 Project: Hadoop YARN Issue Type: Bug Reporter: Ray Chiang Assignee: Ray Chiang In yarn-default.xml:
1) Changing yarn.resourcemanager.scheduler.class from capacity.CapacityScheduler to fair.FairScheduler gives the error:
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 3.22 sec FAILURE!
java.lang.AssertionError: expected:<Forbidden> but was:<Accepted>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
2) Changing yarn.acl.enable from false to true results in the following errors:
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.986 sec FAILURE!
java.lang.AssertionError: expected:<Accepted> but was:<Unauthorized>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.258 sec FAILURE!
java.lang.AssertionError: expected:<Bad Request> but was:<Unauthorized>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.263 sec FAILURE!
java.lang.AssertionError: expected:<Forbidden> but was:<Unauthorized>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 0.214 sec FAILURE!
java.lang.AssertionError: expected:<Not Found> but was:<Unauthorized>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
I'm opening this JIRA as a discussion for the best way to fix this. I've got a few ideas, but I would like to get some feedback about potentially more robust ways to fix this test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
Ray Chiang created YARN-2159: Summary: allocateContainer() in SchedulerNode needs a clearer LOG.info message Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor This bit of code:
LOG.info("Assigned container " + container.getId() + " of capacity " + container.getResource() + " on host " + rmNode.getNodeAddress() + ", which currently has " + numContainers + " containers, " + getUsedResource() + " used and " + getAvailableResource() + " available");
results in a line like:
2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity <memory:1536, vCores:1> on host machine.host.domain.com:8041, which currently has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available
That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like "vCores:0 available". Perhaps one of the following phrasings is better?
- which has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available after allocation
-- This message was sent by Atlassian JIRA (v6.2#6252)