[jira] [Updated] (YARN-8817) [Submarine] In cases when user doesn't ask HDFS path while submitting job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-8817:
---------------------------------
    Summary: [Submarine] In cases when user doesn't ask HDFS path while submitting job but framework requires user to set HDFS related environments  (was: [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments)

> [Submarine] In cases when user doesn't ask HDFS path while submitting job but framework requires user to set HDFS related environments
> ---------------------------------------------------------------------------
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: submarine
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> A user who submits a job can see an error message like the following, even if HDFS was not requested:
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is being used to read/write models/data. Followingenvs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related environments

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
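The behavior the issue asks for can be sketched as follows: require the HDFS-related environment variables only when at least one job path actually points at HDFS. This is a minimal sketch and not the attached patch. The class `HdfsEnvCheck`, its methods, and the scheme-based detection are assumptions made for illustration; only the two environment variable names and the exception text come from the error message above.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not Submarine's actual class. */
final class HdfsEnvCheck {
  // Environment variable names taken from the error message in the issue.
  static final String HDFS_HOME = "DOCKER_HADOOP_HDFS_HOME";
  static final String JAVA_HOME = "DOCKER_JAVA_HOME";

  /** Returns true only if at least one configured path uses the hdfs scheme. */
  static boolean needsHdfs(List<String> paths) {
    for (String path : paths) {
      if (path == null || path.isEmpty()) {
        continue; // the user did not set this path
      }
      if ("hdfs".equalsIgnoreCase(URI.create(path).getScheme())) {
        return true;
      }
    }
    return false;
  }

  /** Requires the HDFS-related envs only when HDFS is actually used. */
  static void validate(List<String> paths, Map<String, String> envs) throws IOException {
    if (!needsHdfs(paths)) {
      return; // nothing to check: HDFS was not requested
    }
    for (String name : Arrays.asList(HDFS_HOME, JAVA_HOME)) {
      String value = envs.get(name);
      if (value == null || value.trim().isEmpty()) {
        throw new IOException("Failed to detect HDFS-related environments: missing " + name);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // A local path needs no HDFS envs, so this call returns normally.
    validate(Arrays.asList("/tmp/model"), Collections.<String, String>emptyMap());
    System.out.println("local path accepted without HDFS envs");
  }
}
```

Detecting HDFS by URI scheme is only one possible rule; a cluster whose default filesystem is HDFS would also need scheme-less paths to be treated as HDFS paths.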
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626804#comment-16626804 ]

Hadoop QA commented on YARN-5939:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 55s | trunk passed |
| +1 | compile | 0m 34s | trunk passed |
| +1 | checkstyle | 0m 26s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 10m 28s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | trunk passed |
| +1 | javadoc | 0m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 31s | the patch passed |
| +1 | javac | 0m 31s | the patch passed |
| +1 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 0 new + 90 unchanged - 1 fixed = 90 total (was 91) |
| +1 | mvnsite | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 2s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 20s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 9s | hadoop-yarn-common in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 50m 55s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-5939 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941163/YARN-5939.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 349a37e98c4e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21961/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21961/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> FSDownload leaks FileSystem
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626802#comment-16626802 ] Rohith Sharma K S commented on YARN-8815: - +1 lgtm.. > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8815.001.patch, YARN-8815.002.patch > > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at >
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626792#comment-16626792 ]

Rohith Sharma K S commented on YARN-8627:
-----------------------------------------

bq. One thing I noticed is that only the "domainlog" file was present in these type of repeated appid directories. Other types such as summarylog/entitylog were present only in the normal expected directory structure.

Interesting!! It means retrieving these entities _*had a problem, i.e. with ACLs, that went unidentified*_. Can we see the content of both domain log files? I would still like to find the root cause of this issue! I suspect something is going wrong while moving from the active to the done directory.

Some doubts:
# Is this happening in HDFS or in some other filesystem?
# Is the *yarn.timeline-service.entity-group-fs-store.with-user-dir* flag enabled on the cluster?

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> --------------------------------------------------------------------
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner thread exits with the following ERROR every time it runs.
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files
> java.io.FileNotFoundException: File hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1069)
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
> at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
> {code}
>
> Each time the thread gets scheduled, it is a different folder encountering the error. As a result, the thread is not able to clean all the old done directories, since it stops after this error.
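The failure mode described in the issue (one missing directory aborts the whole cleaning pass) can be illustrated with a small sketch that handles the exception per directory and carries on. This is an assumption about one possible remedy, not a description of the attached patches. `TolerantLogCleaner`, `DirCleaner`, and `cleanAll` are hypothetical names, and plain strings stand in for HDFS paths.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; it does not use the EntityGroupFSTimelineStore API. */
final class TolerantLogCleaner {
  /** Hypothetical callback that inspects and deletes one done directory. */
  interface DirCleaner {
    void clean(String dir) throws IOException;
  }

  /** Cleans every directory and returns those skipped because they had vanished. */
  static List<String> cleanAll(List<String> dirs, DirCleaner cleaner) throws IOException {
    List<String> skipped = new ArrayList<>();
    for (String dir : dirs) {
      try {
        cleaner.clean(dir);
      } catch (FileNotFoundException e) {
        // The directory is already gone: record it and continue with the rest
        // instead of letting the exception end the whole pass.
        skipped.add(dir);
      }
    }
    return skipped;
  }

  public static void main(String[] args) throws IOException {
    List<String> cleaned = new ArrayList<>();
    List<String> skipped = cleanAll(Arrays.asList("a", "missing", "b"), dir -> {
      if (dir.equals("missing")) {
        throw new FileNotFoundException(dir);
      }
      cleaned.add(dir);
    });
    System.out.println("cleaned=" + cleaned + " skipped=" + skipped);
  }
}
```

Other `IOException` types still propagate in this sketch, on the assumption that only a vanished directory is safe to ignore.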
[jira] [Commented] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626785#comment-16626785 ] Akira Ajisaka commented on YARN-8816: - Okay. I'll reopen HADOOP-15764 and close this issue. > YARN Unit Tests Fail with Ubuntu VM > --- > > Key: YARN-8816 > URL: https://issues.apache.org/jira/browse/YARN-8816 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8816.01.patch > > > {code} > Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 > UTC 2018 x86_64 x86_64 x86_64 GNU/Linux > {code} > {code} > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands > [ERROR] > testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) > Time elapsed: 2.668 s <<< ERROR! 
> java.lang.ExceptionInInitializerError > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) > at > org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) > at > org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593) > at >
[jira] [Updated] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-5939:
------------------------------
    Attachment: YARN-5939.005.patch

> FSDownload leaks FileSystem resources
> -------------------------------------
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.5.1, 2.7.3
> Reporter: liuxiangwei
> Assignee: Weiwei Yang
> Priority: Major
> Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.005.patch, YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item "fs.%s.impl.disable.cache" should be set to true.
> In YARN's source code, the class "org.apache.hadoop.yarn.util.FSDownload" calls getFileSystem but never closes the result, which leads to a file descriptor leak, because our self-defined FileSystem class closes the file descriptor when its close function is invoked.
> My questions:
> 1. Is invoking "getFileSystem" but never closing the result YARN's expected behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626770#comment-16626770 ]

Weiwei Yang commented on YARN-5939:
-----------------------------------

Hi [~bsteinbach]

Appreciate your help to rebase it. The wrapper class simply extends {{LocalFileSystem}} and counts every initialize and close call, since we expect a file system to always be closed to avoid leaking resources. It is only used in the test class.

> FSDownload leaks FileSystem resources
> -------------------------------------
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.5.1, 2.7.3
> Reporter: liuxiangwei
> Assignee: Weiwei Yang
> Priority: Major
> Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item "fs.%s.impl.disable.cache" should be set to true.
> In YARN's source code, the class "org.apache.hadoop.yarn.util.FSDownload" calls getFileSystem but never closes the result, which leads to a file descriptor leak, because our self-defined FileSystem class closes the file descriptor when its close function is invoked.
> My questions:
> 1. Is invoking "getFileSystem" but never closing the result YARN's expected behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?
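The counting idea from the comment can be sketched without any Hadoop dependency. In the sketch below a plain `Closeable` stands in for the `LocalFileSystem` subclass, so `CountingResource` and its methods are hypothetical names and not the class from the patch. It shows how balanced initialize and close counts expose a leak, and how try-with-resources keeps them balanced.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a FileSystem wrapper; the test class in the patch extends LocalFileSystem instead. */
final class CountingResource implements Closeable {
  static final AtomicInteger INITIALIZED = new AtomicInteger();
  static final AtomicInteger CLOSED = new AtomicInteger();

  CountingResource() {
    INITIALIZED.incrementAndGet(); // counted like an initialize() call
  }

  @Override
  public void close() {
    CLOSED.incrementAndGet();
  }

  /** Number of instances that were initialized but never closed. */
  static int leaked() {
    return INITIALIZED.get() - CLOSED.get();
  }

  /** Leaky pattern: the resource is obtained and then dropped without close(). */
  static void useWithoutClose() {
    CountingResource fs = new CountingResource();
    fs.hashCode(); // stands in for the real work
  }

  /** Safe pattern: try-with-resources always closes the resource. */
  static void useAndClose() {
    try (CountingResource fs = new CountingResource()) {
      fs.hashCode(); // stands in for the real work
    }
  }

  public static void main(String[] args) {
    useAndClose();
    System.out.println("leaked after useAndClose: " + leaked());
    useWithoutClose();
    System.out.println("leaked after useWithoutClose: " + leaked());
  }
}
```

A test built on this idea would assert that the leak count is unchanged after the code under test has run.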
[jira] [Commented] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626765#comment-16626765 ]

Bibin A Chundatt commented on YARN-8816:
----------------------------------------

[~ajisakaa] I think it's better to add this as an addendum patch for HADOOP-15764?

> YARN Unit Tests Fail with Ubuntu VM
> -----------------------------------
>
> Key: YARN-8816
> URL: https://issues.apache.org/jira/browse/YARN-8816
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: Akira Ajisaka
> Priority: Major
> Attachments: YARN-8816.01.patch
>
>
> {code}
> Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
> [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR!
> java.lang.ExceptionInInitializerError > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) > at > org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) > at > org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) > Caused by: java.lang.NullPointerException > at >
[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626764#comment-16626764 ]

Weiwei Yang commented on YARN-8657:
-----------------------------------

Hi [~sunilg],

It seems the UT failure is related; I tried it locally and it seems reproducible. Can you please check?

> User limit calculation should be read-lock-protected within LeafQueue
> ---------------------------------------------------------------------
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, the user limit calculation could be wrong:
> it is possible that the scheduler calculates a user_limit, but inside {{canAssignToUser}} it becomes stale.
> We need to protect the user limit calculation.
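The staleness problem in the description can be sketched with a read-write lock: if the limit is computed and compared while the same read lock is held, a concurrent update cannot slip in between the two steps. This is a simplified illustration, not the LeafQueue code. `LeafQueueSketch`, its fields, and the capacity-divided-by-active-users formula are assumptions; the real user-limit computation in the capacity scheduler is more involved.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, heavily simplified stand-in for a scheduler queue. */
final class LeafQueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long queueCapacity;
  private int activeUsers;

  LeafQueueSketch(long queueCapacity, int activeUsers) {
    this.queueCapacity = queueCapacity;
    this.activeUsers = activeUsers;
  }

  /** Queue state changes take the write lock. */
  void update(long newCapacity, int newActiveUsers) {
    lock.writeLock().lock();
    try {
      queueCapacity = newCapacity;
      activeUsers = newActiveUsers;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Computes the limit and checks it under one read lock, so the limit cannot go stale in between. */
  boolean canAssignToUser(long usedByUser, long requested) {
    lock.readLock().lock();
    try {
      // Assumed formula for illustration only: an equal share per active user.
      long userLimit = queueCapacity / Math.max(1, activeUsers);
      return usedByUser + requested <= userLimit;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    LeafQueueSketch queue = new LeafQueueSketch(100, 2);
    System.out.println("before update: " + queue.canAssignToUser(40, 10));
    queue.update(100, 4);
    System.out.println("after update: " + queue.canAssignToUser(40, 10));
  }
}
```

Many scheduling threads can hold the read lock at the same time, so this protects the calculation without serializing readers against each other.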
[jira] [Updated] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-8816: Attachment: YARN-8816.01.patch > YARN Unit Tests Fail with Ubuntu VM > --- > > Key: YARN-8816 > URL: https://issues.apache.org/jira/browse/YARN-8816 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8816.01.patch > > > {code} > Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 > UTC 2018 x86_64 x86_64 x86_64 GNU/Linux > {code} > {code} > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands > [ERROR] > testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) > Time elapsed: 2.668 s <<< ERROR! > java.lang.ExceptionInInitializerError > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) > at > org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) > at > org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:164) > at > 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593) > at >
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Attachment: YARN-8815.002.patch > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8815.001.patch, YARN-8815.002.patch > > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at >
[jira] [Assigned] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned YARN-8816: --- Assignee: Akira Ajisaka > YARN Unit Tests Fail with Ubuntu VM > --- > > Key: YARN-8816 > URL: https://issues.apache.org/jira/browse/YARN-8816 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: Akira Ajisaka >Priority: Major > > {code} > Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 > UTC 2018 x86_64 x86_64 x86_64 GNU/Linux > {code} > {code} > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands > [ERROR] > testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) > Time elapsed: 2.668 s <<< ERROR! > java.lang.ExceptionInInitializerError > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) > at > org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) > at > org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:164) > at > 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593) > at > org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129) > at >
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Priority: Critical (was: Major) > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8815.001.patch > > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at >
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626714#comment-16626714 ] Weiwei Yang commented on YARN-8468: --- Hi [~haibochen]/[~bsteinbach] {quote}{{ApplicationMasterProtocol}} is the connection path between AMs and RM for non-AM container requests. For AM-container requests, the normalization is done by RMAppManager. {quote} That's correct. What I was not clear about was why we have changes in (*#1*) RMAppManager ({{ApplicationClientProtocol}}), DefaultAMSProcessor and PlacementConstraintProcessor ({{ApplicationMasterProtocol}}), and also changes in (*#2*) {{FS/CS#allocate}}. I thought we could do just #1 or #2, not both ... Given the situation now, all of #1 seems to be done, while #2 is not complete (see my last comment on the CS side: it does not normalize against the queue-level max-resource, nor does it handle the updated requests). Can we enforce the validation & normalization in #1 only? This would also help reduce the overhead in the scheduler. So I am wondering if we could: * Revert the API changes in YarnScheduler, AbstractYarnScheduler * Refine the changes in PlacementConstraintProcessor and RMAppManager. Changes like {code:java} this.scheduler.getNormalizedResource(reqResource, maxAllocation); {code} can be replaced with utility methods: {code:java} SchedulerUtils.getNormalizedResource(...) // or RMServerUtils.normalizeAndValidateRequests(...){code} Would that simplify the patch? Just a thought: things around normalization seem to be over-complex now, and I am trying to sort out whether we can make minimal changes to get this to work. Please let me know if that makes sense, thanks. 
> Enable the use of queue based maximum container allocation limit and > implement it in FairScheduler > -- > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, > YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, > YARN-8468.017.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > The goal of this ticket is to allow this value to be set on a per queue basis. > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for maximum > container size for all queues, while the “maxContainerResources” queue config > value sets the maximum resources per queue. > Suggested solution: > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap can not be larger than scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability(String queueName) in both > FSParentQueue and FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * Enforce the use of queue based maximum allocation limit if it is > available, if not use the general scheduler level setting > ** Use it during validation and normalization of requests in > scheduler.allocate, app submit and resource request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
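The per-queue limit described in this ticket can be sketched roughly as follows. This is a minimal, self-contained illustration and not the YARN-8468 patch: the class `QueueMaxAllocationSketch`, the `Res` type, and the methods `setQueueMax` and `clampToQueueMax` are hypothetical names, and only `getMaximumResourceCapability(String queueName)` echoes a name from the ticket. It assumes the queue cap falls back to the scheduler-level cap when unset, and it only clamps requests; rounding to the minimum allocation and rejecting oversized requests are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: not the YARN-8468 patch and not a Hadoop API. */
public class QueueMaxAllocationSketch {

    /** Hypothetical stand-in for a YARN resource (memory in MB plus vcores). */
    public static final class Res {
        public final long memoryMb;
        public final int vcores;

        public Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    private final Res schedulerMax;
    private final Map<String, Res> queueMax = new HashMap<>();

    public QueueMaxAllocationSketch(Res schedulerMax) {
        this.schedulerMax = schedulerMax;
    }

    /** Registers a queue-level cap; it may not exceed the scheduler-level cap. */
    public void setQueueMax(String queueName, Res max) {
        if (max.memoryMb > schedulerMax.memoryMb || max.vcores > schedulerMax.vcores) {
            throw new IllegalArgumentException(
                "Queue cap for " + queueName + " exceeds the scheduler cap");
        }
        queueMax.put(queueName, max);
    }

    /** Uses the queue-level cap if one is set, otherwise the scheduler-level cap. */
    public Res getMaximumResourceCapability(String queueName) {
        return queueMax.getOrDefault(queueName, schedulerMax);
    }

    /** Clamps a request to the effective cap of the given queue. */
    public Res clampToQueueMax(Res requested, String queueName) {
        Res max = getMaximumResourceCapability(queueName);
        return new Res(Math.min(requested.memoryMb, max.memoryMb),
                       Math.min(requested.vcores, max.vcores));
    }
}
```

Keeping the lookup and the clamp in one helper mirrors the suggestion in the comment above to normalize in a single place rather than in both the request path and the scheduler's allocate call.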
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Shah updated YARN-8815: -- Priority: Major (was: Critical) > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8815.001.patch > > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at >
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Target Version/s: 3.2.0, 3.1.2 (was: 3.2.0) > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8815.001.patch > > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at >
[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626694#comment-16626694 ] Sunil Govindan commented on YARN-8817: -- Thanks [~leftnoteasy]. The fix looks straightforward. Committing shortly. > [Submarine] In some cases HDFS is not asked by user when submit job but > framework requires user to set HDFS related environments > > > Key: YARN-8817 > URL: https://issues.apache.org/jira/browse/YARN-8817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: submarine >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8817.001.patch > > > Users who submit the job can see an error message like: > 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is > being used to read/write models/data. Followingenvs are required: 1) > DOCKER_HADOOP_HDFS_HOME= 2) > DOCKER_JAVA_HOME=. You can use --env to > pass these envars. > Exception in thread "main" java.io.IOException: Failed to detect HDFS-related > environments > even if HDFS was not requested.
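One way to picture the intended behaviour is a check that only demands the two environment variables when a job path actually points at HDFS. The sketch below is an assumption-laden illustration, not the YARN-8817 patch: the class `HdfsEnvCheckSketch` and the method `missingEnvs` are hypothetical, and detecting HDFS use by an `hdfs://` prefix is a simplification. Only the variable names `DOCKER_HADOOP_HDFS_HOME` and `DOCKER_JAVA_HOME` come from the error message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the Submarine implementation. */
public class HdfsEnvCheckSketch {

    // Variable names taken from the error message quoted in the issue.
    private static final List<String> REQUIRED_ENVS =
        Arrays.asList("DOCKER_HADOOP_HDFS_HOME", "DOCKER_JAVA_HOME");

    /** Assumption: a path uses HDFS when it starts with the hdfs:// scheme. */
    static boolean usesHdfs(List<String> paths) {
        for (String p : paths) {
            if (p != null && p.startsWith("hdfs://")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the required variables that are missing or empty.
     * The result is empty when no path refers to HDFS, so jobs that
     * never touch HDFS are not asked for HDFS-related settings.
     */
    public static List<String> missingEnvs(List<String> paths, Map<String, String> env) {
        if (!usesHdfs(paths)) {
            return Collections.emptyList();
        }
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_ENVS) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

A caller could raise the existing error only when the returned list is non-empty.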
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.7.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx
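The throttling idea in this description can be illustrated with a bounded `java.util.concurrent.ArrayBlockingQueue`, whose `put` blocks the producer while the queue is full. This is a minimal sketch under that assumption and not the YARN-8789 patch or the real `AsyncDispatcher`: the class `BoundedDispatcherSketch` and its methods are hypothetical, and events are plain strings for brevity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative only: not the AsyncDispatcher implementation. */
public class BoundedDispatcherSketch {

    private final BlockingQueue<String> eventQueue;

    public BoundedDispatcherSketch(int capacity) {
        // A fixed capacity bounds memory use no matter how fast events arrive.
        this.eventQueue = new ArrayBlockingQueue<>(capacity);
    }

    /** Blocks the calling thread while the queue is full, throttling producers. */
    public void dispatch(String event) {
        try {
            eventQueue.put(event);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while enqueueing event", e);
        }
    }

    /** Non-blocking variant: returns false instead of waiting when the queue is full. */
    public boolean tryDispatch(String event) {
        return eventQueue.offer(event);
    }

    /** Removes and returns the oldest event, or null if the queue is empty. */
    public String pollNext() {
        return eventQueue.poll();
    }

    public int size() {
        return eventQueue.size();
    }
}
```

With an unbounded queue, a slow consumer lets the backlog grow until memory runs out; with a bounded one, the cost shifts to producers, which wait until the consumer catches up.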
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626680#comment-16626680 ]

Hadoop QA commented on YARN-8789:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 12 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 52s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 34s | trunk passed |
| +1 | compile | 16m 3s | trunk passed |
| +1 | checkstyle | 3m 42s | trunk passed |
| +1 | mvnsite | 4m 6s | trunk passed |
| +1 | shadedclient | 18m 35s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 11s | trunk passed |
| +1 | javadoc | 3m 22s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 3s | the patch passed |
| +1 | compile | 13m 58s | the patch passed |
| +1 | javac | 13m 58s | the patch passed |
| -0 | checkstyle | 3m 41s | root: The patch generated 8 new + 889 unchanged - 11 fixed = 897 total (was 900) |
| +1 | mvnsite | 4m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 23s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 48s | hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 3m 39s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 3s | hadoop-common in the patch failed. |
| +1 | unit | 3m 41s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 192m 59s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 4m 19s | hadoop-mapreduce-client-core in the patch passed. |
| -1 | unit | 0m 47s | hadoop-mapreduce-client-app in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 323m 49s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
| | Dead store to useIp in org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(Configuration) At
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626675#comment-16626675 ]

Hadoop QA commented on YARN-8789:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 12 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 50s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 47s | trunk passed |
| +1 | compile | 16m 4s | trunk passed |
| +1 | checkstyle | 3m 32s | trunk passed |
| +1 | mvnsite | 4m 49s | trunk passed |
| +1 | shadedclient | 21m 24s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 33s | trunk passed |
| +1 | javadoc | 3m 34s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 31s | the patch passed |
| +1 | compile | 15m 45s | the patch passed |
| +1 | javac | 15m 45s | the patch passed |
| -0 | checkstyle | 3m 39s | root: The patch generated 8 new + 889 unchanged - 11 fixed = 897 total (was 900) |
| +1 | mvnsite | 4m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 44s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 52s | hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 3m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 14s | hadoop-common in the patch failed. |
| +1 | unit | 3m 40s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 190m 31s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 5m 5s | hadoop-mapreduce-client-core in the patch passed. |
| -1 | unit | 1m 8s | hadoop-mapreduce-client-app in the patch failed. |
| +1 | asflicense | 0m 54s | The patch does not generate ASF License warnings. |
| | | 333m 5s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
| | Dead store to useIp in org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(Configuration) At
[jira] [Assigned] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt reassigned YARN-8815:
    Assignee: Bibin A Chundatt

> Both RM in standby after restart(restart failure)
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Rakesh Shah
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-8815.001.patch
>
> *While running an unmanaged AM jar and restarting the RM, the RM goes into standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626672#comment-16626672 ]

Hadoop QA commented on YARN-8800:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 54s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 21s | trunk passed |
| +1 | compile | 16m 30s | trunk passed |
| +1 | mvnsite | 1m 10s | trunk passed |
| +1 | shadedclient | 11m 46s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 16m 19s | the patch passed |
| +1 | javac | 16m 19s | the patch passed |
| +1 | mvnsite | 1m 12s | the patch passed |
| -0 | pylint | 0m 8s | The patch generated 74 new + 0 unchanged - 0 fixed = 74 total (was 0) |
| -1 | shellcheck | 0m 0s | The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | shelldocs | 0m 33s | There were no new shelldocs issues. |
| -1 | whitespace | 0m 0s | The patch has 25 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 1s | The patch 5 line(s) with tabs. |
| +1 | xml | 0m 4s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 55s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 3s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 29s | hadoop-project in the patch passed. |
| +1 | unit | 0m 47s | hadoop-yarn-submarine in the patch passed. |
| -1 | asflicense | 0m 48s | The patch generated 1 ASF License warnings. |
| | | 88m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8800 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941133/YARN-8800.002.patch |
| Optional Tests | dupname asflicense mvnsite xml compile javac javadoc mvninstall unit shadedclient shellcheck shelldocs pylint |
| uname | Linux 266a81df1c77
[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626660#comment-16626660 ]

Hadoop QA commented on YARN-8758:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 32s | trunk passed |
| +1 | compile | 0m 28s | trunk passed |
| +1 | checkstyle | 0m 20s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | shadedclient | 12m 16s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 43s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 31s | the patch passed |
| +1 | compile | 0m 26s | the patch passed |
| +1 | javac | 0m 26s | the patch passed |
| -0 | checkstyle | 0m 16s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 38s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 45s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 24m 50s | hadoop-yarn-client in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 76m 38s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8758 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941126/YARN-8758.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux bb9963491f93 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21956/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21956/testReport/ |
| Max. process+thread count | 684 (vs. ulimit of 1) |
| modules | C:
[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626619#comment-16626619 ]

Wangda Tan commented on YARN-8817:

Verified this in a cluster.

> [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: submarine
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8817.001.patch
>
> A user who submits a job can see an error message like the following, even if HDFS was not requested:
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is being used to read/write models/data. Followingenvs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related environments
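The behavior this issue asks for can be sketched as a guard that only demands the Docker HDFS environment variables when the job actually references HDFS. This is an illustration, not the YARN-8817 patch: the class HdfsEnvCheckSketch and its methods are hypothetical, and treating "path starts with hdfs://" as the test for HDFS usage is an assumption made for the example. Only the two variable names and the exception message come from the log above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Submarine YarnServiceJobSubmitter code.
class HdfsEnvCheckSketch {
    // Assumption for illustration: a job "uses HDFS" if any path has the hdfs:// scheme.
    static boolean usesHdfs(List<String> paths) {
        for (String path : paths) {
            if (path != null && path.startsWith("hdfs://")) {
                return true;
            }
        }
        return false;
    }

    // Requires the HDFS-related envs only when some path actually points at HDFS.
    static void validate(List<String> paths, Map<String, String> envs) throws IOException {
        if (!usesHdfs(paths)) {
            return; // HDFS was not requested, so nothing to check
        }
        if (!envs.containsKey("DOCKER_HADOOP_HDFS_HOME") || !envs.containsKey("DOCKER_JAVA_HOME")) {
            throw new IOException("Failed to detect HDFS-related environments");
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> noEnvs = Collections.emptyMap();
        // A job with only local paths passes without any HDFS envs.
        validate(Arrays.asList("/tmp/model", "/tmp/data"), noEnvs);
        System.out.println("local-only job accepted");
        try {
            validate(Arrays.asList("hdfs://namenode/model"), noEnvs);
        } catch (IOException e) {
            System.out.println("hdfs job rejected: " + e.getMessage());
        }
    }
}
```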
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626618#comment-16626618 ]

Wangda Tan commented on YARN-8800:

Attached the ver.2 patch, which includes notebook examples as well.

> Updated documentation of Submarine with latest examples.
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch
[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626617#comment-16626617 ]

Hadoop QA commented on YARN-8817:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 46s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 27s | trunk passed |
| +1 | shadedclient | 11m 33s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 32s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 18s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 27s | the patch passed |
| +1 | compile | 0m 22s | the patch passed |
| +1 | javac | 0m 22s | the patch passed |
| +1 | checkstyle | 0m 12s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 42s | the patch passed |
| +1 | javadoc | 0m 17s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 34s | hadoop-yarn-submarine in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 50m 12s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8817 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941121/YARN-8817.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f796a129781b 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/21955/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21955/testReport/ |
| Max. process+thread count | 395 (vs. ulimit of 1) |
| modules | C:
[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8800:
    Attachment: YARN-8800.002.patch

> Updated documentation of Submarine with latest examples.
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch
[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626601#comment-16626601 ]

Zian Chen commented on YARN-8758:

Hi [~sunilg] [~weiweiyagn666], could you help review the patch? Thanks

> PreemptionMessage when using AMRMClientAsync
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.1.1
> Reporter: Krishna Kishore
> Assignee: Zian Chen
> Priority: Major
> Attachments: YARN-8758.001.patch
>
> Hi,
> The preemption notification messages sent within the time period defined by the following parameter currently work only with AMRMClient, not with AMRMClientAsync.
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our implementations are based on it.
>
> Thanks,
> Kishore
[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8758:
    Attachment: YARN-8758.001.patch

> PreemptionMessage when using AMRMClientAsync
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.1.1
> Reporter: Krishna Kishore
> Assignee: Zian Chen
> Priority: Major
> Attachments: YARN-8758.001.patch
>
> Hi,
> The preemption notification messages sent within the time period defined by the following parameter currently work only with AMRMClient, not with AMRMClientAsync.
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our implementations are based on it.
>
> Thanks,
> Kishore
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626598#comment-16626598 ] Billie Rinaldi commented on YARN-8734: -- Makes sense to me. Thanks, [~eyang]! > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf, YARN-8734.001.patch, > YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, > YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8616) systemClock should be used in RMAppImpl instead of System.currentTimeMills() to be consistent
[ https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626572#comment-16626572 ] Hudson commented on YARN-8616: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15047/]) YARN-8616. systemClock should be used in RMAppImpl instead of (haibochen: rev 29dad7d258c621a0ff3a64c595a2e32c66c59d11) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java > systemClock should be used in RMAppImpl instead of System.currentTimeMills() > to be consistent > - > > Key: YARN-8616 > URL: https://issues.apache.org/jira/browse/YARN-8616 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8616.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
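Editor's sketch for YARN-8616: the summary asks for timestamps to come from one injectable clock rather than from direct `System.currentTimeMillis()` calls. The example below illustrates why that helps in tests. `TimeSource`, `WallClock`, `ManualClock`, and `AppRecord` are hypothetical names invented for this illustration; they are not the Hadoop classes touched by the patch, and the committed change may look different.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an injectable clock; not the Hadoop interface. */
interface TimeSource {
  long getTime();
}

/** Production implementation backed by the wall clock. */
final class WallClock implements TimeSource {
  @Override
  public long getTime() {
    return System.currentTimeMillis();
  }
}

/** Test implementation whose time only moves when the test advances it. */
final class ManualClock implements TimeSource {
  private final AtomicLong now;

  ManualClock(long startMillis) {
    this.now = new AtomicLong(startMillis);
  }

  void advance(long millis) {
    now.addAndGet(millis);
  }

  @Override
  public long getTime() {
    return now.get();
  }
}

/** Hypothetical application record that reads every timestamp from one clock. */
final class AppRecord {
  private final TimeSource clock;
  private final long startTime;
  private long finishTime = -1;

  AppRecord(TimeSource clock) {
    this.clock = clock;
    this.startTime = clock.getTime();
  }

  void finish() {
    finishTime = clock.getTime();
  }

  /** Elapsed time so far, or total run time once finish() has been called. */
  long elapsedMillis() {
    long end = finishTime >= 0 ? finishTime : clock.getTime();
    return end - startTime;
  }
}
```

Because every read goes through the same `TimeSource`, a test can pass in a `ManualClock` and assert on exact durations; mixing in a direct `System.currentTimeMillis()` call would make those values inconsistent with the injected clock.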
[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626568#comment-16626568 ] Wangda Tan commented on YARN-8817:
The root cause: we always generate checkpoint_path and saved_model_path for jobs, and by default we use HDFS as the default FS. However, it is possible that a job never uses them. So instead of relying on checkpoint_path / saved_model_path, we will check the launch command for HDFS usage.
> [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: submarine
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8817.001.patch
>
> A user who submits a job can see an error message like:
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is being used to read/write models/data. Followingenvs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related environments
> even if HDFS was not requested.
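Editor's sketch for the comment above: one way to read "check the launch command for HDFS usage" is to require the two environment variables only when a launch command actually mentions an HDFS URI. The class `HdfsEnvCheck`, its methods, and the plain `hdfs://` substring test are assumptions made for illustration; the attached patch is not shown in this thread and may detect HDFS usage differently. Only the two variable names come from the error message quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; not code from YARN-8817.001.patch. */
final class HdfsEnvCheck {
  private HdfsEnvCheck() {}

  /** Assumption: a command "uses HDFS" if it contains an hdfs:// URI. */
  static boolean mentionsHdfs(List<String> launchCommands) {
    for (String cmd : launchCommands) {
      if (cmd != null && cmd.toLowerCase(Locale.ROOT).contains("hdfs://")) {
        return true;
      }
    }
    return false;
  }

  /**
   * Returns the names of required variables that are unset or empty.
   * The list is empty when no launch command mentions HDFS at all.
   */
  static List<String> missingEnvs(List<String> launchCommands,
                                  Map<String, String> envs) {
    List<String> missing = new ArrayList<>();
    if (!mentionsHdfs(launchCommands)) {
      return missing; // HDFS not requested, so nothing is required
    }
    String[] required = {"DOCKER_HADOOP_HDFS_HOME", "DOCKER_JAVA_HOME"};
    for (String name : required) {
      String value = envs.get(name);
      if (value == null || value.isEmpty()) {
        missing.add(name);
      }
    }
    return missing;
  }
}
```

With a check of this shape, a job whose commands only touch local paths would be submitted without the HDFS-related variables, while a job that reads from an `hdfs://` location would still be told which variables are missing.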
[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626569#comment-16626569 ] Wangda Tan commented on YARN-8817: -- cc: [~sunilg] > [Submarine] In some cases HDFS is not asked by user when submit job but > framework requires user to set HDFS related environments > > > Key: YARN-8817 > URL: https://issues.apache.org/jira/browse/YARN-8817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: submarine >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8817.001.patch > > > User who submit the job can see the error message like: > 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is > being used to read/write models/data. Followingenvs are required: 1) > DOCKER_HADOOP_HDFS_HOME= 2) > DOCKER_JAVA_HOME=. You can use --env to > pass these envars. > Exception in thread "main" java.io.IOException: Failed to detect HDFS-related > environments > Even if hdfs is not asked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8817: - Attachment: YARN-8817.001.patch > [Submarine] In some cases HDFS is not asked by user when submit job but > framework requires user to set HDFS related environments > > > Key: YARN-8817 > URL: https://issues.apache.org/jira/browse/YARN-8817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: submarine >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8817.001.patch > > > User who submit the job can see the error message like: > 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is > being used to read/write models/data. Followingenvs are required: 1) > DOCKER_HADOOP_HDFS_HOME= 2) > DOCKER_JAVA_HOME=. You can use --env to > pass these envars. > Exception in thread "main" java.io.IOException: Failed to detect HDFS-related > environments > Even if hdfs is not asked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
[ https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-8817: Assignee: Wangda Tan > [Submarine] In some cases HDFS is not asked by user when submit job but > framework requires user to set HDFS related environments > > > Key: YARN-8817 > URL: https://issues.apache.org/jira/browse/YARN-8817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: submarine >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8817.001.patch > > > User who submit the job can see the error message like: > 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is > being used to read/write models/data. Followingenvs are required: 1) > DOCKER_HADOOP_HDFS_HOME= 2) > DOCKER_JAVA_HOME=. You can use --env to > pass these envars. > Exception in thread "main" java.io.IOException: Failed to detect HDFS-related > environments > Even if hdfs is not asked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626565#comment-16626565 ] Eric Yang commented on YARN-8734: - [~billie.rinaldi] Good catch on the relationship between liveness monitor and AM startup. It makes sense to do the dependency check after AM registered with RM to ensure the dependency wait doesn't get expired by RM in case the high level coordinator submit applications and expect the jobs to queue and sort out the dependencies automatically. I'll revise the patch accordingly if you agree. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf, YARN-8734.001.patch, > YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, > YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626564#comment-16626564 ] Jason Lowe commented on YARN-8804:
Thanks for updating the patch! +1 lgtm. I'll commit this by Wednesday if there are no objections.
> resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues
>
> Key: YARN-8804
> URL: https://issues.apache.org/jira/browse/YARN-8804
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-8804.001.patch, YARN-8804.002.patch, YARN-8804.003.patch
>
> This problem is due to YARN-4280, parent queue will deduct child queue's headroom when the child queue reached its resource limit and the skipped type is QUEUE_LIMIT, the resource limits of deepest parent queue will be correctly calculated, but for non-deepest parent queue, its headroom may be much more than the sum of reached-limit child queues' headroom, so that the resource limit of non-deepest parent may be much less than its true value and block the allocation for later queues.
> To reproduce this problem with UT:
> (1) Cluster has two nodes whose node resource both are <10GB, 10core> and 3-level queues as below, among them max-capacity of "c1" is 10 and others are all 100, so that max-capacity of queue "c1" is <2GB, 2core>
> {noformat}
>          Root
>        /  |  \
>       a   b   c
>      10  20   70
>               |  \
>              c1   c2
>      10(max=10)   90
> {noformat}
> (2) Submit app1 to queue "c1" and launch am1(resource=<1GB, 1 core>) on nm1
> (3) Submit app2 to queue "b" and launch am2(resource=<1GB, 1 core>) on nm1
> (4) app1 and app2 both ask one <2GB, 1core> containers.
> (5) nm1 do 1 heartbeat
> Now queue "c" has lower capacity percentage than queue "b", the allocation sequence will be "a" -> "c" -> "b",
> queue "c1" has reached queue limit so that requests of app1 should be pending,
> headroom of queue "c1" is <1GB, 1core> (=max-capacity - used),
> headroom of queue "c" is <18GB, 18core> (=max-capacity - used),
> after allocation for queue "c", resource limit of queue "b" will be wrongly calculated as <2GB, 2core>,
> headroom of queue "b" will be <1GB, 1core> (=resource-limit - used)
> so that scheduler won't allocate one container for app2 on nm1
[jira] [Created] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments
Wangda Tan created YARN-8817: Summary: [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments Key: YARN-8817 URL: https://issues.apache.org/jira/browse/YARN-8817 Project: Hadoop YARN Issue Type: Sub-task Components: submarine Reporter: Wangda Tan User who submit the job can see the error message like: 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is being used to read/write models/data. Followingenvs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars. Exception in thread "main" java.io.IOException: Failed to detect HDFS-related environments Even if hdfs is not asked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6338) Typos in Docker docs: contains => containers
[ https://issues.apache.org/jira/browse/YARN-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626550#comment-16626550 ] Hudson commented on YARN-6338: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15046 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15046/]) YARN-6338. Typos in Docker docs: contains => containers. (Contributed by (haibochen: rev cf62ff9a6a48b97cd93b405e13b58dbbaea1925f) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md > Typos in Docker docs: contains => containers > > > Key: YARN-6338 > URL: https://issues.apache.org/jira/browse/YARN-6338 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Zoltan Siegl >Priority: Minor > Labels: docs > Fix For: 3.2.0 > > Attachments: YARN-6338.001.patch > > > "allowed to request privileged contains" should be "allowed to request > privileged containers" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8616) systemClock should be used in RMAppImpl instead of System.currentTimeMills() to be consistent
[ https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8616: - Summary: systemClock should be used in RMAppImpl instead of System.currentTimeMills() to be consistent (was: System.currentTimeMillis() used in RMAppImpl, instead of getting value from systemClock) > systemClock should be used in RMAppImpl instead of System.currentTimeMills() > to be consistent > - > > Key: YARN-8616 > URL: https://issues.apache.org/jira/browse/YARN-8616 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8616.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8616) System.currentTimeMillis() used in RMAppImpl, instead of getting value from systemClock
[ https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626544#comment-16626544 ] Haibo Chen commented on YARN-8616: -- +1 on the patch. Thanks [~snemeth] for the fix, committing shortly. > System.currentTimeMillis() used in RMAppImpl, instead of getting value from > systemClock > --- > > Key: YARN-8616 > URL: https://issues.apache.org/jira/browse/YARN-8616 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8616.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6338) Typos in Docker docs: contains => containers
[ https://issues.apache.org/jira/browse/YARN-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626536#comment-16626536 ] Haibo Chen commented on YARN-6338: -- +1 on the patch. Committing it shortly. > Typos in Docker docs: contains => containers > > > Key: YARN-6338 > URL: https://issues.apache.org/jira/browse/YARN-6338 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Zoltan Siegl >Priority: Minor > Labels: docs > Attachments: YARN-6338.001.patch > > > "allowed to request privileged contains" should be "allowed to request > privileged containers" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures
[ https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8759: - Labels: unit-test (was: ) > Copy of "resource-types.xml" is not deleted if test fails, causes other test > failures > - > > Key: YARN-8759 > URL: https://issues.apache.org/jira/browse/YARN-8759 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: unit-test > Attachments: YARN-8759.001.patch, YARN-8759.002.patch, > YARN-8759.003.patch, YARN-8759.004.patch > > > resource-types.xml is copied in several tests to the test machine, but it is > deleted only at the end of the test. In case the test fails the file will not > be deleted and other tests will fail, because of the wrong configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
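Editor's sketch for YARN-8759: the description says the copied `resource-types.xml` is deleted only at the end of a test, so a failing test leaves it behind. A common remedy is to put the deletion in a `finally` block (or an equivalent teardown hook) so it runs on both success and failure. The helper below is a hypothetical illustration of that pattern using only `java.nio.file`; `TempConfigFile` and `withCopiedConfig` are invented names and this is not the code from the attached patches.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not taken from the YARN-8759 patches. */
final class TempConfigFile {
  private TempConfigFile() {}

  /** A test body that is allowed to fail with any exception. */
  interface TestBody {
    void run() throws Exception;
  }

  /**
   * Copies source to target, runs the test body, and always removes the
   * copy afterwards, even when the body throws.
   */
  static void withCopiedConfig(Path source, Path target, TestBody body)
      throws Exception {
    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    try {
      body.run();
    } finally {
      // Runs on success and on failure, so later tests see a clean state.
      Files.deleteIfExists(target);
    }
  }
}
```

The original exception still propagates to the test runner, so the failing test is reported as before; the only difference is that the stray configuration file no longer affects the tests that run after it.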
[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8734: - Summary: Readiness check for remote service belongs to the same user (was: Readiness check for remote service) > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Dependency check vs.pdf, YARN-8734.001.patch, > YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, > YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626513#comment-16626513 ] Zian Chen commented on YARN-8758: - I'll work on this Jira and provide an initial patch. > PreemptionMessage when using AMRMClientAsync > > > Key: YARN-8758 > URL: https://issues.apache.org/jira/browse/YARN-8758 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.1.1 >Reporter: Krishna Kishore >Priority: Major > > Hi, > The preemption notification messages sent in the time period defined by > the following parameter now work only on AMRMClient, but not on > AMRMClientAsync. > *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill* > We want this work on the AMRMClientAsync also because our implementations are > based on this one. > > Thanks, > Kishore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-8758: --- Assignee: Zian Chen > PreemptionMessage when using AMRMClientAsync > > > Key: YARN-8758 > URL: https://issues.apache.org/jira/browse/YARN-8758 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.1.1 >Reporter: Krishna Kishore >Assignee: Zian Chen >Priority: Major > > Hi, > The preemption notification messages sent in the time period defined by > the following parameter now work only on AMRMClient, but not on > AMRMClientAsync. > *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill* > We want this work on the AMRMClientAsync also because our implementations are > based on this one. > > Thanks, > Kishore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8758: - Target Version/s: 3.1.2 (was: 3.1.1) > PreemptionMessage when using AMRMClientAsync > > > Key: YARN-8758 > URL: https://issues.apache.org/jira/browse/YARN-8758 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.1.1 >Reporter: Krishna Kishore >Priority: Major > > Hi, > The preemption notification messages sent in the time period defined by > the following parameter now work only on AMRMClient, but not on > AMRMClientAsync. > *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill* > We want this work on the AMRMClientAsync also because our implementations are > based on this one. > > Thanks, > Kishore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8758: - Fix Version/s: (was: 2.7.6) > PreemptionMessage when using AMRMClientAsync > > > Key: YARN-8758 > URL: https://issues.apache.org/jira/browse/YARN-8758 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.1.1 >Reporter: Krishna Kishore >Priority: Major > > Hi, > The preemption notification messages sent in the time period defined by > the following parameter now work only on AMRMClient, but not on > AMRMClientAsync. > *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill* > We want this work on the AMRMClientAsync also because our implementations are > based on this one. > > Thanks, > Kishore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626455#comment-16626455 ] Chandni Singh edited comment on YARN-7644 at 9/24/18 9:14 PM:
For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and {{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks and submits them to the executor to be performed in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, {{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions are performed in a blocking way:
{code:java}
launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and {{CLEANUP_CONTAINER_FOR_REINIT}} events be performed in a non-blocking way. It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for {{cleanupContainer()}} to be performed synchronously. It is triggered by dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.
cc. [~ebadger] [~jlowe]
> NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626455#comment-16626455 ] Chandni Singh commented on YARN-7644:
For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and {{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks and submits them to the executor to be performed in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, {{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions are performed in a blocking way:
{code:java}
launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and {{CLEANUP_CONTAINER_FOR_REINIT}} events be performed in a non-blocking way. It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for {{cleanupContainer()}} to be performed synchronously. It is triggered by dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.
> NM gets backed up deleting docker containers
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Eric Badger
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 seconds when we shut down a container. If the container does not stop after 10 seconds then we force kill it. However, the {{docker stop}} command is a blocking call. So in cases where lots of containers don't go down with the initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to return. This ties up the ContainerLaunch handler and so these kill events back up. It also appears to be backing up new container launches as well.
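Editor's sketch for YARN-7644: the comment contrasts launch events, which are submitted to an executor, with cleanup events, which run inline and block the handler. The example below shows the general shape of the non-blocking variant, where the handler hands the slow cleanup to a thread pool and returns at once. `CleanupDispatcher` and `handleCleanup` are hypothetical names for illustration only; this is not the NodeManager code, and the eventual patch may size or share its executor differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical event handler that offloads slow cleanup work. */
final class CleanupDispatcher implements AutoCloseable {
  private final ExecutorService cleanupPool;

  CleanupDispatcher(int threads) {
    this.cleanupPool = Executors.newFixedThreadPool(threads);
  }

  /**
   * Returns immediately. The cleanup task (for example a container stop
   * that may take ten seconds or more) runs on a pool thread, so the
   * caller stays free to handle the next event.
   */
  void handleCleanup(Runnable cleanup) {
    cleanupPool.submit(cleanup);
  }

  /** Stops accepting work and waits briefly for queued cleanups. */
  @Override
  public void close() throws InterruptedException {
    cleanupPool.shutdown();
    cleanupPool.awaitTermination(30, TimeUnit.SECONDS);
  }
}
```

The design trade-off is that cleanup failures no longer surface in the handler thread, so a real implementation would need to log or report errors from inside the submitted task.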
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626448#comment-16626448 ] Hadoop QA commented on YARN-8800:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 47s | trunk passed |
| +1 | compile | 0m 29s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | shadedclient | 10m 53s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 27s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| +1 | mvnsite | 0m 25s | the patch passed |
| -0 | pylint | 0m 7s | The patch generated 74 new + 0 unchanged - 0 fixed = 74 total (was 0) |
| +1 | shellcheck | 0m 0s | There were no new shellcheck issues. |
| +1 | shelldocs | 0m 17s | There were no new shelldocs issues. |
| -1 | whitespace | 0m 0s | The patch has 21 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 30s | hadoop-yarn-submarine in the patch passed. |
| -1 | asflicense | 0m 22s | The patch generated 1 ASF License warnings. |
| | | 45m 55s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8800 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941103/YARN-8800.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient shellcheck shelldocs pylint |
| uname | Linux bd0dfe41f7ea 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c07715e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| shellcheck | v0.4.6 |
| pylint | v1.9.2 |
| pylint | https://builds.apache.org/job/PreCommit-YARN-Build/21952/artifact/out/diff-patch-pylint.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/21952/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21952/testReport/ |
| asflicense |
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626418#comment-16626418 ]

Wangda Tan commented on YARN-8627:
----------------------------------

[~rohithsharma], could u help to review the latest comment?

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> --------------------------------------------------------------------
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
> The EntityLogCleaner threads exits with the following ERROR every time it
> runs.
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files
> java.io.FileNotFoundException: File hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1069)
> at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
> at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
> {code}
>
> Each time the thread gets scheduled, it is a different folder encountering
> the error. As a result, the thread is not able to clean all the old done
> directories, since it stops after this error.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
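The failure mode reported in YARN-8627, where one vanished directory ends the whole cleaning pass, can be illustrated with a minimal sketch. This is not the attached patch: it uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, and the names DoneDirCleanerSketch, cleanAll and deleteRecursively are hypothetical. The only point shown is that the missing-directory error is caught per directory, so the loop carries on with the remaining candidates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the YARN-8627 patch and not the Hadoop FileSystem API.
class DoneDirCleanerSketch {

    // Tries every candidate; a directory that is already gone is recorded
    // and skipped instead of aborting the rest of the pass.
    static List<Path> cleanAll(List<Path> candidates) throws IOException {
        List<Path> skipped = new ArrayList<>();
        for (Path dir : candidates) {
            try {
                deleteRecursively(dir);
            } catch (NoSuchFileException e) {
                skipped.add(dir);
            }
        }
        return skipped;
    }

    // Deletes children before parents by visiting paths in reverse sorted order.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Returning the skipped paths rather than only logging them is a choice made for this sketch so that a caller can see which directories had disappeared; other I/O errors still propagate.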
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated YARN-8789:
------------------------------
    Attachment: YARN-8789.7.patch

> Add BoundedQueue to AsyncDispatcher
> -----------------------------------
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch,
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch
>
> I recently came across a scenario where an MR ApplicationMaster was failing
> with an OOM exception. It had many thousands of Mappers and thousands of
> Reducers. It was noted that in the logging that the event-queue of
> {{AsyncDispatcher}} had a very large number of item in it and was seemingly
> never decreasing.
> I started looking at the code and thought it could use some clean up,
> simplification, and the ability to specify a bounded queue so that any
> incoming events are throttled until they can be processed. This will protect
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
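The throttling idea in YARN-8789 can be sketched with a capacity-limited java.util.concurrent.LinkedBlockingQueue. This is an illustration only, not the AsyncDispatcher code or any of the attached patches; the class BoundedEventQueueSketch, its methods, and the use of String as the event type are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the YARN AsyncDispatcher implementation.
class BoundedEventQueueSketch {
    private final BlockingQueue<String> queue;

    BoundedEventQueueSketch(int capacity) {
        // A fixed capacity caps how many events can be held in memory at once.
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    // Blocks the producer while the queue is full, which throttles incoming events.
    void dispatch(String event) throws InterruptedException {
        queue.put(event);
    }

    // Timed variant: returns false if no slot frees up within the timeout.
    boolean tryDispatch(String event, long timeoutMillis) throws InterruptedException {
        return queue.offer(event, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called by the consumer thread; blocks while the queue is empty.
    String take() throws InterruptedException {
        return queue.take();
    }

    int size() {
        return queue.size();
    }
}
```

With a capacity of 2, a third tryDispatch call returns false until the consumer takes an event. Whether a real dispatcher should block, time out, or drop events when full is a design decision this sketch leaves open.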
[jira] [Commented] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade
[ https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626399#comment-16626399 ] Hadoop QA commented on YARN-8665: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 9 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 28s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 443 unchanged - 4 fixed = 446 total (was 447) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 17s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 56s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 17s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 42s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 45s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}148m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626345#comment-16626345 ] Wangda Tan commented on YARN-8800: -- Attached ver.1 patch, includes some screenshots that's why size of the patch is large. > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8800: - Attachment: YARN-8800.001.patch > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626315#comment-16626315 ] Hadoop QA commented on YARN-8815: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 178 unchanged - 0 fixed = 179 total (was 178) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 77m 3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8815 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941075/YARN-8815.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7057662844d3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 62f817d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21949/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21949/testReport/ | | Max. process+thread count | 892 (vs. ulimit of 1) | | modules | C:
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626288#comment-16626288 ] Haibo Chen commented on YARN-1011: -- The tests look good locally. I have pushed my local branch upstream. Let me know if you see issues [~asuresh]. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Karthik Kambatla >Priority: Major > Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, > yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626273#comment-16626273 ] Hudson commented on YARN-8696: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15042 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15042/]) YARN-8696. [AMRMProxy] FederationInterceptor upgrade: home sub-cluster (gifuma: rev 3090922805699b8374a359e92323884a4177dc4e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMHeartbeatRequestHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/AMRMClientUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestableFederationInterceptor.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/utils/FederationRegistryClient.java > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, > YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, > YARN-8696.v5.patch, YARN-8696.v6.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. 
So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
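The change described in YARN-8696 can be pictured with a small sketch: the AM-facing call hands the request to a background sender and returns at once with whatever home responses have already arrived, and the two response counters advance independently. This is not the FederationInterceptor code from the patch. AsyncHeartbeatSketch, the String request and response types, and the Function standing in for the home ResourceManager call are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; not the FederationInterceptor code from the patch.
class AsyncHeartbeatSketch {
    private final ExecutorService homeSender = Executors.newSingleThreadExecutor();
    private final ConcurrentLinkedQueue<String> arrived = new ConcurrentLinkedQueue<>();
    private final Function<String, String> homeCluster; // stand-in for the home RM call
    private final AtomicInteger amResponseId = new AtomicInteger();
    private final AtomicInteger homeResponseId = new AtomicInteger();

    AsyncHeartbeatSketch(Function<String, String> homeCluster) {
        this.homeCluster = homeCluster;
    }

    // AM side: queue the request for the sender thread and return immediately
    // with the home responses that have already arrived (possibly none).
    List<String> heartbeat(String request) {
        amResponseId.incrementAndGet();
        homeSender.submit(() -> {
            String response = homeCluster.apply(request); // may block for a long time
            homeResponseId.incrementAndGet();
            arrived.add(response);
        });
        return drainResponses();
    }

    // Removes and returns every response received so far.
    List<String> drainResponses() {
        List<String> out = new ArrayList<>();
        for (String r = arrived.poll(); r != null; r = arrived.poll()) {
            out.add(r);
        }
        return out;
    }

    int amResponseId() { return amResponseId.get(); }

    int homeResponseId() { return homeResponseId.get(); }

    // Stops accepting requests and waits briefly for in-flight ones to finish.
    void close() throws InterruptedException {
        homeSender.shutdown();
        homeSender.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

If the home call stalls, heartbeat still returns, so in a fuller version responses from other sub-clusters could be merged and returned without waiting on the home sub-cluster.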
[jira] [Commented] (YARN-8810) Yarn Service: discrepancy between hashcode and equals of ConfigFile
[ https://issues.apache.org/jira/browse/YARN-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626266#comment-16626266 ] Hadoop QA commented on YARN-8810: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 10s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8810 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941081/YARN-8810.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e2c37fe17ebc 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8de5c92 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21951/testReport/ | | Max. process+thread count | 757 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21951/console | |
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8696: --- Component/s: nodemanager > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, > YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, > YARN-8696.v5.patch, YARN-8696.v6.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. 
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8696: --- Fix Version/s: 3.2.0 > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, > YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, > YARN-8696.v5.patch, YARN-8696.v6.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. 
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626251#comment-16626251 ] Giovanni Matteo Fumarola commented on YARN-8696: Thanks [~botong] for the patch. Committed [^YARN-8696.v6.patch] to trunk and [^YARN-8696-branch-2.v6.patch] to branch-2.
[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626139#comment-16626139 ]

Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 6:31 PM:
-
{quote}
finished unmanaged app?
{quote}
Will happen for Finished unmanaged app , since in final state the AppData will be pruned.
In addition to issue mentioned in jira ,we do have impact on applicationReport and RMAppBlock display fields.
[~sunilg]/[~rohithsharma] could you cross check ?

was (Author: bibinchundatt):
{quote}
finished unmanaged app?
{quote}
Will happen for Finished unmanaged app , since in final state the AppData will be pruned.
Addition we should be having impact on applicationReport and RMAppBlock display fields.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Rakesh Shah
> Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
> *while running a un managed am jar and restarting the RM - RM goes into standby*
> *Below is the exception trace--*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at
[jira] [Updated] (YARN-8810) Yarn Service: discrepancy between hashcode and equals of ConfigFile
[ https://issues.apache.org/jira/browse/YARN-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandni Singh updated YARN-8810:
    Attachment: YARN-8810.001.patch

> Yarn Service: discrepancy between hashcode and equals of ConfigFile
> ---
>
> Key: YARN-8810
> URL: https://issues.apache.org/jira/browse/YARN-8810
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Minor
> Attachments: YARN-8810.001.patch
>
>
> The {{ConfigFile}} class {{equals}} method doesn't check the equality of
> {{properties}}. The {{hashCode}} does include the {{properties}}.
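A minimal sketch of what a consistent pair looks like is given below. `ConfigFileSketch` and its three fields are hypothetical stand-ins, not the real {{ConfigFile}} class or the attached patch; the point is only that {{equals}} and {{hashCode}} are computed from the same set of fields, including {{properties}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a config-file descriptor; not the YARN class. */
class ConfigFileSketch {
  private final String destFile;
  private final String srcFile;
  private final Map<String, String> properties;

  ConfigFileSketch(String destFile, String srcFile, Map<String, String> properties) {
    this.destFile = destFile;
    this.srcFile = srcFile;
    this.properties = new HashMap<>(properties); // defensive copy
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    ConfigFileSketch other = (ConfigFileSketch) o;
    // Every field used by hashCode() is also compared here.
    return Objects.equals(destFile, other.destFile)
        && Objects.equals(srcFile, other.srcFile)
        && Objects.equals(properties, other.properties);
  }

  @Override
  public int hashCode() {
    // Same three fields as equals().
    return Objects.hash(destFile, srcFile, properties);
  }
}
```

When the two methods use the same fields, objects that differ only in {{properties}} compare as unequal, which matches the distinction their hash codes already make.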
[jira] [Updated] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade
[ https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandni Singh updated YARN-8665:
    Attachment: YARN-8665.005.patch

> Yarn Service Upgrade: Support cancelling upgrade
> -
>
> Key: YARN-8665
> URL: https://issues.apache.org/jira/browse/YARN-8665
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-8665.001.patch, YARN-8665.002.patch,
> YARN-8665.003.patch, YARN-8665.004.patch, YARN-8665.005.patch
>
>
> When a service is upgraded without auto-finalization or express upgrade, the
> upgrade can be cancelled. This gives the user the ability to test the upgrade
> of a single instance and, if that doesn't go well, a chance to cancel it.
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626139#comment-16626139 ] Bibin A Chundatt commented on YARN-8815: {quote} finished unmanaged app? {quote} Will happen for Finished unmanaged app , since in final state the AppData will be pruned. Addition we should be having impact on applicationReport and RMAppBlock display fields.
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626131#comment-16626131 ]

Haibo Chen commented on YARN-8808:
--
Hi [~asuresh]. I'm not sure I follow you correctly. It sounds like you are referring to aggregateUtilization as the aggregate resource *ALLOCATION* of all containers, right? In the case of Fair Scheduler + oversubscription, aggregateUtilization is the aggregate resource *UTILIZATION* of all containers on a node (rather than the aggregate allocation).

The issue here came up in our testing configuration, where only a fraction of the node's hardware resources is allowed to run containers (say the node has 10GB memory and 10 vcores, but through configuration we allow the RM to see only 8GB and 8 vcores). Therefore, I think the scheduler side should just see numbers based on the configured node capacity (8GB, 8 vcores). nodeUtilization, by default, is detected from the OS and is therefore based on the node's actual capacity (10GB, 10 vcores).

aggregateUtilization/nodeUtilization would tell us what percentage of the node utilization is attributed to running containers, no?

> Use aggregate container utilization instead of node utilization to determine
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: YARN-1011
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch,
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that
> can be allocated to containers, hence the allocation threshold should be with
> respect to aggregate container utilization.
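The difference between the two measurements can be made concrete with a little arithmetic. The sketch below is an illustration only: `OversubscriptionSketch`, `availableMb`, the 80 percent threshold, and the utilization figures are assumptions, not values or code from the patch. Only the 8GB configured capacity on a 10GB node comes from the comment above.

```java
/** Hypothetical arithmetic sketch; not the Fair Scheduler implementation. */
class OversubscriptionSketch {
  /**
   * Memory still available for oversubscription, in MB.
   *
   * @param capacityMb       node capacity as configured for the RM
   * @param thresholdPercent assumed allocation threshold, in percent
   * @param utilizationMb    utilization figure that the threshold is checked against
   */
  static long availableMb(long capacityMb, int thresholdPercent, long utilizationMb) {
    long limitMb = capacityMb * thresholdPercent / 100;
    return Math.max(0L, limitMb - utilizationMb);
  }

  public static void main(String[] args) {
    long configuredCapacityMb = 8192; // the RM sees 8GB of a 10GB node
    long containerUtilizationMb = 5120; // assumed aggregate container usage
    long nodeUtilizationMb = 7168; // assumed OS-level usage, includes non-YARN processes

    System.out.println(availableMb(configuredCapacityMb, 80, containerUtilizationMb));
    System.out.println(availableMb(configuredCapacityMb, 80, nodeUtilizationMb));
  }
}
```

With these assumed numbers, checking against aggregate container utilization leaves 1433 MB available, while checking against OS-level node utilization leaves 0 MB, because usage outside the configured 8GB is counted against the containers' share.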
[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124 ] Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 5:04 PM: - [~rohithsharma]/[~sunilg] During recovery, applicationSubmissionContext doesnt have info regarding app tunManaged or not. If Resource=null & amReqs is empty then above exception will be thrown from {{RMAppManager#validateAndCreateResourceRequest}} Added testcase to simulate issue at same method. was (Author: bibinchundatt): [~rohithsharma]/[~sunilg] During recovery, applicationSubmissionContext doesnt have info regarding app tunManaged or not. If resourceRequest=null & amReqs is empty then above exception will be thrown from {{RMAppManager#validateAndCreateResourceRequest}} Added testcase to simulate issue at same method.
[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124 ] Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 5:04 PM: - [~rohithsharma]/[~sunilg] During recovery, applicationSubmissionContext doesnt have info regarding app type is unManaged or not. If Resource=null & amReqs is empty then above exception will be thrown from {{RMAppManager#validateAndCreateResourceRequest}} Added testcase to simulate issue at same method. was (Author: bibinchundatt): [~rohithsharma]/[~sunilg] During recovery, applicationSubmissionContext doesnt have info regarding app tunManaged or not. If Resource=null & amReqs is empty then above exception will be thrown from {{RMAppManager#validateAndCreateResourceRequest}} Added testcase to simulate issue at same method.
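The failing condition described in the comment can be reproduced in isolation with a hedged sketch. Nothing below is taken from {{RMAppManager}} or from YARN-8815.001.patch: the class `RecoveryValidationSketch`, the `validate` signature, the use of `IllegalArgumentException`, and in particular the `unmanagedAm` guard are assumptions chosen to illustrate the scenario, and the actual fix may look different.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the validation scenario; not the RMAppManager code. */
class RecoveryValidationSketch {
  /**
   * Rejects a submission that carries neither a resource nor any AM requests.
   *
   * @param amResourceMb requested AM memory in MB, or null if none was set
   * @param amRequestsMb per-request AM memory in MB, possibly empty
   * @param unmanagedAm  assumed flag: an unmanaged AM needs no AM container
   */
  static void validate(Long amResourceMb, List<Long> amRequestsMb, boolean unmanagedAm) {
    if (unmanagedAm) {
      // Assumed guard: with no AM container to launch, there is nothing to validate.
      return;
    }
    boolean noRequests = amRequestsMb == null || amRequestsMb.isEmpty();
    if (amResourceMb == null && noRequests) {
      throw new IllegalArgumentException(
          "Invalid resource request, no resources requested");
    }
  }

  public static void main(String[] args) {
    List<Long> none = Collections.emptyList();
    validate(null, none, true); // passes with the assumed guard
    try {
      validate(null, none, false); // the condition from the comment
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In the trace above this rejection is raised while the RM is transitioning to active, which is why a single recovered application of this kind is enough to keep the RM in standby.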
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124 ] Bibin A Chundatt commented on YARN-8815: [~rohithsharma]/[~sunilg] During recovery, applicationSubmissionContext doesnt have info regarding app tunManaged or not. If resourceRequest=null & amReqs is empty then above exception will be thrown from {{RMAppManager#validateAndCreateResourceRequest}} Added testcase to simulate issue at same method.
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Attachment: YARN-8815.001.patch
[jira] [Updated] (YARN-8809) Refactor AbstractYarnScheduler and CapacityScheduler OPPORTUNISTIC container completion codepaths
[ https://issues.apache.org/jira/browse/YARN-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8809: - Description: When OPPORTUNISTIC containers are released, fair scheduler does not update the queue metrics correctly. > Refactor AbstractYarnScheduler and CapacityScheduler OPPORTUNISTIC container > completion codepaths > - > > Key: YARN-8809 > URL: https://issues.apache.org/jira/browse/YARN-8809 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8809-YARN-1011.00.patch, > YARN-8809-YARN-1011.01.patch > > > When OPPORTUNISTIC containers are released, fair scheduler does not update > the queue metrics correctly.
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626084#comment-16626084 ] Haibo Chen commented on YARN-1011: -- Sure. I am testing my local rebased branch. Will push it once the tests finish without failures. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Karthik Kambatla >Priority: Major > Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, > yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers.
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626081#comment-16626081 ] Haibo Chen commented on YARN-8468: -- [~cheersyang] The reason why normalization is done in the scheduler is that we introduced a queue-level configuration here, and queues are scheduler-dependent concepts. {{ApplicationMasterProtocol}} is the connection path between AMs and RM for non-AM container requests. For AM-container requests, the normalization is done by RMAppManager. I hope that answers your question. > Enable the use of queue based maximum container allocation limit and > implement it in FairScheduler > -- > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, > YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, > YARN-8468.017.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > The goal of this ticket is to allow this value to be set on a per queue basis. > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. 
Setting > yarn.scheduler.maximum-allocation-mb sets the default maximum container size > for all queues, and the “maxContainerResources” queue config value sets the > maximum resources per queue. > Suggested solution: > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. > * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability(String queueName) in both > FSParentQueue and FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * Enforce the use of the queue based maximum allocation limit if it is > available, if not use the general scheduler level setting > ** Use it during validation and normalization of requests in > scheduler.allocate, app submit and resource request
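The fallback rule in the suggested solution (use the queue-level limit when one is set, otherwise the scheduler-level setting) could be sketched roughly as below. This is only an illustration under stated assumptions, not the FairScheduler code from the attached patches: the class {{QueueMaxAllocation}}, its map-based settings store, and the methods {{setQueueMaxMb}}, {{effectiveMaxMb}} and {{isRequestAllowed}} are hypothetical names, only memory is modeled, and inheriting a parent queue's limit for queues without their own setting is an assumption drawn from the "parent and leaf" bullet.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-queue maximum container memory with a scheduler-level fallback. */
final class QueueMaxAllocation {
  // Stands in for yarn.scheduler.maximum-allocation-mb.
  private final long schedulerMaxMb;
  // Hypothetical store for the per-queue setting, keyed by full queue name (e.g. "root.adhoc").
  private final Map<String, Long> queueMaxMb = new HashMap<>();

  QueueMaxAllocation(long schedulerMaxMb) {
    if (schedulerMaxMb <= 0) {
      throw new IllegalArgumentException("scheduler maximum must be positive");
    }
    this.schedulerMaxMb = schedulerMaxMb;
  }

  /** Records a queue-level cap; rejects the root queue and caps above the scheduler maximum. */
  void setQueueMaxMb(String queue, long maxMb) {
    if ("root".equals(queue)) {
      throw new IllegalArgumentException("setting the cap on root would override the scheduler setting");
    }
    if (maxMb <= 0 || maxMb > schedulerMaxMb) {
      throw new IllegalArgumentException("queue cap must be in (0, " + schedulerMaxMb + "]");
    }
    queueMaxMb.put(queue, maxMb);
  }

  /** Returns the nearest configured cap on the queue or its ancestors, else the scheduler maximum. */
  long effectiveMaxMb(String queue) {
    String current = queue;
    while (current != null) {
      Long cap = queueMaxMb.get(current);
      if (cap != null) {
        return cap;
      }
      int dot = current.lastIndexOf('.');
      current = dot < 0 ? null : current.substring(0, dot); // assumption: inherit from the parent queue
    }
    return schedulerMaxMb;
  }

  /** Validation step: a request is acceptable only if it fits under the effective cap. */
  boolean isRequestAllowed(String queue, long requestedMb) {
    return requestedMb > 0 && requestedMb <= effectiveMaxMb(queue);
  }
}
```

With a scheduler maximum of 8192 MB and a 2048 MB cap on a hypothetical {{root.adhoc}} queue, a 4096 MB request would be rejected for {{root.adhoc}} but accepted for an uncapped {{root.enterprise}} queue, which matches the ad hoc versus enterprise use case in the description.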
[jira] [Commented] (YARN-8811) Support Container Storage Interface (CSI) in YARN
[ https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626064#comment-16626064 ] Eric Yang commented on YARN-8811: - [~cheersyang] {quote}For the comment about object store user API key information, I am not sure about this point, could you please elaborate.{quote} As described in the [Hadoop AWS Integration document|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html], we need to specify fs.s3a.access.key and fs.s3a.secret.key to connect to an AWS account. The same principle applies to Swift FS and other object stores. Therefore, the CSI specification should include the ability to pass the key information needed to connect to an object store. The current specification is also missing source storage information. Propagation options are required so that multiple mounts of the same source storage system can be declared shared or exclusive. Without this defined, it might be troublesome for the source storage system to decide on the locking mechanism. > Support Container Storage Interface (CSI) in YARN > - > > Key: YARN-8811 > URL: https://issues.apache.org/jira/browse/YARN-8811 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: Support Container Storage Interface(CSI) in YARN_design > doc_20180921.pdf > > > The Container Storage Interface (CSI) is a vendor neutral interface to bridge > Container Orchestrators and Storage Providers. With the adoption of CSI in > YARN, it will be easier to integrate 3rd party storage systems, and provide > the ability to attach persistent volumes for stateful applications.
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626062#comment-16626062 ] Rohith Sharma K S commented on YARN-8815: - [~Rakesh_Shah] Could you give a bit more detail: does this happen while recovering a running unmanaged AM or a finished unmanaged app? [~bibinchundatt] Could you clarify how YARN-5028 breaks? > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Priority: Critical > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at >
[jira] [Comment Edited] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625870#comment-16625870 ] Rahul Anand edited comment on YARN-7592 at 9/24/18 3:46 PM: Thanks [~bibinchundatt] and [~subru]. Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but would definitely create a confusion. So, instead of changing/removing a meaningful federation flag or updating doc, an alternative solution can be creation of a {{FederationCustomClientRMProxy}} which can override the {{ClientRMProxy#createRMProxy}} in {{AMRMClientUtils}} to always select *proxy provider* as {{FederationRMFailoverProxyProvider}} for federation. {code:java} public static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, UserGroupInformation user, final Token token) throws IOException { ... return FederationCustomClientRMProxy.createRMProxy(configuration, protocol); } ... } } {code} After this, we can remove the {{isFederationEnabled}} check from the {{RMProxy.java}} as before. {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy<T> instance) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, (HAUtil.isHAEnabled(conf))); ... } {code} {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy<T> instance, final long retryTime, final long retryInterval) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval, HAUtil.isHAEnabled(conf)); ... } {code} With this change we don't need to separately specify the *proxy provider* for HA and non-HA scenarios in case of federation while other non federation settings will continue as it is. was (Author: rahulanand90): Thanks [~bibinchundatt] and [~subru]. Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but would definitely create a confusion. 
So, instead of changing/removing a meaningful federation flag or updating doc, an alternative solution can be creation of a {{FederationCustomClientRMProxy}} which can override the {{ClientRMProxy#createRMProxy}} in {{AMRMClientUtils}} to always select *proxy provider* as {{FederationRMFailoverProxyProvider}} for federation. {code:java} public static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, UserGroupInformation user, final Token token) throws IOException { ... return FederationCustomClientRMProxy.createRMProxy(configuration, protocol); } ... } } {code} After this, we can remove the {{isFederationEnabled}} check from the {{RMProxy.java}} as before. {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy<T> instance) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, (HAUtil.isHAEnabled(conf))); ... } {code} {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy<T> instance, final long retryTime, final long retryInterval) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval, HAUtil.isHAEnabled(conf)); ... } {code} With this change, we don't need to seperately specify the *proxy provider* for HA and non-HA scenarios. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. 
I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}}
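The selection rule proposed in the comment above (federation always uses {{FederationRMFailoverProxyProvider}}, while non-federation setups keep their existing behaviour and the retry policy depends only on HA) could be sketched in plain Java roughly as follows. This is a simplified illustration, not the patch: {{ProxyProviderChooser}} and its methods are hypothetical, a {{Map}} stands in for the Hadoop {{Configuration}}, and the property names and fully qualified class names are written from memory and should be checked against the Hadoop version in use.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the provider choice proposed for YARN-7592. */
final class ProxyProviderChooser {
  // Property and class names below are assumptions to verify against your Hadoop release.
  static final String FEDERATION_ENABLED = "yarn.federation.enabled";
  static final String HA_ENABLED = "yarn.resourcemanager.ha.enabled";
  static final String CLIENT_PROVIDER = "yarn.client.failover-proxy-provider";
  static final String FEDERATION_PROVIDER =
      "org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider";
  static final String DEFAULT_HA_PROVIDER =
      "org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider";

  private ProxyProviderChooser() {
  }

  /** Reads a boolean flag, treating a missing key as false. */
  static boolean flag(Map<String, String> conf, String key) {
    return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
  }

  /**
   * Federation always gets the federation provider, regardless of HA.
   * Without federation, HA uses the configured provider and non-HA uses none.
   */
  static Optional<String> chooseProvider(Map<String, String> conf) {
    if (flag(conf, FEDERATION_ENABLED)) {
      return Optional.of(FEDERATION_PROVIDER);
    }
    if (flag(conf, HA_ENABLED)) {
      return Optional.of(conf.getOrDefault(CLIENT_PROVIDER, DEFAULT_HA_PROVIDER));
    }
    return Optional.empty();
  }

  /** With the federation check removed, the failover retry policy depends only on HA. */
  static boolean useFailoverRetryPolicy(Map<String, String> conf) {
    return flag(conf, HA_ENABLED);
  }
}
```

The point of splitting the two decisions is that the provider choice no longer needs a separate federation failover flag, which is the concern raised in the issue description.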
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625938#comment-16625938 ] Hadoop QA commented on YARN-5939: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 90 unchanged - 1 fixed = 91 total (was 91) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 21s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-5939 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941050/YARN-5939.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c9a6bf4aed86 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 32a35dc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21948/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21948/testReport/ | | Max. process+thread count | 300 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output |
[jira] [Updated] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8816: -- Description: {code} Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux {code} {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR! java.lang.ExceptionInInitializerError at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139) at org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) Caused by: java.lang.NullPointerException at 
org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593) at org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129) at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102) at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:88) ... 38 more {code} was: {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR!
[jira] [Created] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
BELUGA BEHR created YARN-8816: - Summary: YARN Unit Tests Fail with Ubuntu VM Key: YARN-8816 URL: https://issues.apache.org/jira/browse/YARN-8816 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.2.0 Reporter: BELUGA BEHR {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR! java.lang.ExceptionInInitializerError at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:143) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:139) at org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) Caused by: java.lang.NullPointerException at org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593) at 
org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129) at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102) at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:88) ... 38 more {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625870#comment-16625870 ] Rahul Anand commented on YARN-7592: --- Thanks [~bibinchundatt] and [~subru]. Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but would definitely create confusion. So, instead of changing/removing a meaningful federation flag or updating the doc, an alternative solution is to create a {{FederationCustomClientRMProxy}} which can override the {{ClientRMProxy#createRMProxy}} call in {{AMRMClientUtils}} to always select {{FederationRMFailoverProxyProvider}} as the *proxy provider* for federation. {code:java} public static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, UserGroupInformation user, final Token token) throws IOException { ... return FederationCustomClientRMProxy.createRMProxy(configuration, protocol); } ... } } {code} After this, we can remove the {{isFederationEnabled}} check from {{RMProxy.java}}, as it was before. {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy instance) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, (HAUtil.isHAEnabled(conf))); ... } {code} {code:java} protected static <T> T createRMProxy(final Configuration configuration, final Class<T> protocol, RMProxy instance, final long retryTime, final long retryInterval) throws IOException { ... RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval, HAUtil.isHAEnabled(conf)); ... } {code} With this change, we don't need to separately specify the *proxy provider* for HA and non-HA scenarios. 
> yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}}
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625853#comment-16625853 ] Antal Bálint Steinbach commented on YARN-5939: -- Hi [~cheersyang], I see this ticket has been open for a while. I rebased the patch so that it applies to the current trunk. I was wondering how your test works, and how the wrapper class can be used easily. > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.01.patch, > YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the configuration item > "fs.%s.impl.disable.cache" should be set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" uses getFileSystem but never closes it, > which leads to a file descriptor leak because our self-defined FileSystem > class closes the file descriptor only when the close function is invoked. > My questions: > 1. Is invoking "getFileSystem" without ever closing it YARN's expected > behavior? > 2. What should we do in our self-defined FileSystem to resolve it?
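The leak described in this ticket can be illustrated with a minimal sketch. It does not use the Hadoop API: {{FakeFileSystem}}, {{leakyDownload}} and {{closingDownload}} are hypothetical stand-ins, and the counter only models an open file descriptor. The sketch assumes the uncached case from the description, where every lookup returns a fresh instance that the caller owns; closing a shared, cached instance in the same way could break other users of that instance.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class DownloadSketch {

    /** Hypothetical stand-in for an uncached FileSystem that holds one descriptor. */
    static final class FakeFileSystem implements Closeable {
        // Counts instances that were opened and not yet closed.
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeFileSystem() {
            OPEN.incrementAndGet();
        }

        String read(String path) {
            return "contents of " + path;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Obtains an instance and never closes it, so the descriptor stays open. */
    static String leakyDownload(String path) {
        FakeFileSystem fs = new FakeFileSystem();
        return fs.read(path);
    }

    /** try-with-resources closes the instance even if read() throws. */
    static String closingDownload(String path) {
        try (FakeFileSystem fs = new FakeFileSystem()) {
            return fs.read(path);
        }
    }

    public static void main(String[] args) {
        closingDownload("/tmp/a");
        System.out.println("open after closingDownload: " + FakeFileSystem.OPEN.get());
        leakyDownload("/tmp/b");
        System.out.println("open after leakyDownload: " + FakeFileSystem.OPEN.get());
    }
}
```

Whether the real fix belongs in {{FSDownload}} or in the custom FileSystem is the open question of the ticket; the sketch only shows why an owned, uncached instance needs an explicit close.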
[jira] [Updated] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-5939: - Attachment: YARN-5939.004.patch > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.01.patch, > YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the configuration item > "fs.%s.impl.disable.cache" should be set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" uses getFileSystem but never closes it, > which leads to a file descriptor leak because our self-defined FileSystem > class closes the file descriptor only when the close function is invoked. > My questions: > 1. Is invoking "getFileSystem" without ever closing it YARN's expected > behavior? > 2. What should we do in our self-defined FileSystem to resolve it?
[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625814#comment-16625814 ] Sunil Govindan commented on YARN-8657: -- {code:java} } finally { readLock.unlock(); }{code} We use the same now in this new method {{canAssignToUserWithCache}}, correct? > User limit calculation should be read-lock-protected within LeafQueue > - > > Key: YARN-8657 > URL: https://issues.apache.org/jira/browse/YARN-8657 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8657.001.patch, YARN-8657.002.patch > > > When async scheduling is enabled, user limit calculation could be wrong: > It is possible that the scheduler calculated a user_limit, but inside > {{canAssignToUser}} it becomes stale. > We need to protect user limit calculation.
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625793#comment-16625793 ] Sunil Govindan commented on YARN-8815: -- Thanks [~Rakesh_Shah] and [~bibinchundatt] I just saw this issue is marked for 3.2.0. Do we have a solution for this? If so, pls help to share the patch. From the description, this looks like a problem to me. Thanks. > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Priority: Critical > > > *while running a un managed am jar and restarting the RM - RM goes into > standby* > *Below is the exception trace--* > {noformat} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, no resources requested > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at >
[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625725#comment-16625725 ] Antal Bálint Steinbach commented on YARN-8657: -- Hi [~leftnoteasy], Thanks for the patch. I ran into a very small issue while reading your patch. In line 1531 {code:java} try { readLock.lock();{code} it is a good pattern to do it like: {code:java} readLock.lock(); try {...} finally { readLock.unlock(); } {code} There are some threads about this on Stack Overflow. For example [https://stackoverflow.com/questions/31058681/java-locking-structure-best-pattern|http://example.com/] There are more instances of this in the file; I just wanted to raise it while you are modifying this area. > User limit calculation should be read-lock-protected within LeafQueue > - > > Key: YARN-8657 > URL: https://issues.apache.org/jira/browse/YARN-8657 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8657.001.patch, YARN-8657.002.patch > > > When async scheduling is enabled, user limit calculation could be wrong: > It is possible that the scheduler calculated a user_limit, but inside > {{canAssignToUser}} it becomes stale. > We need to protect user limit calculation.
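The lock-then-try ordering recommended in the comment above can be sketched as follows. This is an illustration only, not the LeafQueue code: {{UserLimitHolder}}, {{readUserLimit}} and {{updateUserLimit}} are hypothetical names. The point of acquiring the lock before entering {{try}} is that if {{lock()}} itself fails, the {{finally}} block does not run and so cannot call {{unlock()}} on a lock the thread never held, which would raise {{IllegalMonitorStateException}} and hide the original failure.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UserLimitHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long userLimit = 0L;

    /** Reads the value under the read lock; the lock is taken before the try block. */
    public long readUserLimit() {
        lock.readLock().lock();
        try {
            return userLimit;
        } finally {
            // Runs only if lock() above succeeded, so the unlock is always balanced.
            lock.readLock().unlock();
        }
    }

    /** Updates the value under the write lock, using the same ordering. */
    public void updateUserLimit(long newLimit) {
        lock.writeLock().lock();
        try {
            userLimit = newLimit;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of read-lock holds by the current thread; exposed for checking balance. */
    public int readHolds() {
        return lock.getReadHoldCount();
    }

    public static void main(String[] args) {
        UserLimitHolder holder = new UserLimitHolder();
        holder.updateUserLimit(42L);
        System.out.println(holder.readUserLimit());
    }
}
```

The same shape applies to any {{java.util.concurrent.locks.Lock}}; only the field and method names here are assumptions made for the example.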
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625710#comment-16625710 ] Hadoop QA commented on YARN-8468: -
| (/) *+1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 24m 28s | trunk passed |
| +1 | compile | 15m 59s | trunk passed |
| +1 | checkstyle | 2m 4s | trunk passed |
| +1 | mvnsite | 1m 45s | trunk passed |
| +1 | shadedclient | 17m 14s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 46s | trunk passed |
| +1 | javadoc | 1m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 13s | the patch passed |
| +1 | compile | 12m 17s | the patch passed |
| +1 | javac | 12m 17s | the patch passed |
| -0 | checkstyle | 1m 38s | hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 879 unchanged - 23 fixed = 887 total (was 902) |
| +1 | mvnsite | 1m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 23s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 47s | the patch passed |
| +1 | javadoc | 1m 11s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 71m 24s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | unit | 0m 22s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 172m 17s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941024/YARN-8468.017.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs
[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625672#comment-16625672 ] Bibin A Chundatt commented on YARN-8815: Thank you [~Rakesh_Shah] for raising the issue. This seems related to YARN-5028. Once the application is in a FINAL state, {{pruneAppState(ApplicationStateData appState)}} doesn't set whether the application is managed or unmanaged. > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Priority: Critical
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Priority: Critical (was: Major) > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Priority: Critical
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8815: --- Target Version/s: 3.2.0 > Both RM in standby after restart(restart failure) > - > > Key: YARN-8815 > URL: https://issues.apache.org/jira/browse/YARN-8815 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Rakesh Shah >Priority: Critical
[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)
[ https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-8815:

Description:
*While running an unmanaged AM jar and restarting the RM, the RM goes into standby*
*Below is the exception trace:*
{noformat}
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
{noformat}

was:
*While running an unmanaged AM jar and restarting the RM, the RM goes into standby*
*Below is the exception trace:*
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
[jira] [Created] (YARN-8815) Both RM in standby after restart(restart failure)
Rakesh Shah created YARN-8815:

Summary: Both RM in standby after restart(restart failure)
Key: YARN-8815
URL: https://issues.apache.org/jira/browse/YARN-8815
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Rakesh Shah

*While running an unmanaged AM jar and restarting the RM, the RM goes into standby*
*Below is the exception trace:*
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625623#comment-16625623 ]

Sunil Govindan commented on YARN-8657:

[~cheersyang] could you please check the latest patch?

> User limit calculation should be read-lock-protected within LeafQueue
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
> When async scheduling is enabled, the user limit calculation could be wrong:
> it is possible that the scheduler calculated a user_limit, but by the time it is used inside {{canAssignToUser}} it has become stale.
> We need to protect the user limit calculation.
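For readers unfamiliar with the problem described in YARN-8657, the following is a minimal sketch of the general idea only, not the attached patch. It assumes a hypothetical class UserLimitSketch with a simplified "capacity divided by active users" limit; the real LeafQueue logic is considerably more involved. The point it illustrates is that the limit is computed and compared while holding the same read lock, so a concurrent writer cannot make the value stale between the two steps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a leaf queue; this is NOT the real LeafQueue class.
class UserLimitSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> usedByUser = new HashMap<>();
    private long queueCapacity;

    UserLimitSketch(long queueCapacity) {
        this.queueCapacity = queueCapacity;
    }

    // Writers take the write lock, so they cannot interleave with a reader
    // that is in the middle of computing and applying a user limit.
    void allocate(String user, long amount) {
        lock.writeLock().lock();
        try {
            usedByUser.merge(user, amount, Long::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void setQueueCapacity(long capacity) {
        lock.writeLock().lock();
        try {
            this.queueCapacity = capacity;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Simplified (assumed) formula: capacity split evenly across active users.
    // The caller must already hold the read lock.
    private long computeUserLimit() {
        int activeUsers = Math.max(1, usedByUser.size());
        return queueCapacity / activeUsers;
    }

    // The limit is computed and checked under one read lock, so the value
    // compared against is consistent with the usage read alongside it.
    boolean canAssignToUser(String user, long request) {
        lock.readLock().lock();
        try {
            long limit = computeUserLimit();
            long used = usedByUser.getOrDefault(user, 0L);
            return used + request <= limit;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        UserLimitSketch queue = new UserLimitSketch(100);
        queue.allocate("alice", 40);
        queue.allocate("bob", 10);
        System.out.println(queue.canAssignToUser("bob", 40));
        System.out.println(queue.canAssignToUser("alice", 20));
    }
}
```

A read lock (rather than an exclusive lock) lets several scheduling threads evaluate limits concurrently while still excluding writers, which matters when async scheduling is enabled.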
[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625616#comment-16625616 ]

Hadoop QA commented on YARN-8657:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 42 unchanged - 2 fixed = 43 total (was 44) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 23s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerMultiNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8657 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941021/YARN-8657.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5f0e73189a29 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 32a35dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application
[ https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625577#comment-16625577 ]

Hadoop QA commented on YARN-7957:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 27m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 42s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-7957 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941023/YARN-7957.002.patch |
| Optional Tests | dupname asflicense shadedclient |
| uname | Linux ccc4b8c7a09e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 32a35dc |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 406 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21946/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> [UI2] Yarn service delete option disappears after stopping application
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-ui-v2
> Affects Versions: 3.1.0
> Reporter: Yesha Vora
> Assignee: Akhil PB
> Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears. Thus, the user cannot delete the service from the UI after stopping the application.
> Expected behavior:
> The Setting button should be present on the UI page after the application is stopped. If the application is stopped, the Setting button should only have the "Delete Service" action available.
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625559#comment-16625559 ]

Antal Bálint Steinbach commented on YARN-8468:

Hi [~cheersyang],

Thanks for your additional suggestions.
1) Fixed. Very good point, thanks.
2) Fixed.

Further checkstyle issues are fixed as well. These are quite hard to find manually, as I cannot use auto-format or import organization.

As for your question on the normalization topic, maybe [~haibochen] can give some more details, because he knows FS more deeply.

> Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, YARN-8468.017.patch
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" to limit the overall size of a container. This applies globally to all containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise apps. User wants to limit ad hoc jobs to small containers but allow enterprise apps to request as many resources as needed. Setting yarn.scheduler.maximum-allocation-mb sets a default value for maximum container size for all queues, and the maximum resources per queue are set with the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
> * add the setting to the queue properties for all queue types (parent and leaf), this will cover dynamically created queues.
> * if we set it on the root we override the scheduler setting and we should not allow that.
> * make sure that queue resource cap cannot be larger than scheduler max resource cap in the config.
> * implement getMaximumResourceCapability(String queueName) in the FairScheduler
> * implement getMaximumResourceCapability(String queueName) in both FSParentQueue and FSLeafQueue as follows
> * expose the setting in the queue information in the RM web UI.
> * expose the setting in the metrics etc for the queue.
> * Enforce the use of queue based maximum allocation limit if it is available, if not use the general scheduler level setting
> ** Use it during validation and normalization of requests in scheduler.allocate, app submit and resource request
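The suggested solution in YARN-8468 can be illustrated with a small sketch. This is not the attached patch and not the FairScheduler API: the class QueueMaxAllocationSketch, the use of a plain memory value in MB instead of a YARN Resource, and the rule that a queue inherits the limit of its nearest configured ancestor are all assumptions made for illustration. Rejecting (rather than silently capping) a queue limit above the scheduler maximum is likewise an assumed reading of the ticket.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a per-queue maximum allocation with a
// scheduler-level fallback; names and types are not taken from the patch.
class QueueMaxAllocationSketch {
    private final long schedulerMaxMb;                       // global setting
    private final Map<String, Long> queueMaxMb = new HashMap<>();

    QueueMaxAllocationSketch(long schedulerMaxMb) {
        this.schedulerMaxMb = schedulerMaxMb;
    }

    // Configure a per-queue limit. Setting it on root is rejected, and the
    // queue limit may not exceed the scheduler-level maximum.
    void setQueueMax(String queueName, long maxMb) {
        if ("root".equals(queueName)) {
            throw new IllegalArgumentException("limit must not be set on root");
        }
        if (maxMb > schedulerMaxMb) {
            throw new IllegalArgumentException("queue limit exceeds scheduler limit");
        }
        queueMaxMb.put(queueName, maxMb);
    }

    // Use the limit of the queue or of its nearest configured ancestor
    // (assumption); otherwise fall back to the scheduler-level setting.
    long getMaximumResourceCapability(String queueName) {
        String name = queueName;
        while (name != null) {
            Long limit = queueMaxMb.get(name);
            if (limit != null) {
                return limit;
            }
            int dot = name.lastIndexOf('.');
            name = dot < 0 ? null : name.substring(0, dot);
        }
        return schedulerMaxMb;
    }

    // Validation step: a request is acceptable only within the queue limit.
    boolean isValidRequest(String queueName, long requestedMb) {
        return requestedMb > 0
            && requestedMb <= getMaximumResourceCapability(queueName);
    }

    public static void main(String[] args) {
        QueueMaxAllocationSketch sketch = new QueueMaxAllocationSketch(8192);
        sketch.setQueueMax("root.adhoc", 2048);
        System.out.println(sketch.getMaximumResourceCapability("root.adhoc.alice"));
        System.out.println(sketch.isValidRequest("root.enterprise", 4096));
    }
}
```

Resolving the limit through the parent chain is one way a dynamically created leaf queue could pick up a limit without its own configuration entry, which matches the use case of restricting an ad hoc pool while leaving the enterprise pool at the scheduler default.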
[jira] [Updated] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Bálint Steinbach updated YARN-8468:

Attachment: YARN-8468.017.patch

> Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, YARN-8468.017.patch
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" to limit the overall size of a container. This applies globally to all containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise apps. User wants to limit ad hoc jobs to small containers but allow enterprise apps to request as many resources as needed. Setting yarn.scheduler.maximum-allocation-mb sets a default value for maximum container size for all queues, and the maximum resources per queue are set with the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
> * add the setting to the queue properties for all queue types (parent and leaf), this will cover dynamically created queues.
> * if we set it on the root we override the scheduler setting and we should not allow that.
> * make sure that queue resource cap cannot be larger than scheduler max resource cap in the config.
> * implement getMaximumResourceCapability(String queueName) in the FairScheduler
> * implement getMaximumResourceCapability(String queueName) in both FSParentQueue and FSLeafQueue as follows
> * expose the setting in the queue information in the RM web UI.
> * expose the setting in the metrics etc for the queue.
> * Enforce the use of queue based maximum allocation limit if it is available, if not use the general scheduler level setting
> ** Use it during validation and normalization of requests in scheduler.allocate, app submit and resource request
[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application
[ https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625539#comment-16625539 ]

Akhil PB commented on YARN-7957:

Thanks [~sunilg] for your comments. Attached v2 patch with the above changes.

> [UI2] Yarn service delete option disappears after stopping application
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-ui-v2
> Affects Versions: 3.1.0
> Reporter: Yesha Vora
> Assignee: Akhil PB
> Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears. Thus, the user cannot delete the service from the UI after stopping the application.
> Expected behavior:
> The Setting button should be present on the UI page after the application is stopped. If the application is stopped, the Setting button should only have the "Delete Service" action available.
[jira] [Updated] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application
[ https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akhil PB updated YARN-7957:

Attachment: YARN-7957.002.patch

> [UI2] Yarn service delete option disappears after stopping application
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-ui-v2
> Affects Versions: 3.1.0
> Reporter: Yesha Vora
> Assignee: Akhil PB
> Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears. Thus, the user cannot delete the service from the UI after stopping the application.
> Expected behavior:
> The Setting button should be present on the UI page after the application is stopped. If the application is stopped, the Setting button should only have the "Delete Service" action available.